ansaurus

Question

Question about specific regular expression

Answer 1

+1 A:

I don't think it has any purpose. But because regular expressions are so hard to read and decompose, people rarely point out errors in them. That is probably why no one else pointed it out.

Edit: Why am I downvoted for not being wrong?

Marius 2008-08-17 02:18:06

Answer 2

+3 A:

@Rob: I disagree. To enforce what you are asking for, I think you would need to use a negative lookbehind, which is possible but is certainly not related to using {1}. Neither version of the regexp addresses that particular issue.

To let the code speak:

tibook 0 /home/jj33/swap > cat text
Text this is http://example.com text this is
Text this is http://http://example.com text this is
tibook 0 /home/jj33/swap > cat p
#!/usr/bin/perl

my $re1 = '((mailto\:|(news|(ht|f)tp(s?))\://){1}\S+)';
my $re2 = '((mailto\:|(news|(ht|f)tp(s?))\://)\S+)';

while (<>) {
  print "Evaluating: $_";
  print "re1 saw \$1 = $1\n" if (/$re1/);
  print "re2 saw \$1 = $1\n" if (/$re2/);
}
tibook 0 /home/jj33/swap > cat text | perl p
Evaluating: Text this is http://example.com text this is
re1 saw $1 = http://example.com
re2 saw $1 = http://example.com
Evaluating: Text this is http://http://example.com text this is
re1 saw $1 = http://http://example.com
re2 saw $1 = http://http://example.com
tibook 0 /home/jj33/swap >

So, if there is a difference between the two versions, it doesn't seem to be the one you suggest.
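For what it's worth, here is a rough sketch of what a lookaround-based version might look like, written in Python rather than Perl. The `strict` pattern is my own illustration and is not from the question: it assumes that "reject a doubled scheme" means "no second `://` in the same run of non-whitespace", and it uses a negative lookbehind so the match cannot simply restart at the inner `http://`. It is not a URL validator.

```python
import re

# Hypothetical stricter pattern (an assumption, not the original regex):
#   (?<!://)     the match may not begin right after a "://"
#   (?!\S*://)   no further "://" may appear before the next whitespace
strict = re.compile(
    r"(?<!://)((?:mailto:|(?:news|(?:ht|f)tps?)://)(?!\S*://)\S+)"
)

single = strict.search("Text this is http://example.com text this is")
doubled = strict.search("Text this is http://http://example.com text this is")

print(single.group(1) if single else None)
print(doubled.group(1) if doubled else None)
```

Note that there is no {1} anywhere in it; the lookarounds do all the work.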

jj33 2008-08-17 02:46:42

Answer 3

+1 A:

@Jeff Atwood, your interpretation is a little off. The {1} means "match exactly once", but it has no effect on the capturing. The capturing occurs because of the parens; the braces only specify the number of times the pattern must match the source: once, as you say.
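A small Python sketch of that distinction, using a toy pattern of my own (`(ab)` followed by `c`) rather than the regexp from the question. The quantifier changes how many repetitions are required; the parentheses are what create the captured group.

```python
import re

text = "ababc"

with_brace = re.compile(r"(ab){1}c")   # group repeated exactly once
without_brace = re.compile(r"(ab)c")   # same group, no quantifier
twice = re.compile(r"(ab){2}c")        # group repeated exactly twice

m1 = with_brace.search(text)
m2 = without_brace.search(text)
m3 = twice.search(text)

# m1 and m2 match the same text and capture the same group.
print(m1.group(0), m1.groups())
print(m2.group(0), m2.groups())
# m3 needs two repetitions, so it matches a longer stretch of the input.
print(m3.group(0), m3.groups())
```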

I agree with @Marius, even if his answer is a little terse and may come off as flippant. Regular expressions are tough if one's not used to them, and the {1} in the question isn't quite an error: in systems that support it, it does mean "exactly one match". In this sense, it doesn't really do anything.

Unfortunately, contrary to a now-deleted post, it doesn't keep the regexp from matching http://http://example.org, since the \S+ at the end will match one or more non-whitespace characters, including the http://example.org in http://http://example.org (verified using Python 2.5, just in case my regexp reading was off). So, the regexp given isn't really the best. I'm not a URL expert, but probably something limiting the appearance of ":"s and "//"s after the first one would be necessary (but hardly sufficient) to ensure good URLs.
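A check along these lines can be reproduced with something like the following (written for a current Python rather than 2.5; the variable names are mine). Both patterns are taken from the question, one with the {1} and one without.

```python
import re

# The scheme part of the regexp from the question.
SCHEME = r"(mailto\:|(news|(ht|f)tp(s?))\://)"

re1 = re.compile("(" + SCHEME + r"{1}\S+)")  # with {1}
re2 = re.compile("(" + SCHEME + r"\S+)")     # without {1}

lines = [
    "Text this is http://example.com text this is",
    "Text this is http://http://example.com text this is",
]

# For each line, collect what each pattern captures as group 1.
results = [(re1.search(line).group(1), re2.search(line).group(1))
           for line in lines]

for with_brace, without_brace in results:
    print(with_brace, without_brace, with_brace == without_brace)
```

Both patterns capture the doubled URL whole, since \S+ happily consumes the second "http://".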

Blair Conrad 2008-08-17 02:56:56

Answer 4

+1 A:

I don't think the {1} has any valid function in that regex.

(mailto\:|(news|(ht|f)tp(s?))\://){1}

You should read this as: "capture the stuff in the parens exactly one time". But we don't really care about capturing this for use later, e.g. $1 in the replacement. So it's pointless.
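If the captures really aren't needed, one option (my suggestion, not something from the question) is to switch to non-capturing groups, written `(?:...)`. A minimal Python sketch, assuming only the overall match matters; the sample string is made up:

```python
import re

# The question's pattern without {1}: five capturing groups.
capturing = re.compile(r"((mailto\:|(news|(ht|f)tp(s?))\://)\S+)")

# Same alternation with non-capturing groups: nothing is stored.
non_capturing = re.compile(r"(?:mailto:|(?:news|(?:ht|f)tps?)://)\S+")

sample = "see ftp://example.org/file now"

m_cap = capturing.search(sample)
m_non = non_capturing.search(sample)

print(capturing.groups, non_capturing.groups)  # number of groups in each pattern
print(m_cap.group(0))
print(m_non.group(0))
```

Both find the same text; the second just doesn't keep the intermediate pieces around.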

Jeff Atwood 2008-08-17 02:58:18
