Hi,

I tried the following code snippet from Robert's Perl tutorial:

> $_='My email address is <[email protected]>.';
> 
> print "Found it ! :$1:" if /(<*>)/i;

When I ran it, the output was:

Found it ! :>:

However, shouldn't the output be,

Found it ! :m>:

since 'm' matches "0 or more '<'", i.e. the '<*' part of the regex?

Also,

$_='My email address is <[email protected]>.';
print "Match 1 worked :$1:" if /(<*)/i;

When this is run the output is:

Match 1 worked ::

$_='<My email address is <[email protected]>.';
print "Match 2 worked :$1:" if /(<*)/i;

When the above is run, the output is:

Match 2 worked :<:

But shouldn't the output be:

Match 2 worked ::

since the first match (i.e. $1) should be "" rather than "<", as in the example before it?

+3  A: 
if /(<*>)/i;

will match 0 or more < chars, followed immediately by a > char.

So the only possible match is the > char, which is preceded by 0 < chars.
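
A quick way to check this yourself is the minimal sketch below. It uses the string from the question; the variable names $text and $capture are only for illustration.

```perl
use strict;
use warnings;

# Same string as in the question. Single quotes, so @ is not interpolated.
my $text = 'My email address is <[email protected]>.';

# <*> means: zero or more '<', then exactly one '>'.
# The only '>' in the string is not preceded by a '<',
# so the capture ends up holding just the '>'.
my ($capture) = $text =~ /(<*>)/;

printf "Found it ! :%s:\n", $capture;   # prints "Found it ! :>:"
```

The character before the > is never part of the match, because <* can only consume < characters.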

jspcal
+2  A: 

With $1 you access the first "capture" of the regex, a capture being whatever is matched by the part between parentheses. In your example I think you're missing a '.'. As written, <*> matches zero or more '<' characters followed by a '>' character, so here it matches zero '<' and one '>'. It probably should read like this:

print "Found it ! :$1:" if /(<.*>)/i;

Now this matches a '<' followed by zero or more arbitrary characters ('.' matches any character), followed by '>'.
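
As a rough sketch of the difference, again with the string from the question (the variable names are made up for the example):

```perl
use strict;
use warnings;

# Same string as in the question. Single quotes, so @ is not interpolated.
my $text = 'My email address is <[email protected]>.';

# With the dot: '<', then any characters, then '>'.
my ($with_dot) = $text =~ /(<.*>)/;

# Without the dot: zero or more '<', then '>'.
my ($without_dot) = $text =~ /(<*>)/;

printf "with dot:    :%s:\n", $with_dot;     # prints "with dot:    :<[email protected]>:"
printf "without dot: :%s:\n", $without_dot;  # prints "without dot: :>:"
```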

ahans
I shouldn't have been too quick with marking this up. The tutorial has both `<.*>` and `<*>`. The answer is that `<*` matches zero characters here, so just the `>` matches and that is all that is returned.
Kevin Brock
That is exactly what I've said and what's also said by jspcal. I don't know what's in the tutorial though. Also, I just reread the question and realize that the author expected the first regex not to match the complete email string but was surprised about the result. While most answers (including mine) explain the result of that query, they leave out the second question except nil's answer which I believe is the best so far.
ahans
+1  A: 

Regular expressions in Perl work a bit differently than wildcards in many OS applications.

The * means "0 or more of the previous thing". So when you do

<*>

it means

"Zero or more less than characters, followed by a greater than character."

What you want is the regular expression user's best friend, the dot character:

<.*>

That means

"a less than character, followed by ANYTHING 0 or more times, followed by a greater than character."

But that's probably not what you mean either: the > character is also "any character"! Fortunately, there's an easy way of saying what you really mean: you make * no longer greedy with the ? character:

<.*?>

This means, "The less than character, followed by anything, 0 or more times, UNTIL I reach a > character."
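
Here is a small sketch of greedy versus non-greedy matching. The input string is made up for the example (it is not from the question), because you need at least two > characters to see any difference.

```perl
use strict;
use warnings;

# Hypothetical input with two bracketed parts, chosen so the difference shows.
my $text = 'a <b> c <d> e';

my ($greedy) = $text =~ /(<.*>)/;   # .* grabs as much as it can
my ($lazy)   = $text =~ /(<.*?>)/;  # .*? stops at the first '>'

printf "greedy: :%s:\n", $greedy;   # prints "greedy: :<b> c <d>:"
printf "lazy:   :%s:\n", $lazy;     # prints "lazy:   :<b>:"
```

On the string from the question both patterns return the same thing, since it contains only one >.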

Woo!

There are a few great websites out there that will get you familiar with the great world of regexes, and one of my favorites is regular-expressions.info. For Perl-specific regexes, though, you can't beat the classic Perl Regular Expressions Tutorial. It has guided many a regex wanderer to the Perl homeland, and is a great resource.

Robert P
The tutorial has both `<.*>` and `<*>` and discusses the difference between those two.
Kevin Brock
+3  A: 

The answer to your first question is simple: you're wrong.

The second question is rather interesting. To understand it, you need to know two facts:

  1. Once there's a successful match, the regex engine stops and returns that result. It tries starting positions from left to right, so the leftmost match wins.
  2. The standard quantifiers (*, +, ? and {min,max}) are greedy, which means /<*/ will match as many < characters as possible.

So, back to the regex /<*/. When matching

My email address is <[email protected]>.

The very beginning of the string, ^, matches the regex, which results in an empty string. This is a successful match, and the next character, M, is not a <, so it cannot extend the match. So voilà, Perl stops matching and gives you the empty result.

Now consider the second string:

<My email address is <[email protected]>.

The very beginning of the string, ^, matches the regex, which results in an empty string. But the next step, ^<, still matches your regex, and the quantifier * is greedy: it will match as much as possible. So the result is <.
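
To watch this happen, here is a minimal sketch that prints where each match starts and what was captured. It uses the two strings from the question and Perl's built-in @- array, whose first element $-[0] is the start offset of the last successful match. The @results array is only there for the example.

```perl
use strict;
use warnings;

my @results;

# The two strings from the question. Single quotes, so @ is not interpolated.
for my $text ('My email address is <[email protected]>.',
              '<My email address is <[email protected]>.') {
    if ($text =~ /(<*)/) {
        # $-[0] is the offset at which the match started.
        push @results, [ $-[0], $1 ];
        printf "offset %d, captured :%s:\n", $-[0], $1;
    }
}
# prints "offset 0, captured ::"
# prints "offset 0, captured :<:"
```

Both matches start at offset 0. In the first string there is no < at that position, so the greedy * settles for zero characters; in the second there is one, so it takes it.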

nil
A: 

Personally I'm very fond of the cheat sheet at Added Bytes.

Anders