ansaurus

Question

Why doesn't my Perl regex match what I think it should?

Answer 1

+3 A:

print "Found it ! :$1:" if /(<*>)/i;

will match zero or more < characters followed immediately by a > character...

so in your string the only possible match is the > character itself, preceded by zero < characters.
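A minimal Perl sketch of this. The real address in the question isn't shown, so `user@example.com` is a made-up placeholder (single-quoted so Perl doesn't try to interpolate `@example`):

```perl
use strict;
use warnings;

# Placeholder input; the question's actual address is not shown.
my $line = 'My email address is <user@example.com>.';

# <*> means "zero or more '<', then one '>'". The engine tries each start
# position from left to right; the first place it succeeds is the final '>',
# where <* matches zero characters.
my $found = $line =~ /(<*>)/ ? $1 : undef;
print "Found it ! :$found:\n";   # prints "Found it ! :>:"
```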

jspcal 2010-01-18 06:46:10

Answer 2

+2 A:

With $1 you access the first "capture" of the regex, a capture being whatever is matched by the part between parentheses. In your example I think you're missing a `.`: `<*>` matches zero or more '<' characters followed by a '>' character, so here it matches zero '<' and one '>'. It should probably read like this:

print "Found it ! :$1:" if /(<.*>)/i;

Now this matches a '<' followed by zero or more arbitrary characters ('.' matches any character except a newline), followed by '>'.
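A quick sketch running the corrected pattern over both strings from the question. The address is again a made-up placeholder, since the original one isn't shown. Note that on the second string the capture starts at the very first '<', because `.*` is greedy:

```perl
use strict;
use warnings;

# Placeholder stand-ins for the question's two strings.
my @lines = (
    'My email address is <user@example.com>.',
    '<My email address is <user@example.com>.',
);

my @found;
for my $line (@lines) {
    # .* is greedy, so the capture runs from the FIRST '<' to the LAST '>'.
    if ($line =~ /(<.*>)/) {
        push @found, $1;
        print "Found it ! :$1:\n";
    }
}
```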

ahans 2010-01-18 06:56:05

I shouldn't have been so quick to mark this up. The tutorial has both `<.*>` and `<*>`. The answer is that `<*` matches zero `<` characters here, so only the `>` matches, and that is all that is returned.

Kevin Brock 2010-01-18 08:00:46

That is exactly what I said, and what jspcal said too. I don't know what's in the tutorial, though. Also, I just reread the question and realized that the author expected the first regex not to match the complete email string, but was surprised by the result. While most answers (including mine) explain the result of that regex, they leave out the second question, except for nil's answer, which I believe is the best so far.

ahans 2010-01-18 08:34:40

Answer 3

+1 A:

Regular expressions in Perl work a bit differently than wildcards in many OS applications.
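To make the contrast with wildcards concrete, here is a small sketch. `glob_to_regex` is a hypothetical helper written only for this illustration (it is not a core Perl function) and handles just `*` and `?`. It shows that the wildcard `<*>` corresponds to the regex `<.*>`, while the regex `<*>` happily accepts a lone `>`:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of core Perl): turn a shell-style wildcard
# into an anchored regex, handling only * and ?.
sub glob_to_regex {
    my ($glob) = @_;
    my $re = '';
    for my $ch (split //, $glob) {
        if    ($ch eq '*') { $re .= '.*' }           # wildcard *: any run of characters
        elsif ($ch eq '?') { $re .= '.' }            # wildcard ?: exactly one character
        else               { $re .= quotemeta $ch }  # anything else is literal
    }
    return qr/\A$re\z/s;    # a wildcard has to match the whole string
}

my $as_glob  = glob_to_regex('<*>');   # same as /\A<.*>\z/s
my $as_regex = qr/\A<*>\z/;            # regex: zero or more '<', then '>'

for my $s ('<abc>', '>', '<<>') {
    printf "%-6s glob: %-4s regex: %s\n", $s,
        ($s =~ $as_glob  ? 'yes' : 'no'),
        ($s =~ $as_regex ? 'yes' : 'no');
}
```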

The * means "0 or more of the previous thing". So when you do

<*>

it means

"Zero or more less than characters, followed by a greater than character."

What you want is the regular expression user's best friend: `.`, which matches any single character (except a newline).

<.*>

That means

"a less than character, followed by ANYTHING 0 or more times, followed by a greater than character."

But that's probably not what you mean either: the > character is also "any character", so .* will happily run past the first > and stop at the last one on the line. Fortunately, there's an easy way to say what you really mean: make * non-greedy by following it with the ? character:

<.*?>

This means, "The less than character, followed by anything, 0 or more times, UNTIL I reach a > character."
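The difference only shows up when there is more than one `>` on the line, so this sketch uses a made-up string with two bracketed names. The third pattern, `<[^>]*>` ("anything except `>`"), is not from the answer above; it is a common alternative included for comparison:

```perl
use strict;
use warnings;

# Made-up input with two bracketed items, so the difference is visible.
my $text = 'Write to <alice> or <bob>.';

my ($greedy)  = $text =~ /(<.*>)/;     # .*  grabs as much as it can
my ($lazy)    = $text =~ /(<.*?>)/;    # .*? stops at the first '>' that works
my ($negated) = $text =~ /(<[^>]*>)/;  # alternative: never step over a '>'

print "greedy:  $greedy\n";    # greedy:  <alice> or <bob>
print "lazy:    $lazy\n";      # lazy:    <alice>
print "negated: $negated\n";   # negated: <alice>
```

The negated character class is often preferred in practice because it cannot run past a `>` no matter how the rest of the pattern is written.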

Woo!

There are a few great websites out there that will get you familiar with the great world of regexes, and one of my favorites is regular-expressions.info. For Perl-specific regexes, though, you can't beat the classic Perl Regular Expressions Tutorial (perlretut), which has guided many a regex wanderer to the Perl homeland and is a great resource.

Robert P 2010-01-18 06:58:24

The tutorial has both `<.*>` and `<*>` and discusses the difference between those two.

Kevin Brock 2010-01-18 07:58:54

Answer 4

+3 A:

The answer to your first question is simple: you're wrong.

The second question is more interesting. To understand it, you need to know two facts:

Once there's a successful match, the regex engine stops and returns the match it found: the leftmost starting position that works wins, even if a longer match exists further along the string.
The standard quantifiers (*, +, ? and {min,max}) are greedy, which means /<*/ will match as many <<<<... as possible.
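A short sketch of the second fact, using a made-up string of four '<'. Each standard quantifier takes as many '<' as its upper limit allows at the first position where it can match:

```perl
use strict;
use warnings;

# Made-up input: four '<' followed by a non-'<' character.
my $s = '<<<<x';

my %got;
for my $p ('<*', '<+', '<?', '<{1,2}') {
    # Greedy: take as many '<' as the quantifier's upper limit allows.
    my ($cap) = $s =~ /($p)/;
    $got{$p} = $cap;
    printf "%-9s captures '%s'\n", "/$p/", $cap;
}
```

This should print `'<<<<'` for `/<*/` and `/<+/`, `'<'` for `/<?/`, and `'<<'` for `/<{1,2}/`.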

So, back to the regex /<*/. When matching

My email address is <[email protected]>.

The very beginning of the string, ^, matches the regex, yielding an empty string. This is a successful match, and the next step, ^M, does not match your regex (M is not a <). So voilà, Perl stops matching and gives you the empty result.

Now take the second string:

<My email address is <[email protected]>.

The very beginning of the string, ^, matches the regex, yielding an empty string. But the next step, ^<, still matches your regex, and the quantifier * is greedy: it matches as much as possible. So the result is <.
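The walk-through above can be checked with a few lines of Perl. The address is a placeholder, since the real one isn't shown; `$-[0]` is Perl's built-in offset of where the last successful match started:

```perl
use strict;
use warnings;

# Placeholder stand-ins for the question's two strings.
my @lines = (
    'My email address is <user@example.com>.',
    '<My email address is <user@example.com>.',
);

my @found;
for my $line (@lines) {
    # <* may match the empty string, so the attempt at offset 0 always
    # succeeds; the engine never moves on to the later '<'.
    $line =~ /(<*)/ or die "no match\n";
    push @found, $1;
    printf "Found it ! :%s: (match starts at offset %d)\n", $1, $-[0];
}
```

Both matches start at offset 0: the first captures the empty string, the second captures a single `<`.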

nil 2010-01-18 07:20:28

Answer 5

A:

Personally I'm very fond of the cheat sheet at Added Bytes.

Anders 2010-01-18 08:31:01
