ansaurus

Question

Answer 1

+2 A:

You are getting a match on software.informer.com. Check the value of `$&`. The return value of `scan` is an array of the captured groups (one inner array per match), not the matched text. Add capturing parentheses around the suffix, and you'll get the .com as part of the return value from `scan` as well.
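A minimal sketch of what this answer describes. It assumes the test string and pattern quoted in Answer 4 below are the asker's, since the question text itself is missing from this page:

```ruby
# Assumption: string and pattern taken from Answer 4 on this page.
str = 'Show more results from software.informer.com'

# Original pattern: only the repeated "label." part is captured.
groups = str.scan(/([a-zA-Z0-9\-]*\.)*\w{1,4}$/)
whole  = $&   # scan leaves $~ (and therefore $&) set to the last match
p groups      # => [["informer."]]
p whole       # => "software.informer.com"

# Capturing the suffix too adds it to each result array.
with_suffix = str.scan(/([a-zA-Z0-9\-]*\.)*(\w{1,4})$/)
p with_suffix # => [["informer.", "com"]]
```

The two patterns differ only in the parentheses around `\w{1,4}`. Note that a repeated group keeps only its last repetition, which is why `informer.` appears rather than `software.`.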

The regex testers and Ruby are not disagreeing about the fundamental issue (the regex itself). Rather, their interfaces differ in what they emphasize. When you run `scan` in irb, the first thing you'll see is the return value from `scan` (an Array of the captured subpatterns), which is not the same thing as the matched text. Regex testers are most likely oriented toward displaying the matched text.
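To put the two views side by side, here is a small sketch, again assuming the test string and pattern from Answer 4. `String#match` returns a MatchData whose element `[0]` is the matched text a tester would highlight, while `scan` returns only the captures:

```ruby
# Assumption: string and pattern taken from Answer 4 on this page.
str = 'Show more results from software.informer.com'
re  = /([a-zA-Z0-9\-]*\.)*\w{1,4}$/

m = str.match(re)
p m[0]                          # => "software.informer.com" (what a tester shows)
p m[1]                          # => "informer." (the captured group)

# scan reports the captures of each match, not m[0].
p str.scan(re) == [m.captures]  # => true
```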

FM 2010-02-15 16:54:10

Hm, I'm new to regex :/ ... but I still don't get why the regex testers and Ruby differ; even the "Ruby regex tester" is failing me. Also, I want one match, not several. This method gives me more matches...?

Zombies 2010-02-15 16:56:13

Answer 2

+3 A:

Your regex is correct; the result has to do with the way String#scan behaves. From the official documentation:

"If the pattern contains groups, each individual result is itself an array containing one entry per group."

Basically, if you put parentheses around the whole regex, the first element of each array in your results will be what you expect.
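A sketch of that suggestion, assuming the test string and pattern from Answer 4. The outer group (group 1) now spans the whole match, so taking the first element of each result gives the matched text:

```ruby
# Assumption: string and pattern taken from Answer 4 on this page,
# with one extra pair of parentheses around the whole pattern.
str    = 'Show more results from software.informer.com'
result = str.scan(/(([a-zA-Z0-9\-]*\.)*\w{1,4})$/)

p result              # => [["software.informer.com", "informer."]]
p result.map(&:first) # => ["software.informer.com"]
```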

Alex Reisner 2010-02-15 16:57:51

Interesting... but to me, parentheses seem unavoidable, and yet they affect the way `scan` works. Any tips...?

Zombies 2010-02-15 17:00:26

Parentheses here are a little confusing because they have two distinct functions: grouping a sub-expression for repetition, and forming the output of `scan`. One could fix this by introducing another symbol for controlling scan's output, but I think the parentheses usually work pretty well (you often end up with what you want, naturally) and introducing externally-dependent (method-related) symbols into regular expressions does not seem like a good idea.

Alex Reisner 2010-02-15 17:10:03

Answer 3

A:

How about this:

/([a-zA-Z0-9\-]*\.*\w{1,4})$/

This returns `informer.com` on your test string.
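A quick check of this pattern, assuming the test string is the one quoted in Answer 4. Because the single group covers the whole pattern, the `scan` result and the matched text agree. Note that `\.*` admits only one run of dots between the label and the suffix, so the match starts at `informer` rather than `software`:

```ruby
# Assumption: test string taken from Answer 4 on this page.
str = 'Show more results from software.informer.com'
re  = /([a-zA-Z0-9\-]*\.*\w{1,4})$/

scanned = str.scan(re)
first   = str[re]   # String#[] with a Regexp returns the matched text

p scanned # => [["informer.com"]]
p first   # => "informer.com"
```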

http://rubular.com/regexes/13670

marcgg 2010-02-15 17:10:46

Answer 4

+2 A:

It does not look as if you expect more than one result (especially as the regex is anchored). In that case there is no reason to use scan.

'Show more results from software.informer.com'[ /([a-zA-Z0-9\-]*\.)*\w{1,4}$/ ]
#=> "software.informer.com"

If you do need to use scan (in which case you obviously need to remove the anchor), you can use (?:) to create non-capturing groups.

'foo.bar.baz lala software.informer.com'.scan( /(?:[a-zA-Z0-9\-]*\.)*\w{1,4}/ )
#=> ["foo.bar.baz", "lala", "software.informer.com"]

sepp2k 2010-02-15 17:12:41

Very odd issue with Ruby and regex
