The language is Ruby; here is my irb session:

/> expr = /\Aselect from (\D+)(?: (?:where|&&) (\D+) (\S+) (\S+))*(?: order by (\D+) (asc|desc))?\Z/
=> /\Aselect from (\D+)(?: (?:where|&&) (\D+) (\S+) (\S+))*(?: order by (\D+) (asc|desc))?\Z/

/> str = "select from Entity order by value desc"
=> "select from Entity order by value desc"

/> expr =~ str
=> 0

/> $1
=> "Entity order by value desc"

/> $2
=> nil

I just don't understand why I am getting "Entity order by value desc" as $1. The desired behavior here would be to get $1 => "Entity", $2 => "value", $3 => "desc". What am I doing wrong? How do I modify this regular expression so I get these results?

Thank you

+4  A: 

\D is "non digit", which covers the whitespace between the words, as well as the following words. Try (\S+) instead.
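A quick check in plain Ruby makes the difference visible. This is only a sketch; the variable names are made up for illustration.

```ruby
str = "Entity order by value desc"

# \D matches any character that is not a digit, spaces included,
# so \D+ runs all the way to the end of this string.
non_digit_run = str[/\D+/]

# \S matches any non-whitespace character, so \S+ stops at the first space.
non_space_run = str[/\S+/]

puts non_digit_run.inspect
puts non_space_run.inspect
```

The first result is the whole string and the second is just the first word, which is why swapping \D for \S fixes the first capture.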

[Edit] Sorry, I missed the question at the end. The above answers "why does this happen?" but not "how do I achieve what I wanted?". Here's one way, bypassing any other clauses with a lazy .*? (a greedy .* would swallow the order by clause itself and leave the optional group empty):

/\Aselect from (\S+).*?(?:order by (\S+) (asc|desc)?)?\Z/
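Here is a sketch of how that behaves in irb or a script. One caveat worth checking for yourself: a greedy .* can consume the order by clause before the optional group gets a chance, so the first pattern below uses a lazy .*? and the second shows the greedy form for comparison. The constant names and sample queries are assumptions for illustration only.

```ruby
# Lazy wildcard: the optional "order by" group is tried before
# the wildcard is allowed to consume more text.
LAZY_QUERY = /\Aselect from (\S+).*?(?:order by (\S+) (asc|desc)?)?\Z/

# Greedy wildcard, for comparison: .* takes the whole tail and the
# optional group matches nothing.
GREEDY_QUERY = /\Aselect from (\S+).*(?:order by (\S+) (asc|desc)?)?\Z/

ordered   = LAZY_QUERY.match("select from Entity order by value desc")
unordered = LAZY_QUERY.match("select from Entity")
filtered  = LAZY_QUERY.match("select from Entity where a = 1 order by value desc")
greedy    = GREEDY_QUERY.match("select from Entity order by value desc")

puts ordered.captures.inspect
puts unordered.captures.inspect
puts filtered.captures.inspect
puts greedy.captures.inspect
```

With the lazy form, the table name, column, and direction land in $1, $2, and $3, and the order by groups are simply nil when the clause is absent.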

Since SQL is pretty free with spacing and such between keywords, you might like to make it more unreadable and use \s+ instead of literal spaces. That is, the expression as-is wouldn't match:

"select   from     Fred"

but it would if you did /\Aselect\s+from\s+....
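To illustrate the spacing point, here is a minimal comparison. Both patterns are shortened to the select/from prefix and are not the full expression; the variable names are hypothetical.

```ruby
# Literal single spaces between the keywords.
strict = /\Aselect from (\S+)/

# \s+ accepts one or more whitespace characters between the keywords.
flexible = /\Aselect\s+from\s+(\S+)/

spaced = "select   from     Fred"

strict_match   = strict.match(spaced)
flexible_match = flexible.match(spaced)

puts strict_match.inspect
puts flexible_match[1]
```

The strict pattern returns nil for the extra spaces, while the \s+ version still captures the table name.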

JimG
+1  A: 

The (\D+) is greedy and has eaten the rest of the string. Since everything else in your expression is optional (* or ?), nothing after it has to match for the expression to succeed.

My suggestion is to make your matches less greedy. For example, (\D+?) will match and capture one or more non-digits, but as few as needed for the overall match to succeed.
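As a sketch, here is the original expression with only the first (\D+) changed to (\D+?). Note that capture groups are numbered by their position in the pattern, so the order by column and direction show up as the fifth and sixth groups rather than $2 and $3; the groups belonging to the where clause are nil when that clause is absent. The variable names are illustrative only.

```ruby
# The pattern from the question, with the first group made lazy.
lazy = /\Aselect from (\D+?)(?: (?:where|&&) (\D+) (\S+) (\S+))*(?: order by (\D+) (asc|desc))?\Z/

m = lazy.match("select from Entity order by value desc")

# Groups 2..4 belong to the (unused) where clause, so they stay nil.
puts m.captures.inspect
puts m[1]
puts m[5]
puts m[6]
```

If the numbering is inconvenient, the where groups could be made non-capturing or the groups given names, but that is a design choice beyond what this answer proposes.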

John F. Miller