I'm implementing search on my website and would like to support searching for exact phrases. I want to end up with an array of terms to search for; here are some examples:

"foobar \"your mom\" bar foo" => ["foobar", "your mom", "bar", "foo"]

"ruby rails'test course''test lesson'asdf" => ["ruby", "rails", "test course", "test lesson", "asdf"]

Notice that there doesn't necessarily have to be a space before or after the quotes.

I'm not well versed in regular expressions, and it seems unnecessary to try to split it repeatedly on single characters. Can anybody help me out? Thanks.

+1  A: 

You want to use this regular expression (see on rubular.com):

/"[^"]*"|'[^']*'|[^"'\s]+/

This regex matches the tokens instead of the delimiters, so you'd want to use scan instead of split.
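To make the scan-versus-split distinction concrete, here is a minimal sketch. The variable names (`token`, `query`, `tokens`, `leftovers`) are only illustrative and are not part of the original answer:

```ruby
# The pattern describes the tokens themselves, not the gaps between them.
token = /"[^"]*"|'[^']*'|[^"'\s]+/
query = %q{foobar "your mom" bar foo}

# scan returns every substring that matches the pattern.
tokens = query.scan(token)

# split treats the matches as delimiters and returns what lies between them,
# which here is nothing but the whitespace separating the tokens.
leftovers = query.split(token)

p tokens
p leftovers
```

With `split`, the tokens are thrown away as separators, which is why it is the wrong tool for this pattern.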

The […] construct is called a character class. [^"] is "anything but the double quote".

There are essentially three alternatives:

  • "[^"]*" - double quoted token (may include spaces and single quotes)
  • '[^']*' - single quoted token (may include spaces and double quotes)
  • [^"'\s]+ - a token consisting of one or more characters that are neither quotes nor whitespace
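As a rough illustration of how the three alternatives combine, the sketch below builds the same kind of pattern from its parts with `Regexp.union`. The variable names and sample strings are assumptions made for this example only:

```ruby
# Each alternative on its own (names are illustrative).
double_quoted = /"[^"]*"/     # may contain spaces and single quotes
single_quoted = /'[^']*'/     # may contain spaces and double quotes
bare          = /[^"'\s]+/    # a run of non-quote, non-whitespace characters

# Regexp.union joins the parts with |, in the order given.
token = Regexp.union(double_quoted, single_quoted, bare)

mixed  = %q{say "it's here" now}.scan(token)
nested = %q{'a "b" c'}.scan(token)

p mixed
p nested
```

A double-quoted token can hold a single quote and vice versa, because each character class excludes only its own delimiter.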


Snippet

Here's a Ruby implementation:

s = %_foobar "your mom"bar'test course''test lesson'asdf_
puts s

puts s.scan(/"[^"]*"|'[^']*'|[^"'\s]+/)

The above prints (as seen on ideone.com):

foobar "your mom"bar'test course''test lesson'asdf
foobar
"your mom"
bar
'test course'
'test lesson'
asdf
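Note that the quoted tokens above still carry their quote characters, whereas the question asks for bare terms. One possible way to drop them, sketched here under the assumption that capture groups are acceptable, is to capture only the text inside the quotes. `TERM_PATTERN` and `search_terms` are hypothetical names, not part of the original answer:

```ruby
# Same three alternatives, but each one captures only the text to keep.
TERM_PATTERN = /"([^"]*)"|'([^']*)'|([^"'\s]+)/

# Hypothetical helper: returns the search terms without surrounding quotes.
def search_terms(query)
  # With capture groups, scan yields one [g1, g2, g3] array per match;
  # exactly one group is non-nil, so compact.first picks the term.
  query.scan(TERM_PATTERN).map { |groups| groups.compact.first }
end

p search_terms(%q{foobar "your mom" bar foo})
p search_terms(%q{ruby rails'test course''test lesson'asdf})
```

An empty pair of quotes yields an empty string here; whether to keep or reject such terms is left to the caller.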

polygenelubricants
Just tried this, and it simply returned an empty array...
davidcelis
@davidcelis: you want to use this with `scan`, not `split`. I'll revise the answer shortly (since you also changed the problem statement).
polygenelubricants
Sorry about that. I realized that my second example may have not been clear enough. Thank you for helping!
davidcelis
Just gave that regex a shot with scan, and changing it to `/['"][^\['"\]]*['"]|[^\['"\]]+/` to allow single quotes as well; as far as I can tell, it's working beautifully.
davidcelis
@davidcelis: see my latest revision; tell me if there's anything else I can do. Also, please upvote if my answer is useful.
polygenelubricants