I've written a regex to split a search string into its component parts. Features include:

  • Operators: +, -, AND, OR
  • Word grouping by quotes (single and double for now)
  • Correctly ignoring apostrophes

So:

((?<=^|\s)(?:[\+\-]?"[^"]+"(?=\s|$)|[\+\-]?'[^']+'(?=\s|$)|[\+\-]?\S+|AND|and|OR|or)(?=$|\s))

What is the easiest way to exclude the delimiter quotes from the result matches? Example:

lsdkjflws's ldkj and "lfldkfjs's ldkjfls" lskdj

results in these pieces:

  • lsdkjflws's
  • ldkj
  • and
  • "lfldkfjs's ldkjfls"
  • lskdj

This isn't strictly necessary; I'd just like to accomplish one more step in the regex itself.

A: 

What engine? If it supports lookbehind and lookahead assertions, it's easy:

Instead of a pattern like this:

"[^"]+"

You would use something like this:

(?<=")[^"]+(?=")

This excludes the quotes from the match while still matching only the text between them. I hope this is what you're after.
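To see the difference, here is a minimal sketch that runs both patterns side by side. Python's `re` module is used purely for illustration (an assumption, not the asker's setup); the lookaround syntax is the same in PCRE. The sample strings and variable names are made up for the demo.

```python
import re

text = 'lskdj "lfldkfjs\'s ldkjfls" and'

# Plain pattern: the delimiter quotes are part of the match.
with_quotes = re.findall(r'"[^"]+"', text)

# Lookaround pattern: the quotes are asserted but not consumed,
# so they do not appear in the match.
without_quotes = re.findall(r'(?<=")[^"]+(?=")', text)

# Caveat: because the quotes are never consumed, the text *between* two
# quoted phrases is also surrounded by quotes and therefore matches too.
two_phrases = re.findall(r'(?<=")[^"]+(?=")', 'a "b c" d "e f" g')

print(with_quotes)     # ['"lfldkfjs\'s ldkjfls"']
print(without_quotes)  # ["lfldkfjs's ldkjfls"]
print(two_phrases)     # ['b c', ' d ', 'e f']
```

The last result shows a limitation of this approach: it behaves as intended with a single quoted phrase, but with several phrases the unquoted text between them can match as well.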

Lucero
PCRE in PHP. I decided that since I need the optional leading -/+, I have to strip the quotes in code.
Marty Vance
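
One possible way to keep the optional leading -/+ and still drop the quotes is to capture the sign and the content in separate groups and rejoin them in code. The following is only a sketch in Python, not the asker's PHP: the name `tokenize` is hypothetical, and `(?<!\S)` / `(?!\S)` stand in for `(?<=^|\s)` / `(?=$|\s)` because Python's `re` only accepts fixed-width lookbehind. The explicit AND/OR alternatives are left out, since `\S+` already matches those words.

```python
import re

# Group 1: optional sign. Groups 2-4: content of a double-quoted phrase,
# a single-quoted phrase, or a bare word. The quotes sit outside the groups.
TOKEN = re.compile(r'''(?<!\S)([+-]?)(?:"([^"]+)"|'([^']+)'|(\S+))(?!\S)''')

def tokenize(query):
    """Split a search string into tokens, dropping delimiter quotes."""
    tokens = []
    for match in TOKEN.finditer(query):
        sign, double_quoted, single_quoted, bare = match.groups()
        # Exactly one of the three content groups participates in a match.
        tokens.append(sign + (double_quoted or single_quoted or bare))
    return tokens

print(tokenize('lsdkjflws\'s ldkj and "lfldkfjs\'s ldkjfls" lskdj'))
print(tokenize('+"foo bar" -baz'))
```

With the example from the question, this should yield the five pieces without the double quotes around the phrase, while the apostrophes inside words are left alone.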