I'm having some issues with a regular expression I'm creating.

I need a regex that matches the following examples and captures the first quoted string as a sub-match:

Input strings

("Lorem ipsum dolor sit amet, consectetur adipiscing elit.")

('Lorem ipsum dolor sit amet, consectetur adipiscing elit. ')

('Lorem ipsum dolor sit amet, consectetur adipiscing elit. ', 'arg1', "arg2")

Must sub match

Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Regex so far:

\((["'])([^"']+)\1,?.*\)

The regex does a sub match on the text between the first set of quotes and returns the sub match displayed above.

This almost works, but if the quoted string itself contains quotes, the sub-match stops at the first one. See below:

Failing input strings

("Lorem ipsum dolor \"sit\" amet, consectetur adipiscing elit.")

Only sub matches: Lorem ipsum dolor

("Lorem ipsum dolor 'sit' amet, consectetur adipiscing elit.")

The entire match fails.
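
The behaviour described above can be reproduced with a short script. This is only a minimal sketch in Python (the question does not say which language the scanning script is written in, so that choice is an assumption), and `first_argument` is a helper name made up for illustration:

```python
import re

# The pattern from the question, unchanged.
pattern = re.compile(r"""\((["'])([^"']+)\1,?.*\)""")

samples = [
    r'''("Lorem ipsum dolor sit amet, consectetur adipiscing elit.")''',
    r'''("Lorem ipsum dolor \"sit\" amet, consectetur adipiscing elit.")''',
    r'''("Lorem ipsum dolor 'sit' amet, consectetur adipiscing elit.")''',
]

def first_argument(text):
    """Return group 2 of the question's pattern, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(2) if match else None

for sample in samples:
    print(repr(first_argument(sample)))
```

The first sample yields the full sentence. The second stops at the escaped quote and yields `Lorem ipsum dolor \` (the backslash is included, because `[^"']` does not exclude it). The third yields `None`, because `[^"']+` cannot cross the inner single quote and `\1` requires a double quote at that position.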

Notes

The input strings are actually PHP function calls. I'm writing a script that will scan .php source files for calls to a specific function and grab the text of the first parameter.

A: 

Make sure not to match a quote when it is escaped (has a backslash before it):

/\((["'])([^"']+)[^\\]\1,?.*?\)/
knittl
A: 

Try this regular expression:

\(\s*(?:"(?:[^"\\]+|\\.)*"|'(?:[^'\\]+|\\.)*')(?:\s*,\s*(?:"(?:[^"\\]+|\\.)*"|'(?:[^'\\]+|\\.)*'))*\s*\)

Some explanation:

  • \(\s* matches the opening parenthesis and optional whitespace.
  • (?:"(?:[^"\\]+|\\.)*"|'(?:[^'\\]+|\\.)*') matches any quoted string, allowing the delimiting quote character inside it only when escaped with \.
  • (?:\s*,\s*(?:"(?:[^"\\]+|\\.)*"|'(?:[^'\\]+|\\.)*'))* matches zero or more further quoted strings, each preceded by a comma that may have whitespace on either side.
  • \s*\) matches the closing parenthesis with optional whitespace.
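
The pattern above contains no capturing group, so it does not by itself return the first string as a sub-match. The sketch below, in Python, is one possible adaptation and not part of the answer as posted: the group names `dq` and `sq` and the helper `first_argument` are made up for illustration. It also drops the inner `+` from `[^"\\]+`, a deliberate deviation that matches the same strings but avoids nested quantifiers, which can backtrack heavily on an unterminated string.

```python
import re

CALL = re.compile(r'''
    \( \s*
    (?: "(?P<dq>(?:[^"\\]|\\.)*)"      # first argument, double-quoted
      | '(?P<sq>(?:[^'\\]|\\.)*)'      # first argument, single-quoted
    )
    (?: \s* , \s*                      # any further quoted arguments
        (?: "(?:[^"\\]|\\.)*" | '(?:[^'\\]|\\.)*' )
    )*
    \s* \)
''', re.VERBOSE)

def first_argument(call_text):
    """Return the raw contents of the first quoted argument, or None."""
    match = CALL.search(call_text)
    if match is None:
        return None
    # Only one of the two named groups takes part in any given match.
    if match.group("dq") is not None:
        return match.group("dq")
    return match.group("sq")

print(first_argument(r'''("Lorem ipsum dolor \"sit\" amet, consectetur adipiscing elit.")'''))
print(first_argument(r'''('Lorem ipsum dolor sit amet, consectetur adipiscing elit. ', 'arg1', "arg2")'''))
```

The returned text is the raw source between the quotes, so escape sequences such as `\"` are still present and would need to be unescaped in a separate step if the literal value is wanted.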
Gumbo
I can't get this to work fully. I'm getting an error about a missing parenthesis at position 46.
Camsoft
@Camsoft: Fixed that.
Gumbo