




I'm looking for a regular expression that matches either single-quoted or double-quoted strings and allows the opposite quote character within the string. For example, both of the following would be legal strings: "hello 'there' world" and 'hello "there" world'.

The regexp I'm using uses negative lookahead and is as follows:


I think this would work, but what if the language didn't support negative lookahead? Is there any other way to do this, without alternation?
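For reference, a lookahead-based pattern of the kind described is commonly written along the lines below. This is only an illustrative sketch using Python's `re` module, not necessarily the exact pattern from the question; the names `QUOTED` and `find_quoted` are made up for the example.

```python
import re

# Illustrative pattern (an assumption, not the asker's original):
#   (['"])        capture the opening quote
#   (?:(?!\1).)*  consume characters as long as the next one is not that quote
#   \1            require the same quote character to close the string
QUOTED = re.compile(r"""(['"])(?:(?!\1).)*\1""")

def find_quoted(text):
    """Return every quoted string found in text, quotes included."""
    return [m.group(0) for m in QUOTED.finditer(text)]
```

The negative lookahead `(?!\1)` is what "negates" the captured quote, which is the part that has no direct equivalent inside a character class.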


I know I can use alternation; this was more of a hypothetical question. Say I had 20 different characters in the initial character class: I wouldn't want to write out 20 different alternatives. I'm trying to actually negate the captured character without using lookahead, lookbehind, or alternation.

+1  A: 



On a successful match, Perl's $+ variable will hold the contents of whichever alternative matched.
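The pattern for this answer is not shown here, so the following is a sketch under the assumption that it was a plain two-way alternation. It is written in Python rather than Perl; `m.lastindex` is used as a rough analogue of `$+`, and `PATTERN` and `quoted_body` are hypothetical names.

```python
import re

# One alternative per quote style; each group captures only the string body.
PATTERN = re.compile(r'"([^"]*)"' + r"|'([^']*)'")

def quoted_body(text):
    """Return the body of the first quoted string in text, or None."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # lastindex is the number of the last group that took part in the match,
    # so this picks the body from whichever alternative succeeded.
    return m.group(m.lastindex)
```

This avoids lookahead entirely, at the cost of one alternative per delimiter.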

+7  A: 

This is actually much simpler than you may have realized. You don't really need the negative look-ahead. What you want to do is a non-greedy (or lazy) match like this:


The ? after the .* is the important part: it makes the quantifier consume as few characters as possible before the next part of the regex can match. So you match either kind of quote, then zero or more characters, until you reach a quote matching the one you started with. You can learn more about greedy vs. non-greedy matching here and here.
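Judging from the description and the comments, the pattern in question appears to be `(['"]).*?\1`; treat that as an assumption. A minimal Python sketch comparing it with its greedy counterpart (the names `LAZY`, `GREEDY`, and `first_quoted` are invented for the example):

```python
import re

LAZY = re.compile(r"""(['"]).*?\1""")   # stops at the first matching close quote
GREEDY = re.compile(r"""(['"]).*\1""")  # runs on to the last matching close quote

def first_quoted(pattern, text):
    """Return the first match of pattern in text, or None if there is none."""
    m = pattern.search(text)
    return m.group(0) if m else None

text = '''"one" and "two"'''
lazy_result = first_quoted(LAZY, text)      # '"one"'
greedy_result = first_quoted(GREEDY, text)  # '"one" and "two"'
```

Note that `.` does not match a newline by default in most engines, so a quoted string spanning several lines would need the engine's "dot matches all" option.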

Thank you! This is what I was looking for. I totally forgot about lazy quantifiers. Well, now I feel stupid.
Sean Nilan
No need to feel bad. Regexes are powerful, but complicated, and it's hard to keep it all in your head. That's what SO.com is for.
The regex can be slightly improved by replacing the expensive match-all `.` with a negated class: `(['"])[^\1]*?\1`
Peter Ajtai
@Peter Ajtai, no it can't; backreferences aren't allowed in character classes. That class gives you any character but \001 aka chr(1).
@ysth - Whoops. I just realized that. Thanks for the clarification.
Peter Ajtai
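ysth's point can be checked directly. The sketch below uses Python's `re` module, where a numeric escape inside a character class is likewise read as a character code rather than a backreference; other engines may behave differently or reject the pattern. `CLASS` and `class_accepts` are hypothetical names.

```python
import re

# Inside [...], \1 is the character with code 1, not "whatever group 1 matched".
CLASS = re.compile(r"[^\1]")

def class_accepts(ch):
    """True if the single character ch is matched by [^\\1]."""
    return CLASS.fullmatch(ch) is not None

# Because the class does not exclude the quote, a greedy version runs past
# the first closing quote.
overrun = re.search(r"""(['"])[^\1]*\1""", '"a" "b"').group(0)
```

So the class accepts both quote characters and rejects only `chr(1)`; the proposed pattern works only because the lazy quantifier does the stopping, exactly as with `.*?`.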
+1  A: 

In the general case, regexps are not really the answer. You might be interested in something like Text::ParseWords, which tokenizes text, accounting for nested quotes, backslashed quotes, backslashed spaces, and other oddities.
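To illustrate the tokenizer approach without regexes, here is a sketch using Python's standard `shlex` module. It is a stand-in for Text::ParseWords, not the same library, and its quoting rules (POSIX shell style) may differ in details. `tokenize` is a hypothetical helper name.

```python
import shlex

def tokenize(line):
    """Split line into words, honoring quotes and backslash escapes."""
    return shlex.split(line)

# The opposite quote character is kept literally inside a quoted word.
words = tokenize('''say "hello 'there' world" now''')
```

Here `words` holds three tokens, with the single quotes preserved inside the middle one.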

Ryan Thompson