ansaurus

Question

Regexp Question - Negating a captured character

Answer 1

+1 A:

Sure:

'([^']*)'|"([^"]*)"

On a successful match, Perl's $+ variable will hold the text captured by whichever alternative matched.
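For readers outside Perl, here is a minimal sketch of the same idea in Python, which has no `$+`. It assumes that `m.lastindex` (the index of the last capture group that took part in the match) is a close enough stand-in; the names `QUOTED` and `quoted_body` are made up for illustration.

```python
import re

# One capture group per alternative; only one of them can take part in a match.
QUOTED = re.compile(r"""
    '([^']*)'    # group 1: body of a single-quoted string
  | "([^"]*)"    # group 2: body of a double-quoted string
""", re.VERBOSE)


def quoted_body(text):
    """Return the body of the first quoted string in text, or None."""
    m = QUOTED.search(text)
    if m is None:
        return None
    # m.lastindex is the number of the group that matched (1 or 2 here),
    # which plays roughly the role of Perl's $+ for this pattern.
    return m.group(m.lastindex)


if __name__ == "__main__":
    print(quoted_body("x = 'abc'"))
    print(quoted_body('say "it\'s"'))
```

Because each alternative excludes its own quote character from the body, no backreference is needed with this form.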

Sean 2010-08-25 23:01:19

Answer 2

+7 A:

This is actually much simpler than you may have realized. You don't really need the negative look-ahead. What you want to do is a non-greedy (or lazy) match like this:

(['"]).*?\1

The ? character after the .* is the important part. It makes the quantifier lazy: consume the minimum possible number of characters before the next part of the regex can match. So, you match either kind of quote, then zero or more characters, until you encounter a character matching whichever quote you first ran into.
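A small sketch of the difference, written in Python on the assumption that its `re` module treats `.*?` and `\1` the same way here. `LAZY`, `GREEDY`, and `quoted_spans` are illustrative names, not part of the answer.

```python
import re

LAZY = re.compile(r"""(['"]).*?\1""")    # stop at the first matching close quote
GREEDY = re.compile(r"""(['"]).*\1""")   # run on to the last matching close quote


def quoted_spans(text, pattern=LAZY):
    """Return every quoted span (quotes included) that pattern finds in text."""
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    sample = '"a" and "b"'
    print(quoted_spans(sample))          # two separate spans
    print(quoted_spans(sample, GREEDY))  # one span covering the whole string
```

Note that `.` does not match a newline by default, so a quoted string that spans lines would need the dot-all flag (`re.DOTALL` in Python, `/s` in Perl).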

mattmc3 2010-08-25 23:06:46

thank you! this is what I was looking for. totally forgot about lazy quantifiers. well now I feel stupid

Sean Nilan 2010-08-25 23:08:44

No need to feel bad - regexes are powerful, but complicated. It's hard to keep it all in your head. That's what SO.com is for.

mattmc3 2010-08-25 23:13:40

The regex can be slightly improved by replacing the expensive match-all `.` with a negated character class: `(['"])[^\1]*?\1`

Peter Ajtai 2010-08-25 23:35:31

@Peter Ajtai, no it can't; backreferences aren't allowed in character classes. That class gives you any character but \001 aka chr(1).

ysth 2010-08-26 00:00:04

@ysth - Whoops. I just realized that. Thanks for the clarification.

Peter Ajtai 2010-08-26 00:04:14
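The point in these comments can be checked with a short sketch. It is written in Python, whose `re` module also reads `\1` inside a character class as the character chr(1) rather than as a backreference. The second pattern is one common way to express "any character except the captured quote" with a negative lookahead; it is offered as an illustration, not as something proposed in the thread, and the names are made up.

```python
import re

# Inside [...], \1 is the octal escape for chr(1), not a reference to group 1.
NOT_A_BACKREF = re.compile(r"[^\1]")

# Negating the captured quote: before consuming each character, assert that
# it is not the same character that group 1 captured.
TEMPERED = re.compile(r"""(['"])((?:(?!\1).)*)\1""")


def body_between_quotes(text):
    """Return the body of the first quoted string in text, or None."""
    m = TEMPERED.search(text)
    return None if m is None else m.group(2)


if __name__ == "__main__":
    print(NOT_A_BACKREF.fullmatch("'") is not None)     # a quote is accepted
    print(NOT_A_BACKREF.fullmatch("\x01") is not None)  # only chr(1) is rejected
    print(body_between_quotes("""x "it's" y"""))
```

For this task the lookahead form matches the same text as the lazy `.*?` version, so the simpler pattern is usually preferable.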

Answer 3

+1 A:

In the general case, regexps are not really the answer. You might be interested in something like Text::ParseWords, which tokenizes text, accounting for nested quotes, backslashed quotes, backslashed spaces, and other oddities.
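As a rough illustration of what a tokenizer buys you, here is a sketch using Python's standard-library `shlex` module. This is a stand-in chosen for the example, not Text::ParseWords itself, and its quoting rules (POSIX shell style) may differ from the Perl module's in the details. The wrapper name `tokenize` is made up.

```python
import shlex


def tokenize(line):
    """Split line into words, honouring quotes and backslash escapes."""
    # shlex.split uses POSIX shell rules: quotes group words, a backslash
    # escapes the next character outside quotes and escapes " or \ inside
    # double quotes.
    return shlex.split(line)


if __name__ == "__main__":
    line = r'''one "two three" 'four five' six\ seven "say \"hi\""'''
    for word in tokenize(line):
        print(word)
```

A single regex such as the ones above would stop at the escaped quote inside the last argument, which is the kind of case this answer is warning about.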

Ryan Thompson 2010-08-25 23:29:48
