
How would one write a regex that matches a pattern that can contain quotes, but if it does, must have matching quotes at the beginning and end?

"?(pattern)"?

This will not work, because it also accepts strings that begin with a quote but don't end with one (or end with a quote but don't begin with one).

"(pattern)"|(pattern)

This works, but it repeats the pattern. Is there a better way to do this without repeating it?
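Here is a small sketch that makes the difference concrete. It uses Python's `re` module, which is an assumption since the question doesn't name a language. The placeholder `pattern` is replaced by the hypothetical stand-in `\w+`. `re.fullmatch` forces both candidates to consume the whole string. Even when anchored this way, the first regex accepts unbalanced quotes, while the alternation does not.

```python
import re

# Hypothetical stand-in for the question's placeholder "pattern".
PAT = r"\w+"

# Each quote is optional on its own, so the two sides are not tied together.
naive = re.compile(r'"?(' + PAT + r')"?')

# Correct, but PAT is written out once per branch.
alternation = re.compile(r'"(' + PAT + r')"|(' + PAT + r')')


def check(s):
    """Return (naive accepts s, alternation accepts s), matching the whole string."""
    return bool(naive.fullmatch(s)), bool(alternation.fullmatch(s))


for s in ['pattern', '"pattern"', '"pattern', 'pattern"']:
    print(repr(s), check(s))
```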

A: 

Depending on the language you're using, you should be able to use backreferences. Something like this, say:

^(["'])(pattern)\1$|^(pattern)$

That way, you're requiring that either there are no quotes, or that the SAME quote is used on both ends.
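Below is a minimal Python sketch of this backreference approach. It assumes Python's `re` module and uses the hypothetical stand-in `\w+` for `pattern`. Both alternatives are anchored so that text outside the quotes can't slip through. The pattern is still written twice, once per branch.

```python
import re

PAT = r"\w+"  # hypothetical stand-in for "pattern"

# Branch 1: an opening quote of either kind is captured in group 1, and \1
#           demands the very same character at the end.
# Branch 2: the bare pattern with no quotes at all.
quoted_or_bare = re.compile(r'^(["\'])(' + PAT + r')\1$|^(' + PAT + r')$')


def matches(s):
    return quoted_or_bare.match(s) is not None


for s in ['pattern', '"pattern"', "'pattern'", '"pattern\'', '"pattern']:
    print(repr(s), matches(s))
```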

zigdon
A: 

You can get a solution without repeating the pattern by making use of backreferences and conditionals:

/^(")?(pattern)(?(1)\1|)$/

Matches:

  • pattern
  • "pattern"

Doesn't match:

  • "pattern
  • pattern"

This pattern is somewhat complex, however. It first looks for an optional quote and, if one is found, captures it in group 1. Then it matches your pattern. Then it uses conditional syntax to say "if group 1 took part in the match, match its contents again; otherwise match nothing". The whole pattern is anchored (meaning it must match the entire string, or the entire line in multiline mode) so that unbalanced quotes are rejected; without the anchors, the `pattern` portion of `pattern"` would still match.

Note that support for conditionals varies by engine, and that the more verbose, repetitive expressions will be more widely supported (and likely easier to understand).
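Python's `re` module is one engine that supports this conditional syntax, `(?(1)yes|no)`, so the regex can be tried there directly. In this sketch `\w+` is a hypothetical stand-in for `pattern`. The unanchored variant shows the problem the anchors prevent: `search` still finds the bare word inside an unbalanced string.

```python
import re

PAT = r"\w+"  # hypothetical stand-in for "pattern"

# (")?      optional opening quote; group 1 participates only if it is present
# (?(1)\1|) if group 1 participated, match its contents again, else match nothing
anchored = re.compile(r'^(")?(' + PAT + r')(?(1)\1|)$')
unanchored = re.compile(r'(")?(' + PAT + r')(?(1)\1|)')

for s in ['pattern', '"pattern"', '"pattern', 'pattern"']:
    m = unanchored.search(s)
    print(repr(s),
          'anchored:', anchored.match(s) is not None,
          'unanchored finds:', m.group(0) if m else None)
```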


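Update: A much simpler version of this regex would be /^("?)(pattern)\1$/, which does not need a conditional. Note that the ? goes inside the group: group 1 then always participates, capturing either a quote or the empty string. When I was testing this initially, the tester I was using gave me a false negative, which led me to discount it (oops!).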

I'll leave the solution with the conditional up for posterity and interest, but this is a simpler version that is more likely to work in a wider variety of engines (backreferences are the only feature being used here which might be unsupported).
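The placement of the `?` matters for this trick. The sketch below uses Python's `re`, with the hypothetical stand-in `\w+` for `pattern`. It compares `("?)`, where the group always participates and may capture an empty string, with `(")?`, where the group is skipped entirely when there is no quote. In Python's `re`, as in Perl, a backreference to a group that did not participate fails, so the second form rejects the bare pattern. JavaScript behaves differently: there, such a backreference matches the empty string.

```python
import re

PAT = r"\w+"  # hypothetical stand-in for "pattern"

# ? inside the group: group 1 always participates, capturing '"' or ''.
inside = re.compile(r'^("?)(' + PAT + r')\1$')

# ? outside the group: with no opening quote, group 1 does not participate,
# and in Python's re the backreference \1 then fails to match.
outside = re.compile(r'^(")?(' + PAT + r')\1$')

for s in ['pattern', '"pattern"', '"pattern', 'pattern"']:
    print(repr(s),
          'inside:', inside.match(s) is not None,
          'outside:', outside.match(s) is not None)
```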

Daniel Vandersluis
Aaarrg, I just started to look up regex-if conditional syntax in the Friedl book. You were faster +1 (the next +1 is for the recursive pattern ;-)
rubber boots
@rubber Once upon a time I knew how to do recursive regex but I think I forgot for the good of mankind ;)
Daniel Vandersluis
@Daniel: Mankind probably wouldn't mind if you were to forget about conditionals, too. `^("?)pattern\1$` works just fine. (@wuputah's deleted answer didn't work because it wasn't anchored. And @Tim, possessive quantifiers/atomic groups aren't needed.)
Alan Moore
@Alan Perhaps the note in the last paragraph should be clearer, but I did not suggest that using a conditional was the best way to do it. In fact, I have never used conditionals in production code. I just thought it'd be an interesting way to solve the problem.
Daniel Vandersluis
@Alan It seems that the regex tester I was using has a bug and was giving me a false negative for `^("?)pattern\1$`, which led me to try the conditionals solution in the first place... oops.
Daniel Vandersluis
I was only half joking about forgetting conditionals. I learned about them way back when, but then I started working mostly in Java--which has never supported conditionals--and I never missed them. They sound like a great idea, but there's almost always a better way.
Alan Moore
A: 

If you're using .NET, have a read of the sample chapter from Friedl's classic "Mastering Regular Expressions".

cristobalito
Is there something specific in there he should look for?
Alan Moore
Sure - under the "Advanced .NET" section there's a discussion on "Matching Nested Constructs" in .NET
cristobalito
But in this case we're talking about one pair of quotation marks; nesting is not an issue.
Alan Moore
Fair point, I didn't realise that on my initial read.
cristobalito
A: 

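This should also be doable with a recursive regex (which takes longer to get right). In the meantime: in Perl, you can build a self-modifying regex with a postponed subexpression, `(??{ ... })`, whose code runs during the match and whose result is used as part of the pattern. I'll leave that as an academic example ;-)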

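use strict;
use warnings;

my @stuff = ( '"pattern"', 'pattern', 'pattern"', '"pattern' );

foreach (@stuff) {
    # \w+ stands in for "pattern". The (??{ ... }) block runs mid-match:
    # if the opening quote was captured in $1 it yields '"', so a closing
    # quote is required; otherwise it yields '' and nothing more is needed.
    print "$_ OK\n" if /^
                        (")?
                        \w+
                        (??{ defined $1 ? '"' : '' })
                        $
                       /x;
}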

Result:

"pattern" OK
pattern OK
rubber boots