I have a value like this: "Foo Bar" "Another Value" something else

What regex will return the values enclosed in the quotation marks (e.g. Foo Bar and Another Value)?

+8  A: 

In general, the following regular expression fragment is what you are looking for:

"(.*?)"

This uses the non-greedy *? operator to capture everything up to but not including the next double quote. Then, you use a language-specific mechanism to extract the matched text.

In Python, you could do:

>>> import re
>>> text = '"Foo Bar" "Another Value"'
>>> print(re.findall(r'"(.*?)"', text))
['Foo Bar', 'Another Value']
Greg Hewgill
+2  A: 

I would go for:

"([^"]*)"

[^"] is regex for any character except '"'.
The reason I use this over the non-greedy operator is that I have to keep looking the latter up just to make sure I get it right.
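
As a quick check, here is a minimal Python sketch comparing the negated-class pattern with the non-greedy one. The sample string and variable names are only illustrative. One practical difference it shows: [^"] also matches newlines, whereas "." does not unless a flag such as re.DOTALL is set.

```python
import re

# Illustrative input: the second quoted value contains a newline.
text = '"Foo Bar" "Another\nValue" something else'

# [^"]* matches any run of characters that are not a double quote,
# including newlines.
negated = re.findall(r'"([^"]*)"', text)

# .*? also stops at the first closing quote, but "." does not match
# a newline by default, so the second value is missed.
lazy = re.findall(r'"(.*?)"', text)

print(negated)  # ['Foo Bar', 'Another\nValue']
print(lazy)     # ['Foo Bar']
```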

Martin York
This also behaves consistently across different regex implementations.
Phil Bennett
+7  A: 

I've been using the following with great success:

([""'])(?:(?=(\\?))\2.)*?\1

It supports nested quotes as well, i.e. backslash-escaped quotes and quotes of the other kind inside the string.
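
A minimal sketch of how this pattern might be tried in Python, assuming the re module and a made-up sample string. The doubled "" in the character class is equivalent to a single ", so the sketch writes it once. finditer with group(0) is used because the pattern has two capturing groups and findall would return tuples.

```python
import re

# The pattern from above, with the redundant doubled quote in the
# character class written once.
pattern = re.compile(r'''(["'])(?:(?=(\\?))\2.)*?\1''')

# Illustrative input: a backslash-escaped double quote inside a
# double-quoted string, and double quotes inside a single-quoted string.
text = r'''x "a \"b\" c" y 'it "is" ok' z'''

# group(0) is the whole match, including the surrounding quotes.
matches = [m.group(0) for m in pattern.finditer(text)]
for m in matches:
    print(m)
```

With this input the loop should print the two quoted strings, each with its own quotes still attached; strip the first and last character if only the contents are wanted.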

Adam
Could you please explain how it reads? That would be very helpful. Thanks.
philippe
([""']) match a quote; (?:(?=(\\?))\2.) if a backslash exists, gobble it, and whether or not that happens, match a character; *? match many times (non-greedily, so as not to eat the closing quote); \1 match the same quote that was used for opening.
ephemient
A: 
echo 'junk "Foo Bar" not empty one "" this "but this" and this neither' | sed 's/[^\"]*\"\([^\"]*\)\"[^\"]*/>\1</g'

This will result in: >Foo Bar<><>but this<

The result is wrapped in > and < for clarity. Since sed has no non-greedy quantifier, the command uses the negated class [^"]* instead: each match swallows the junk before and after a quoted string and is replaced by the text between the quotes, surrounded by > and <. The empty "" therefore shows up as ><.
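
For comparison, a rough Python equivalent of the same substitution, assuming re.sub in place of sed. The variable names are illustrative only.

```python
import re

line = 'junk "Foo Bar" not empty one "" this "but this" and this neither'

# Each match covers the junk before a quoted string, the quoted string
# itself, and the junk after it; the whole match is replaced by the
# captured text wrapped in > and <.
result = re.sub(r'[^"]*"([^"]*)"[^"]*', r'>\1<', line)

print(result)  # >Foo Bar<><>but this<
```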

amo-ej1
+1  A: 

This version

  • accounts for escaped quotes
  • controls backtracking

    /(["'])((?:[^\\\1]|(?:\\\\)*\\[^\\])*)\1/
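
A caveat if this is ported to Python: in the re module, numeric escapes inside a character class are read as characters, so the \1 inside [...] would not refer back to the opening quote there. Other engines may behave differently. The sketch below is a hypothetical variant, not the pattern above: it swaps in a negative lookahead (?!\1) to express "not the opening quote" and uses \\. for an escaped character. The sample string is made up.

```python
import re

# Hypothetical variant: (?!\1) replaces the \1 inside the character
# class, and \\. consumes a backslash together with the next character.
pattern = re.compile(r'''(["'])((?:(?!\1)[^\\]|\\.)*)\1''')

# Illustrative input with escaped quotes and both quote styles.
text = r'''"a \"b\" c" and 'd' and "e"'''

# Group 2 holds the text between the quotes.
contents = [m.group(2) for m in pattern.finditer(text)]
for c in contents:
    print(c)
```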
    
Axeman