I'm looking for a SimpleGrepSedPerlOrPythonOneLiner that outputs all quotations in a text.


Example 1:

echo '"HAL," noted Frank, "said that everything was going extremely well."' | SimpleGrepSedPerlOrPythonOneLiner

stdout:

"HAL,"
"said that everything was going extremely well."


Example 2:

cat MicrosoftWindowsXPEula.txt | SimpleGrepSedPerlOrPythonOneLiner

stdout:

"EULA"
"Software"
"Workstation Computer"
"Device"
"DRM"

etc.

(link to the corresponding text).

+4  A: 
grep -o "\"[^\"]*\""

This matches a double quote, then any number of characters that are not a double quote, then the closing double quote.

The -o makes it only output the matched text, not the whole line.
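
As a rough sketch of how this behaves on the sentence from the question, the pattern can be wrapped in a small shell function. The name extract_quotes is just an assumption for illustration, and the sample line assumes plain ASCII double quotes rather than typographic ones.

```shell
# extract_quotes is a hypothetical helper name, not part of the answer above.
# It reads text on stdin and prints every double-quoted span on its own line.
extract_quotes() {
    grep -o '"[^"]*"'
}

# Sample input taken from the question, with plain ASCII double quotes.
printf '%s\n' '"HAL," noted Frank, "said that everything was going extremely well."' |
    extract_quotes
# prints:
# "HAL,"
# "said that everything was going extremely well."
```

Since grep works line by line, a quotation that wraps onto the next line would not be picked up by this sketch.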

Greg
On Windows '^' must be escaped. `cat eula.txt | grep -o "\"[^^\"]*\""`
J.F. Sebastian
+5  A: 

No regexp solution will work if you have nested quotes, but for your examples this works well:

$ echo \"HAL,\" noted Frank, \"said that everything was going extremely well\" |
    perl -n -e 'while (m/(".*?")/g) { print $1."\n"; }'
"HAL,"
"said that everything was going extremely well"

$ cat eula.txt | perl -n -e 'while (m/(".*?")/g) { print $1."\n"; }'
"EULA"
"online"
"Software"
"Workstation Computer"
"Device"
"multiplexing"
"DRM"
"Secure Content"
"DRM Software"
"Secure Content Owners"
"DRM Upgrades"
"WMFSDK"
"Not For Resale"
"NFR,"
"Academic Edition"
"AE,"
"Qualified Educational User."
"Exclusion of Incidental, Consequential and Certain Other Damages"
"Restricted Rights"
"Exclusion des dommages accessoires, indirects et de certains autres dommages"
"Consumer rights"
Vinko Vrsalovic
On Windows: `cat eula.txt | perl -nE"say $1 while /(\"[^^\"]*\")/g" `
J.F. Sebastian
`cat eula.txt | perl -lne 'print for /(".*?")/g'` Perl golf FTW! ;)
8jean
Well, some regex engines handle nested quotes, so some regex solutions will work :)
brian d foy
@brian Yes, but I didn't want to get into that, as I was kinda busy and haven't dug deep enough into it yet to explain it properly. :)
Vinko Vrsalovic
+6  A: 

I like this:

perl -ne 'print "$_\n" foreach /"((?>[^"\\]|\\+[^"]|\\(?:\\\\)*")*)"/g;'

It's a little verbose, but it handles escaped quotes and backtracking a lot better than the simplest implementation. What it's saying is:

my $re = qr{
   "               # Begin it with literal quote
   ( 
     (?>           # prevent backtracking once the alternation has been
                   # satisfied. It either agrees or it does not. This expression
                   # only needs one direction, or we fail out of the branch

         [^"\\]    # a character that is not a dquote or a backslash
     |   \\+       # OR one or more backslashes followed by
         [^"]      # something that is not a quote
     |   \\        # OR again a backslash
         (?>\\\\)* # followed by any number of *pairs* of backslashes (as units)
         "         # and a quote
     )*            # any number of the alternatives above
   )               # all captured together as one group
   "               # Ended by a literal quote
}x;

If you don't need that much power--say it's only likely to be dialog and not structured quotes, then

/"([^"]*)"/

probably works about as well as anything else.

Axeman
A: 
grep -o '"[^"]*"' file

The '-o' option prints only the matched text instead of the whole line.