ansaurus

Question

Linux grep - how to display only the string that matched the regular expression

Answer 1

A:

grep -o or --only-matching

outputs only the matching text instead of complete lines. The problem could also be your regex: if it is not restrictive enough, or too greedy, it will still match whole lines.

chocolate_jesus 2010-08-06 12:37:35
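A minimal sketch of the difference, assuming GNU grep and a made-up sample line: without -o grep prints the whole line, with -o it prints only each match, and a greedy pattern such as 'MAIL_.*' still swallows the rest of the line.

```shell
# Sample line (made up for illustration)
line='Some garbage MAIL_OPTION comes MAIL_VALUE here'

# Default: the complete matching line is printed
printf '%s\n' "$line" | grep 'MAIL_'

# -o: each match is printed on its own line
printf '%s\n' "$line" | grep -o 'MAIL_[[:upper:]]*'

# Greedy: '.*' runs to the end of the line, so -o gains almost nothing
printf '%s\n' "$line" | grep -o 'MAIL_.*'
```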

Now, the words I want appear like this in the file:

type="MAIL_ABC_CDE"
type="MAIL_XXX_AAA_AAA"
etc.

There can be any number of _'s. What regexp should I use? Any idea?

AJ 2010-08-06 12:42:49

Answer 2

+1 A:

First of all, with the GNU grep that ships with Ubuntu, the -G flag (basic regexp) is the default, so you can omit it; better still, use extended regexps with -E.
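A small illustration, assuming GNU grep and a made-up token: in basic regexps (the -G default) + is an ordinary character, while -E makes it mean "one or more". GNU grep also accepts \+ in basic mode as an extension, but -E avoids the backslashes.

```shell
# Sample token in the format from the comments (made up)
token='MAIL_ABC_CDE'

# Basic regexp: '+' is literal, so grep looks for a real '+' and finds nothing
printf '%s\n' "$token" | grep -o 'MAIL_[[:upper:]_]+' || echo 'no match with -G'

# Extended regexp (-E): '+' means "one or more of the preceding class"
printf '%s\n' "$token" | grep -Eo 'MAIL_[[:upper:]_]+'
```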

The -r flag makes grep search recursively through the files of a directory, which is what you need.

You are right to use the -o flag to print only the matching part of a line. To omit file names from the output, you also need the -h flag.
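A throwaway example of -r and -h, assuming GNU grep and mktemp; the directory and file names are made up. The pattern 'MAIL_[[:alnum:]_]*' is used so that the closing quote of type="..." is not part of the match.

```shell
# Temporary directory with two sample files (hypothetical names and data)
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT   # remove the directory when the shell exits
printf 'type="MAIL_ABC_CDE"\n' > "$dir/a.xml"
printf 'type="MAIL_XXX_AAA_AAA"\n' > "$dir/b.xml"

# -r: search every file under $dir; each match is prefixed with its file name
grep -Ero 'MAIL_[[:alnum:]_]*' "$dir"

# -h: same search with the file-name prefix suppressed
grep -Ehro 'MAIL_[[:alnum:]_]*' "$dir"
```

The order of files in recursive output follows the directory traversal, so pipe through sort if you need a stable order.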

The only mistake is in the regular expression itself: you left out a character specification before the *, which repeats only the item immediately preceding it. Your command should look like this:

grep -Ehro 'MAIL_[^[:space:]]*' .

Sample output (not recursive):

$ echo "Some garbage MAIL_OPTION comes MAIL_VALUE here" | grep -Eho 'MAIL_[^[:space:]]*'
MAIL_OPTION
MAIL_VALUE
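Why the character specification matters, sketched on the sample line above and assuming the original pattern was something like 'MAIL_*' (the asker's command is not shown in this thread): the * applies only to the '_' before it.

```shell
line='Some garbage MAIL_OPTION comes MAIL_VALUE here'

# '*' repeats only the '_': "MAIL" plus zero or more underscores
printf '%s\n' "$line" | grep -Eo 'MAIL_*'

# With a character class before '*', the rest of each word is matched
printf '%s\n' "$line" | grep -Eo 'MAIL_[^[:space:]]*'
```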

thor 2010-08-06 12:41:54
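One caveat worth checking, assuming the tokens sit inside quotes as in the format quoted in the comments: [^[:space:]]* stops only at whitespace, so the closing quote ends up in the match. Adding " to the negated class is one way to drop it.

```shell
# Two lines in the format quoted in the comments
input='type="MAIL_ABC_CDE"
type="MAIL_XXX_AAA_AAA"'

# Stops only at whitespace: the trailing '"' is kept
printf '%s\n' "$input" | grep -Eo 'MAIL_[^[:space:]]*'

# Also excluding '"' leaves just the token
printf '%s\n' "$input" | grep -Eo 'MAIL_[^[:space:]"]*'
```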

Great, that works. One quick question: how do I match only when the MAIL_* strings appear either as type="MAIL_*" or as >MAIL_*< in the files? Any help on that one?

AJ 2010-08-06 12:48:33

I don't get it. Could you rephrase your question? Do you want to see the characters surrounding your MAIL_XXX strings, i.e. the " and <> in the output of the grep command?

thor 2010-08-06 12:51:22

If your MAIL_* tokens contain only letters (A-Z, a-z) after the prefix, you can change the regexp to 'MAIL_[[:alpha:]]*'.

thor 2010-08-06 13:02:12
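A grep-only sketch for the follow-up question (keep MAIL_ tokens only when they appear as type="..." or between > and <), assuming GNU grep and a sed that supports -E. It is a textual heuristic, not XML parsing, and it does not check that the opening and closing characters belong together; the sample lines are made up.

```shell
input='<a type="MAIL_ABC_CDE"/>
<b>MAIL_XXX_AAA_AAA</b>
<c note="see MAIL_IGNORED here"/>'

# Step 1: keep only matches wrapped as type="..." or >...<
# Step 2: strip the wrapper characters again with sed
printf '%s\n' "$input" \
  | grep -Eo '(type="|>)MAIL_[[:alnum:]_]*("|<)' \
  | sed -E 's/^(type="|>)//; s/["<]$//'
```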

Answer 3

+1 A:

Try the following command

grep -Eo 'MAIL_[[:alnum:]_]*'

banx 2010-08-06 12:57:42
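A quick comparison of this pattern with the letters-only 'MAIL_[[:alpha:]]*' suggested in the comments, assuming GNU grep and made-up input. Without a file argument, grep reads standard input, so the sample is piped in.

```shell
# One line in the format from the question, plus a token containing a digit
input='type="MAIL_ABC_CDE"
<x>MAIL_V2_RETRY</x>'

# Letters only: each match ends at the next '_' or at a digit
printf '%s\n' "$input" | grep -Eo 'MAIL_[[:alpha:]]*'

# Letters, digits and '_': whole tokens, stopping at '"' or '<'
printf '%s\n' "$input" | grep -Eo 'MAIL_[[:alnum:]_]*'
```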

Answer 4

A:

From your comment on Thor's answer, it seems you also want to distinguish whether the MAIL_.* text is a text node or an attribute, not just isolate it wherever it appears in the XML document. grep cannot parse XML; you need a proper XML parser for that.
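A small demonstration of the problem, assuming GNU grep and a made-up XML snippet: grep only sees characters, so attribute values, text content and element names all match alike.

```shell
xml='<some_root>
    <test a="MAIL_as_attribute">text</test>
    <bar>MAIL_as_text here</bar>
    <MAIL_tag>abc</MAIL_tag>
</some_root>'

# Every occurrence is printed, with no way to tell where it came from
printf '%s\n' "$xml" | grep -Eo 'MAIL_[[:alnum:]_]*'
```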

One command-line XML parser is xmlstarlet; it is packaged in Ubuntu.

Using it on this example file:

$ cat test.xml 
<some_root>
    <test a="MAIL_as_attribute">will be printed if you want matching attributes</test>
    <bar>MAIL_as_text will be printed if you want matching text nodes</bar>
    <MAIL_will_not_be_printed>abc</MAIL_will_not_be_printed>
</some_root>

For selecting text nodes you can use:

$ xmlstarlet sel -t -m '//*' -v 'text()' -n test.xml | grep -Eo 'MAIL_[^[:space:]]*'
MAIL_as_text

And for selecting attributes:

$ xmlstarlet sel -t -m '//*[@*]' -v '@*' -n test.xml | grep -Eo 'MAIL_[^[:space:]]*'
MAIL_as_attribute

Brief explanations:

//* is an XPath expression that selects all elements in the document, and text() outputs the value of each element's first child text node; element names and attribute values are never printed, so only text content reaches grep
//*[@*] is an XPath expression that selects every element that has at least one attribute (not the attributes themselves), and @* then outputs the value of the first attribute of each such element
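A sketch of XPath variants that visit every attribute and every text node, assuming xmlstarlet is installed (the block skips itself otherwise). It relies on XSLT 1.0 behaviour: value-of on a node set returns only the first node's string value, so matching the nodes themselves ('//@*', '//text()') and printing '.' gives one line per node. The sample file is made up; its second attribute could be missed by the '//*[@*]' form.

```shell
# Sample file (hypothetical): one element carries two attributes
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT   # remove the file when the shell exits
cat > "$tmp" <<'EOF'
<some_root>
    <test id="t1" a="MAIL_second_attribute">x</test>
    <bar>MAIL_as_text</bar>
</some_root>
EOF

if command -v xmlstarlet >/dev/null 2>&1; then
    # One output line per attribute node, whatever its position
    xmlstarlet sel -t -m '//@*' -v '.' -n "$tmp" | grep -Eo 'MAIL_[^[:space:]]*'
    # One output line per text node, including later text children in mixed content
    xmlstarlet sel -t -m '//text()' -v '.' -n "$tmp" | grep -Eo 'MAIL_[^[:space:]]*'
else
    echo 'xmlstarlet not found; skipping' >&2
fi
```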

Catalin Iacob 2010-08-06 21:47:47
