Hello! I was wondering if, with egrep ((GNU grep) 2.5.1), I can select a part of the matched text, something like:

egrep '^([a-zA-Z.-]+)[0-9]+' ./file.txt

So I get only the part which matched, between the brackets, something like

house.com

Instead of the whole line like I usually get:

house.com112

Assuming I have a line with house.com112 in my file.txt.

(Actually this regular expression is just an example I just want to know if I can print only a part of the whole line.)

I do know in some languages, such as PHP, Perl or even AWK I can, but I do not know if I can with egrep.

Thank you in advance!

+1  A: 

The first part of your regex is more general than the second half, and since + is greedy, the second [0-9]+ will only match the last digit (thanks Paul). If you can make your first half more specific (e.g. if you know it will end in a TLD) you could do it.
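To illustrate the greediness point, here is a small sketch. It assumes a hypothetical version of the regex in which the first character class also accepts digits (the question's regex was edited afterwards, so the exact original is not shown here), and it uses sed -E only to make the captured group visible. The function name greedy_demo is made up for this example.

```shell
# Hypothetical regex: the first class also accepts digits, so the greedy +
# takes as much as it can and leaves a single digit for the trailing [0-9]+.
greedy_demo() {
  sed -E 's/^([a-zA-Z0-9.-]+)[0-9]+$/\1/'
}

printf 'house.com112\n' | greedy_demo
# prints: house.com11
```

With the digits removed from the first class, the group can no longer run into the number, and the same command would print house.com.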

There's an amazingly cool tool called ack, which is basically grep with Perl regexes. I'm not sure if it's possible to use it in your case, but if you can do what you want in Perl, you can do it with ack.

Edit:

Why not just drop the end of the regex? Are there false positives if you do that? If so, you could pipe the results to egrep again with the first half of the regex only.

This seems to be what you are asking about: Also, on the off chance that you don't know about it, the -o flag will output only the matched portion of a given line.
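A minimal sketch of -o on its own, assuming GNU grep and using only the first half of the regex. The function name matched_prefix is made up for this example.

```shell
# matched_prefix is a hypothetical helper: -o prints only the text the
# pattern matched, and -E enables extended regular expressions.
matched_prefix() {
  grep -Eo '^[a-zA-Z.-]+'
}

printf 'house.com112\nexample.org7\n' | matched_prefix
# prints:
# house.com
# example.org
```

Note that this also prints the leading part of lines that have no trailing digits at all, which is the false-positive case mentioned above.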

David Kanarek
Oh, yes, you are right, it was a foolish example. I'm going to change it now. Edit: I have already changed it to something like that.
Polar Geek
+1 for ack, although [0-9]+ has to match at least 1 character, as I'm sure you realise.
Paul Creasey
@Paul, yeah, silly mistake on my part.
David Kanarek
+3  A: 

Use sed to modify the result after grep has found the lines that match:

grep -E '^[a-zA-Z.-]+[0-9]+' ./file.txt | sed 's/[0-9]\+$//'

Or if you want to stick with only grep, you can use grep with the -o switch instead of sed:

grep -E '^[a-zA-Z.-]+[0-9]+' ./file.txt | grep -Eo '^[a-zA-Z.-]+'
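For comparison, the filtering and the trimming can probably be folded into a single sed call that uses a capture group. This is only a sketch, assuming a sed that supports -E (GNU sed and BSD sed do); the function name leading_name is hypothetical.

```shell
# leading_name is a hypothetical helper: -n suppresses default output, and
# the trailing p prints only lines where the substitution actually happened.
leading_name() {
  sed -n -E 's/^([a-zA-Z.-]+)[0-9]+.*$/\1/p'
}

printf 'house.com112\nnodigits\n' | leading_name
# prints: house.com
```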
Mark Byers
Ok, thank you everyone, this worked, but all of you were right, I had to process the output of grep.
Polar Geek
+1  A: 

You might want to try the -o and -w flags in grep. egrep is "deprecated", so use grep -E.

$ echo "test house.com house.com112" | grep -Eow "house\.com"
house.com

The basic idea is to go through each word and test for equality.

$ echo "test house.com house.com112"| awk '{for(i=1;i<=NF;i++){ if($i=="house.com") print $i}}'
house.com
ghostdog74