Is there a way to make grep output "words" from files that match the search expression?

If I want to find all the instances of, say, "th" in a number of files, I do:

grep "th" *

but the output will be something like:

some-text-file : the cat sat on the mat
some-other-text-file : the quick brown fox
yet-another-text-file : i hope this explains it thoroughly

What I want it to output, using the same search, is:

the
the
the
this
thoroughly

Is it possible, either with grep or with another tool or combination of tools?

Thanks,

Neil

+6  A: 
Just awk... no need for a combination of tools:

# awk '{for(i=1;i<=NF;i++){if($i~/^th/){print $i}}}' file
the
the
the
this
thoroughly
ghostdog74
Yup: grep is just the wrong tool for this job.
dmckee
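The awk one-liner above is shown on a single file, but awk accepts several file names, so it covers the multi-file case from the question directly. A minimal sketch, assuming the question's three sample lines live in hypothetical scratch files a.txt, b.txt and c.txt; the helper name words_starting_th and the punctuation trimming are additions, not part of the original answer.

```shell
#!/bin/sh
# Scratch copies of the question's sample lines (file names are hypothetical).
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'the cat sat on the mat\n' > "$dir/a.txt"
printf 'the quick brown fox\n' > "$dir/b.txt"
printf 'i hope this explains it thoroughly\n' > "$dir/c.txt"

# Hypothetical helper: print each whitespace-separated field that starts
# with "th", after trimming punctuation (the trimming is not in the answer).
words_starting_th() {
  awk '{
    for (i = 1; i <= NF; i++) {
      w = $i
      gsub(/[^A-Za-z0-9_]/, "", w)  # drop characters such as "," or "."
      if (w ~ /^th/) print w
    }
  }' "$@"
}

# awk takes several file names, so one call covers every file;
# with no file names it reads stdin instead.
words_starting_th "$dir"/*.txt
```

The trimming step means "thus." at the end of a sentence prints as "thus" rather than "thus."; drop the gsub line to keep the answer's exact behavior.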
+7  A: 

Try grep -o

grep -oh "\w*th\w*" *

Edit: now uses the matching pattern from Phil's comment.

Dan Midwood
That doesn't work.
Kinopiko
I've been upvoted but I just realised it doesn't work. Maybe some regex will do it. This only outputs a "th" for each match.
Dan Midwood
Oh, right. You just need to match all the word-constituent characters on either side: grep -o "\w*th\w*"
Phil
The words 'another' and 'other' appear in your output due to the \w* in front of th.
ghostdog74
@ghostdog74 I assumed 'another' and 'other' were part of the filenames, not the content of a file.
Dan Midwood
That works now.
Kinopiko
I think that other and another should appear if they are in the input text (which, if I understand correctly, they are not).
dmckee
@Dan, as you can see from the OP's sample output, "another" and "other" don't appear. Because you have grepped \w*th, the \w* in front of "th" would grab these two words as well...
ghostdog74
@ghostdog74 If the input contains another and other then they will be included in the output. It looks from the Q that "some-other-text-file" and "yet-another-text-file" are file names and the question is about matching in multiple files.
Dan Midwood
@Dan, ah, I see... my bad for misinterpreting the output.
ghostdog74
This worked for me - thank you!
Neil Baldwin
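The comment thread above turns on whether \w*th\w* should also pick up words like "other" and "another". A small sketch showing the difference between matching "th" anywhere in a word and only at the start of a word, assuming GNU grep (\w and \b are GNU extensions, not POSIX) and two hypothetical scratch files whose contents are made up for illustration.

```shell
#!/bin/sh
# Requires GNU grep: \w and \b are GNU extensions.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'the other fox\n' > "$dir/a.txt"
printf 'another thing\n' > "$dir/b.txt"

# -o prints only the matched text, -h drops the "filename:" prefix.
# Any word containing "th" anywhere (the answer's pattern):
grep -oh '\w*th\w*' "$dir"/*.txt

# Only words that start with "th" (\b anchors the match to a word boundary):
grep -oh '\bth\w*' "$dir"/*.txt
```

The first command also reports "other" and "another"; the second reports only "the" and "thing". Which one is right depends on whether the OP wants words containing "th" or words beginning with it.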
A: 

You could pipe your grep output into Perl like this:

grep -h "th" * | perl -n -e'while(/(\w*th\w*)/g) {print "$1\n"}'
Kinopiko
That won't give the correct result. Also, if you're using Perl, there's no need for grep: do everything in Perl.
ghostdog74
Thanks for pointing out the error, ghostdog74. I have changed it to print all the words on the line, not just the first.
Kinopiko
Like I said, grep is not necessary: perl -n -e'while(/\b(th\w*)/g) {print "$1\n"}' file
ghostdog74
I don't think it's important here to avoid using grep.
Kinopiko
Up to you. I am just illustrating a point: if it's not necessary, don't do it. That extra "|" will cost you one more process.
ghostdog74
OK, thanks for your comments.
Kinopiko
+3  A: 

You could translate spaces to newlines and then grep, e.g.:

cat * | tr ' ' '\n' | grep th
Adam Rosenfield
Nice. I should have thought of that.
dmckee
No need for cat: tr ' ' '\n' < file | grep th. Slow for big files, though.
ghostdog74
This didn't work. The output still contained the filename and the entire line from the file that contained the match. Anyway, one of the other solutions offered worked. Thanks for the input, though.
Neil Baldwin
@ghostdog74: good point, although if you have more than one file, you'll need to use cat. @Neil Baldwin: are you sure you typed it in right? When there's only one input file (stdin in this case), grep doesn't print the filename.
Adam Rosenfield
@Adam - yes, sorry Adam, it does work with one file but not multiple.
Neil Baldwin
@Neil Baldwin: just list all of your files as parameters to cat, it works fine with multiple files
Adam Rosenfield
@Adam - so where you've got 'file' in the example, I would just put 'file1 file2 file3' etc. ?
Neil Baldwin
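To the last question: yes, cat takes any number of file names, so grep only ever sees one stream on stdin and prints no filenames. A sketch with three hypothetical files named file1, file2 and file3; swapping tr ' ' '\n' for tr -s '[:space:]' '\n' is an adjustment not in the original answer, so tabs also split words and runs of blanks don't produce empty lines.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'the cat sat on the mat\n' > "$dir/file1"
printf 'the quick  brown fox\n' > "$dir/file2"   # note the double space
printf 'i hope this explains it thoroughly\n' > "$dir/file3"

# cat joins all listed files into one stream, so grep reads stdin
# and never prefixes filenames. tr -s squeezes runs of whitespace
# into a single newline, giving one word per line.
cat "$dir/file1" "$dir/file2" "$dir/file3" | tr -s '[:space:]' '\n' | grep th
```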
A: 

You can also try pcregrep. There is also the -w option in grep, but in some cases it doesn't work as expected (example from Wikipedia):

cat fruitlist.txt
apple
apples
pineapple
apple-
apple-fruit
fruit-apple

grep -w apple fruitlist.txt
apple
apple-
apple-fruit
fruit-apple
Maciek Sawicki
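The surprise in the example above is that -w treats "-" as a word boundary. If a "word" should mean something delimited by whitespace, a portable POSIX ERE can say so explicitly. A sketch, assuming the same fruitlist.txt contents recreated in a scratch directory; this ERE is an alternative technique, not something from the Wikipedia example.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' apple apples pineapple apple- apple-fruit fruit-apple > "$dir/fruitlist.txt"

# -w: "-" is not a word character, so apple-, apple-fruit and fruit-apple match too.
grep -w apple "$dir/fruitlist.txt"

# Require whitespace or start/end of line on both sides: only the bare word matches.
grep -E '(^|[[:space:]])apple([[:space:]]|$)' "$dir/fruitlist.txt"
```

Note that this ERE includes the surrounding whitespace in the match, so it is suited to selecting lines; combined with -o it would print the spaces as well.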
A: 

cat *-text-file | grep -Eio "th[a-z]+"

Mumbling Mac
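grep can also read the files itself; without cat, it needs -h to suppress the "filename:" prefix it adds when given several files. A sketch with hypothetical scratch files named after the question's examples; -i makes the match case-insensitive, so a capitalised "The" is reported too.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'The cat sat on the mat\n' > "$dir/some-text-file"
printf 'i hope this explains it thoroughly\n' > "$dir/yet-another-text-file"

# -E extended regex, -i ignore case, -o only the matched text,
# -h no filename prefix. Only file contents are searched, never the names.
grep -Eioh 'th[a-z]+' "$dir"/*-text-file
```

Note that th[a-z]+ needs at least one letter after "th", so a bare "th" on its own would not be reported.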