For example, say I have a text file example.txt that reads: I like dogs. My favorite dog is George because he is my dog. George is a nice dog.

Now how do I extract "George" given that it is the first word that follows "My favorite dog is"?

What if there is more than one space, e.g. `My favorite dog is    George`?

Is there a way to reliably extract the word "George" regardless of the number of spaces between "My favorite dog is" and "George"?

A: 

You can do:

cat example.txt | perl -ne 'print "$1\n" if /My favorite dog is\s+(\w+)/'

It outputs George.

codaddict
Thanks! What if I wanted to extract 105.14088 from `blah blah ! HEAT OF FORMATION 105.14088 93.45997 46.89387 blah blah`?
Feynman
You can try `cat input | perl -ne 'print "$1\n" if /HEAT OF FORMATION\s+(\S+)/'`
codaddict
Useless use of `cat` (twice).
Dennis Williamson
A:

If you do not have Perl installed, you can use sed:

sed -n 's/.*My favorite dog is *\([a-zA-Z]*\).*/\1/p' example.txt
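A sketch of the same sed approach wrapped in a small shell function (the name `favorite_dog` is hypothetical). It uses `-n` with the `p` flag so that only lines where the substitution succeeded are printed, and the POSIX class `[[:blank:]]*` so that any mix of spaces and tabs is skipped:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first word after "My favorite dog is" on each matching line of file $1
favorite_dog() {
    # .* before the phrase discards everything in front of it;
    # [[:blank:]]* skips any number of spaces or tabs (including none);
    # -n plus the p flag prints only lines where the substitution matched
    sed -n 's/.*My favorite dog is[[:blank:]]*\([a-zA-Z]*\).*/\1/p' "$1"
}

# Sample line with a mix of spaces and a tab before the name
printf 'I like dogs. My favorite dog is \t  George because he is my dog.\n' > example.txt
favorite_dog example.txt
```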
Kristoffer E
Thanks. Might I ask which is fastest for this: Bash (sed), Perl, or something else?
Feynman
I'm not sure, but I would guess that sed is quicker than Perl for small files because it starts up faster. For larger files I have no idea.
Kristoffer E
@Feynman: `sed` has nothing whatsoever to do with Bash or vice versa other than the fact that `sed` is a program that can be spawned by a shell and Bash is a shell. You could, however, use Bash to do your string extraction (see my answer).
Dennis Williamson
A:

Pure Bash:

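string='blah blah ! HEAT OF FORMATION 105.14088 93.45997 46.89387 blah blah'
pattern='HEAT OF FORMATION ([^[:blank:]]*)'  # keep the regex in a variable; a quoted right-hand side of =~ is matched literally
if [[ $string =~ $pattern ]]; then
    match=${BASH_REMATCH[1]}  # first capture group: 105.14088
fi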
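To apply the same `=~` match to every line of a file instead of a single string, a sketch like the following could work (the function name `heat_of_formation` and the file name `input.txt` are hypothetical). `[[:blank:]]+` accepts any number of spaces or tabs after the label. Because it loops in Bash, it is best suited to small inputs:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first field after "HEAT OF FORMATION" on each line of file $1
heat_of_formation() {
    local line
    # [[:blank:]]+ allows one or more spaces/tabs between the label and the value
    local pattern='HEAT OF FORMATION[[:blank:]]+([^[:blank:]]+)'
    # The || test also processes a final line that lacks a trailing newline
    while IFS= read -r line || [[ -n $line ]]; do
        # $pattern stays unquoted so it is treated as a regex, not a literal string
        if [[ $line =~ $pattern ]]; then
            printf '%s\n' "${BASH_REMATCH[1]}"
        fi
    done < "$1"
}

printf 'blah blah ! HEAT OF FORMATION    105.14088 93.45997 46.89387 blah blah\n' > input.txt
heat_of_formation input.txt
```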
Dennis Williamson
A: 

If you are searching a file, especially a big one, external tools like sed, awk, or Perl are faster than pure Bash loops and Bash string manipulation.

sed -n 's/.*HEAT OF FORMATION[[:blank:]]*\([^[:blank:]]*\).*/\1/p' file
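Since awk is named as another option, here is a sketch of an awk version (my own illustration, not part of the original answer; the function name `heat_awk` is hypothetical). awk splits each line on runs of spaces and tabs by default, so the number of blanks between the label and the value does not matter:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the field that follows the words HEAT OF FORMATION on each line of file $1
heat_awk() {
    awk '{
        # Stop 3 fields before the end so $(i + 3) always exists
        for (i = 1; i + 3 <= NF; i++) {
            if ($i == "HEAT" && $(i + 1) == "OF" && $(i + 2) == "FORMATION") {
                print $(i + 3)
                break
            }
        }
    }' "$1"
}

printf 'blah blah ! HEAT OF FORMATION   105.14088 93.45997 46.89387 blah blah\n' > file
heat_awk file
```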

Pure Bash string manipulation is only a good fit when you are processing a few simple strings inside your script, such as the contents of a variable.
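For that variable case, a sketch using only Bash parameter expansion, with no regex and no external process (the function name `heat_value` is hypothetical). It checks for the label first, because `${string#*LABEL}` returns the string unchanged when the label is absent:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first blank-separated word after "HEAT OF FORMATION" in string $1
heat_value() {
    local string=$1 rest
    # Fail if the label is missing; otherwise the expansions below would return unrelated text
    [[ $string == *'HEAT OF FORMATION'* ]] || return 1
    rest=${string#*HEAT OF FORMATION}       # drop everything up to and including the label
    rest=${rest#"${rest%%[![:blank:]]*}"}   # strip the leading spaces/tabs, however many there are
    printf '%s\n' "${rest%%[[:blank:]]*}"   # print everything before the next blank
}

heat_value 'blah blah ! HEAT OF FORMATION   105.14088 93.45997 46.89387 blah blah'
```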

ghostdog74