ansaurus

Question

Answer 1

+3 A:

Assuming your dataset is in a file:

$ cat dataset
Terminator (19XX) action
The Ghostrider (2009) supernatural

$ awk -F"[()]" '{print $1}' dataset
Terminator
The Ghostrider

$ awk -F"[()]" '{print $1}' dataset > movie_names

$ grep -f movie_names secondfile
$ grep -f secondfile movie_names

Of course, you can do it with just awk as well:

awk -F"[()]" 'FNR==NR { m[++d]=$1;next } { for(i=1;i<=d;i++){if( $0 ~ m[i] ){ print }}}' dataset secondfile
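For readers who find the one-liner dense, here is a rough expanded sketch of the same idea. The wrapper name match_titles is made up for illustration, and the two lines marked "addition" are not in the original command (they trim the space left in front of the parenthesis and stop after the first hit so a line is printed only once).

```shell
# match_titles is a hypothetical wrapper name, not part of the answer above.
# Usage: match_titles dataset secondfile
match_titles() {
  awk -F'[()]' '
    # First file: remember each title (the text before the first parenthesis).
    FNR == NR {
      title = $1
      sub(/ +$/, "", title)   # addition: drop the space left before "("
      m[++d] = title
      next
    }
    # Second file: print a line if it matches any remembered title.
    {
      for (i = 1; i <= d; i++) {
        if ($0 ~ m[i]) {
          print
          break               # addition: print each line at most once
        }
      }
    }
  ' "$1" "$2"
}
```

Note that $0 ~ m[i] treats each title as a regular expression, so a title containing characters such as "." or "*" may match more than intended.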

ghostdog74 2010-10-18 02:50:35

That is great! I did not know that -F accepts regular expressions. You can combine this into a single command line, awk -F"[()]" '{print $1}' dataset | fgrep -f - secondfile, so you don't need the temporary file movie_names.

raja kolluru 2010-10-18 02:54:59

Thanks for the answer, this does exactly what I needed. @raha I will have to try that one-liner; it looks like it would work nicely.

Isawpalmetto 2010-10-18 12:26:31

Answer 2

A:

You can ask sed to remove the year field and everything that comes after it.

$ sed 's/([0-9]\+).*//' file

This returns only the movie name from each line. You can then pipe the output into a while read loop.
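A minimal sketch of that loop might look like the following. The function names list_titles and process_titles are invented for the example, the printf is only a placeholder for whatever you want to do with each title, and the sed pattern is a slightly adjusted variant: [0-9][0-9]* is the portable spelling of GNU sed's [0-9]\+, and the leading " *" (an addition) also removes the space before the parenthesis.

```shell
# list_titles and process_titles are hypothetical helper names.
list_titles() {
  # Print only the title part of each "Title (year) genre" line in file $1.
  sed 's/ *([0-9][0-9]*).*//' "$1"
}

process_titles() {
  # IFS= and -r keep leading whitespace and backslashes in the title intact.
  list_titles "$1" | while IFS= read -r title; do
    # Placeholder action: replace with the real per-title work.
    printf 'title: [%s]\n' "$title"
  done
}
```

You would call it as process_titles file. Keep in mind that in bash the loop on the right-hand side of the pipe runs in a subshell, so variables set inside it are not visible after the loop ends.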

If needed, you can refine the regex so that it matches only four digits (as written, it matches any number of digits between the parentheses).
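One possible way to write that refinement, as a sketch: \{4\} is the basic-regex interval meaning "exactly four of the preceding item". The function name strip_year and the sample lines are made up for illustration, and the leading " *" is an addition that also drops the space before the parenthesis.

```shell
# strip_year is a hypothetical helper name; it filters standard input.
strip_year() {
  # Cut from " (dddd)" to the end of the line, only when there are exactly 4 digits.
  sed 's/ *([0-9]\{4\}).*//'
}

# Made-up sample lines: the first has a 4-digit year, the second does not.
printf '%s\n' 'The Ghostrider (2009) supernatural' 'Catch (22) drama' | strip_year
```

With this pattern the first sample line is reduced to the title, while the second is left untouched because "(22)" is not four digits.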

Jean 2010-10-18 03:17:10


print everything up to match in pattern