Dear all,

I have two files whose first columns may share some values. I would like to match on the first column of both files and print the lines of FILE1 for which there is a match.

FILE1:
xxx1 yyy yyy yyy
xxx2 yyy yyy yyy
xxx3 yyy yyy yyy

FILE2:
xxx3 zzzz
xxx4 zzzz

OUTPUT:
xxx3 yyy yyy yyy

Any suggestions are welcome.

Best wishes

+3  A: 

join

msw
Or, a little more explicitly: `join FILE1 FILE2`
Dennis Williamson
One important note - both files have to be sorted.
depesz
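
A minimal sketch of the `join` approach on the sample data from the question, assuming a POSIX shell with `mktemp` available. The scratch directory and the `.sorted` file names are illustrative, not from the thread. Note that plain `join` appends the remaining FILE2 fields to each matching line; `-o` is used here to keep only the four FILE1 fields, which assumes every FILE1 line has exactly four fields.

```shell
#!/bin/sh
# Use one collation order for both sort and join.
LC_ALL=C; export LC_ALL

# Scratch directory (illustrative) so the current directory is untouched.
tmp=$(mktemp -d)

# Sample data from the question.
printf '%s\n' 'xxx1 yyy yyy yyy' 'xxx2 yyy yyy yyy' 'xxx3 yyy yyy yyy' > "$tmp/FILE1"
printf '%s\n' 'xxx3 zzzz' 'xxx4 zzzz' > "$tmp/FILE2"

# join expects both inputs sorted on the join field (the first field by default).
sort -k1,1 "$tmp/FILE1" > "$tmp/FILE1.sorted"
sort -k1,1 "$tmp/FILE2" > "$tmp/FILE2.sorted"

# Plain join: the matching FILE1 line followed by the other FILE2 fields.
joined=$(join "$tmp/FILE1.sorted" "$tmp/FILE2.sorted")

# -o selects output fields as FILE.FIELD; listing FILE1 fields 1 to 4
# reproduces the FILE1 line only.
only_file1=$(join -o 1.1,1.2,1.3,1.4 "$tmp/FILE1.sorted" "$tmp/FILE2.sorted")

printf '%s\n' "$joined" "$only_file1"
rm -rf "$tmp"
```

If the number of fields in FILE1 varies, the awk answers in this thread are likely the simpler route, since they print the FILE1 line unchanged.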
A: 
awk 'FNR==NR{ a[$1]=$0;next } ($1 in a)' file2 file1
ghostdog74
@ghostdog, could you please explain how this works?
Vijay Sarathi
`FNR==NR` is an awk idiom that is true only while awk is reading the first file given on the command line. For those records, an associative array collects the first column as the key, with the whole record as the value. When awk then processes the second file, it checks whether the first field is a key in array `a` and, if it is, prints the current line.
ghostdog74
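
The explanation above can be sketched as the same one-liner spread over several lines with comments. This is only an expanded reading of it, run against the sample data from the question; the array name `seen`, the variable `matched`, and the scratch directory are illustrative.

```shell
#!/bin/sh
tmp=$(mktemp -d)

# Sample data from the question.
printf '%s\n' 'xxx1 yyy yyy yyy' 'xxx2 yyy yyy yyy' 'xxx3 yyy yyy yyy' > "$tmp/file1"
printf '%s\n' 'xxx3 zzzz' 'xxx4 zzzz' > "$tmp/file2"

matched=$(awk '
    # NR counts records across all inputs; FNR restarts at 1 for each file.
    # They are equal only while the first file on the command line is read.
    FNR == NR {
        seen[$1] = $0   # remember the key; only its presence matters later
        next            # skip the rule below for lines of the first file
    }
    # Reached only for the second file: a true pattern with no action
    # prints the current line.
    $1 in seen
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$matched"
rm -rf "$tmp"
```

Because lookups go through the array, neither file needs to be sorted for this approach.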
A: 

Here is my recipe:

awk 'key[$1]; FNR==NR {key[$1]=1}' file2 file1

I assume that both lists are sorted by the key (first column) and each key only appears once in a file. The first pattern is short for:

key[$1] != 0

When that pattern is true, the default action prints the whole line. The pattern effectively applies only to the second file (file1), because while the first file is being read the key has not yet been marked. The second pattern is:

FNR==NR {key[$1]=1}

FNR==NR means we are processing the first file (file2 here), so we mark the key for later reference.

Hai Vu
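
To illustrate why the recipe assumes each key appears only once, here is a small sketch using a hypothetical file2 in which the key xxx3 is repeated (the line `xxx3 wwww` is made up for the example). It also shows one possible variant that marks the key and skips ahead with `next`, so the print pattern is only tested against file1; the variable names are illustrative.

```shell
#!/bin/sh
tmp=$(mktemp -d)

# file1 is the sample from the question; file2 repeats the key xxx3
# (the "xxx3 wwww" line is hypothetical).
printf '%s\n' 'xxx1 yyy yyy yyy' 'xxx2 yyy yyy yyy' 'xxx3 yyy yyy yyy' > "$tmp/file1"
printf '%s\n' 'xxx3 zzzz' 'xxx3 wwww' 'xxx4 zzzz' > "$tmp/file2"

# Recipe as posted: when the second xxx3 line of file2 is read, the key is
# already marked, so that file2 line is printed too.
as_posted=$(awk 'key[$1]; FNR==NR {key[$1]=1}' "$tmp/file2" "$tmp/file1")

# Variant: mark the key and move to the next record while reading file2,
# so only file1 lines can ever be printed.
guarded=$(awk 'FNR==NR {key[$1]=1; next} key[$1]' "$tmp/file2" "$tmp/file1")

printf 'as posted:\n%s\n' "$as_posted"
printf 'with next:\n%s\n' "$guarded"
rm -rf "$tmp"
```

With unique keys in file2, both commands should print the same lines.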