tags:

views:

432

answers:

1

I have two files where I want to perform union operation based on 1st column:

file1.txt

foo 1
bar 2
qux 3

file2.txt

foo x
qux y
boo z

The result I hope to get is like this:

foo  1 x
bar  2 -
qux  3 y
boo  - z

where the empty fields of column 1 is padded with "-".

But why this join command doesn't work as I expected?

$ join -a1 -a2 -e"-" file1.txt file2.txt

What's the right way to do it?

+2  A: 

"Important: FILE1 and FILE2 must be sorted on the join fields." (from this online manpage).

This problem #1. Problem #2 is worse: option -e is badly documented -- only works in conjunction with -o, so for example:

$ join -a 1 -a 2 -e'-' -o '0,1.2,2.2' sfile1.txt sfile2.txt
bar 2 -
boo - z
foo 1 x
qux 3 y

where the s prefix name indicated files that I've sorted beforehand.

Edit: man join explains the -o switch (so does the online manpage I point to above). It specifies the fields to output (1.2 means 2nd field from file 1, &c), or 0 to mean the join field, and is a comma-separated list. (I didn't remember the 0 value, actually, so had originally given a clumsier solution requiring awk post-processing, but the current solution is better... and no awk needed!).

Alex Martelli
@Alex: thanks so much. I don't understand the option "-o '1.1,2.1,1.2,2.2'". Where can I find info about that? Also if you don't mind the awk please?
neversaint
I've edited the answer to explain the -o, but do look at `man awk`: as I did, to check my explanation, I was reminded of the 0 value for field number, which I've now used in the edited example and makes post-processing unneeded. But do remember to sort your files beforehand!-)
Alex Martelli
@Alex, thanks a million.
neversaint