Very often I want to join two ASCII files that are both tables, in the sense that they consist of tab-separated columns, like this:
file 1
FRUIT ID
apple alpha
banana beta
cherry gamma
file 2
ID FOOBAR
alpha cat
beta dog
delta airplane
and I want to join them like this with an inner join:
FRUIT ID FOOBAR
apple alpha cat
banana beta dog
or with a left join:
FRUIT ID FOOBAR
apple alpha cat
banana beta dog
cherry gamma n/a
(The identifiers used for joining are not necessarily unique.)
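To make the intended semantics concrete, here is a minimal Python sketch of the two join types on the example data above. The function name `join_rows` and its parameters are made up for illustration, and the rows are hard-coded as two-column tuples rather than read from files. Because the identifiers need not be unique, every matching pair of rows produces one output row.

```python
from collections import defaultdict

def join_rows(left, right, how="inner", fill="n/a"):
    """Join (fruit, id) rows with (id, foobar) rows on the id (hypothetical helper)."""
    # Index the right-hand rows by ID; a list per key keeps duplicate IDs.
    by_id = defaultdict(list)
    for ident, foobar in right:
        by_id[ident].append(foobar)

    result = []
    for fruit, ident in left:
        matches = by_id.get(ident, [])
        # One output row per matching right-hand row.
        for foobar in matches:
            result.append((fruit, ident, foobar))
        # A left join keeps unmatched left-hand rows, padded with the fill value.
        if not matches and how == "left":
            result.append((fruit, ident, fill))
    return result

file1 = [("apple", "alpha"), ("banana", "beta"), ("cherry", "gamma")]
file2 = [("alpha", "cat"), ("beta", "dog"), ("delta", "airplane")]

inner = join_rows(file1, file2)             # apple and banana only
left = join_rows(file1, file2, how="left")  # additionally: cherry gamma n/a
```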
What I am doing so far is:
- Make copies of the input files without header.
- Sort both files by the join column.
- Use the Linux join command on the sorted versions.
- Delete intermediate files.
This is error-prone: I have to count the columns so that I can pass their numbers to "sort" and "join" (which gets worse with many columns or very wide ones), I must not forget to specify tab as the delimiter, and I have to remove, re-insert and fix the header each time.
Can anyone recommend a much simpler way? Preferably one where I don't need to sort and where I can specify the column by name, not by number? Something like "joincommand ID file1 file2 > result"?
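For illustration, the behaviour I have in mind could be sketched in Python roughly as follows. This is only a sketch under some assumptions, not an existing tool: the names `read_table` and `join_tables` and the script name `joincommand.py` are hypothetical, both files are assumed to have a header line containing the key column exactly once, and both files are read fully into memory. No sorting is needed because the second file is indexed by key in a dictionary.

```python
import sys
from collections import defaultdict

def read_table(path):
    """Return (header, rows) of a tab-separated file; blank lines are skipped."""
    with open(path, encoding="utf-8") as handle:
        rows = [line.rstrip("\r\n").split("\t") for line in handle if line.strip()]
    return rows[0], rows[1:]

def join_tables(key, path1, path2, out=sys.stdout, how="inner", fill="n/a"):
    """Join two tab-separated files on the column named `key`."""
    header1, rows1 = read_table(path1)
    header2, rows2 = read_table(path2)
    k1 = header1.index(key)  # raises ValueError if the column name is missing
    k2 = header2.index(key)

    # Index file 2 by key value; keep every row, since keys may repeat.
    index = defaultdict(list)
    for row in rows2:
        index[row[k2]].append(row[:k2] + row[k2 + 1:])
    extra_cols = header2[:k2] + header2[k2 + 1:]

    out.write("\t".join(header1 + extra_cols) + "\n")
    for row in rows1:
        matches = index.get(row[k1], [])
        for rest in matches:
            out.write("\t".join(row + rest) + "\n")
        # Left join: keep unmatched rows of file 1, padded with the fill value.
        if not matches and how == "left":
            out.write("\t".join(row + [fill] * len(extra_cols)) + "\n")

if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: python joincommand.py ID file1 file2 > result
    join_tables(sys.argv[1], sys.argv[2], sys.argv[3])
```

The output keeps all columns of the first file in their original order, followed by the non-key columns of the second file. Passing `how="left"` gives the left join instead of the inner join.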