Update: the question has been edited to state that all files contain all keys, so the accepted answer (join) is definitely better than this one. Consider this approach only if some keys may be missing from some of the files.
If you're not too concerned about performance, you could try the quick and dirty:
$ cat file_a
ID5 Value5a
ID1 Value1a
ID3 Value3a
ID4 Value4a
ID2 Value2a
$ cat file_b
ID1 Value1b
ID3 Value3b
$ cat file_c
ID2 Value2c
ID3 Value3c
ID4 Value4c
ID5 Value5c
$ cat qq.sh
#!/bin/bash
keylist=$(awk '{print $1}' file_[abc] | sort -u)
for key in ${keylist} ; do
val_a=$(grep "^${key} " file_a | awk '{print $2}') ; val_a=${val_a:--}
val_b=$(grep "^${key} " file_b | awk '{print $2}') ; val_b=${val_b:--}
val_c=$(grep "^${key} " file_c | awk '{print $2}') ; val_c=${val_c:--}
echo "${key} ${val_a} ${val_b} ${val_c}"
done
$ ./qq.sh
ID1 Value1a Value1b -
ID2 Value2a - Value2c
ID3 Value3a Value3b Value3c
ID4 Value4a - Value4c
ID5 Value5a - Value5c
This works out the full key list first, then looks up each key's value in each file, printing - if the key is not in that file.
The grep commands will need to be adjusted if the files are more complex (for example, if field 1 isn't at the start of the line, or is followed by a non-space separator), but this should be a reasonable first-cut solution. The likely grep to use in that case would be:
grep "^[ X]*${key}[ X]"
where X is a literal tab character: this allows zero or more spaces or tabs before the key, and requires a space or tab to terminate it. (Equivalently, the POSIX class [[:blank:]] matches a space or a tab.)
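One way to get that literal tab into the pattern without typing it directly is to capture it in a variable first. A minimal sketch, using a temporary file with an assumed tab-separated layout:

```shell
# Hypothetical sample line: leading tab, then key, then a tab separator.
f=$(mktemp)
printf '\tID3\tValue3a\n' > "$f"

key=ID3
tab=$(printf '\t')                      # a literal tab character
grep "^[ ${tab}]*${key}[ ${tab}]" "$f"
```

This matches the line even though the key is preceded by a tab rather than sitting at the very start of the line.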
If the files are particularly large, you may want to look into using associative arrays within awk but, since there's no indication of the size, I'd start with this one and only switch when it proves too slow.
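For reference, that awk version could look something like the following sketch. It assumes the same file_a/file_b/file_c layout as above, reads each file only once, and uses FILENAME as part of the array subscript to remember which file each value came from:

```shell
# Recreate the sample files from above, so this sketch is self-contained.
tmp=$(mktemp -d) && cd "$tmp"
printf '%s\n' 'ID5 Value5a' 'ID1 Value1a' 'ID3 Value3a' 'ID4 Value4a' 'ID2 Value2a' > file_a
printf '%s\n' 'ID1 Value1b' 'ID3 Value3b' > file_b
printf '%s\n' 'ID2 Value2c' 'ID3 Value3c' 'ID4 Value4c' 'ID5 Value5c' > file_c

# One pass over all files: note every key seen and each file's value for it,
# then print one row per key, with "-" where a file lacks that key.
awk '
    { seen[$1] = 1; val[FILENAME, $1] = $2 }
    END {
        for (k in seen)
            print k,
                ((("file_a", k) in val) ? val["file_a", k] : "-"),
                ((("file_b", k) in val) ? val["file_b", k] : "-"),
                ((("file_c", k) in val) ? val["file_c", k] : "-")
    }
' file_a file_b file_c | sort
```

The trailing sort is needed because for (k in seen) visits keys in no guaranteed order; with it, this produces the same five lines as qq.sh above.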