views:

80

answers:

4

i have three similar files, they are all like this:

File A

ID1 Value1a
ID2 Value2a
  .
  .
  .
IDN Value2n

and i want an output like this

Output

ID1 Value1a Value1b Value1c
ID2 Value2a Value2b Value2c
.....
IDN ValueNa ValueNb ValueNc

Looking to the first line, i want value1A to be the value of id1 in fileA, value1B the value of id1 in fileB, and so on which each field and each line. I thougth it like a sql join. I've tried several things but none of them where even close.

EDIT: All files have the same length and ids.

+3  A: 

Give join(1) a try:

join fileA fileB | join - fileC
Dennis Williamson
If only I knew that existed, I wouldn't have wasted 10 minutes looking up awk syntax ;-)
David Zaslavsky
sorry! :$, but you have discovered a great tool!!!thanks for the time.
noinflection
Just be careful - this won't include lines that are not in _all_ of the files being joined. If you're sure all keys exist in all files, it's fine.
paxdiablo
I'm going to mention in the question that each file have the same length and same ids.
noinflection
No problems, @noinflection. I'll leave my answer for the more general case but Dennis' is definitely better (i.e., smaller and faster) for your edited requirements.
paxdiablo
+2  A: 

Update: The question has been edited to state that all files contain all keys, so the accepted answer (join) is definitely better than this one. Only consider using this one if it's possible the keys may not be in all files.


If you're not too concerned about performance, you could try the quick and dirty:

$ cat file_a
    ID5 Value5a
    ID1 Value1a
    ID3 Value3a
    ID4 Value4a
    ID2 Value2a
$ cat file_b
    ID1 Value1b
    ID3 Value3b
$ cat file_c
    ID2 Value2c
    ID3 Value3c
    ID4 Value4c
    ID5 Value5c
$ cat qq.sh
    #!/bin/bash
    keylist=$(awk '{print $1'} file_[abc] | sort | uniq)
    for key in ${keylist} ; do
        val_a=$(grep "^${key} " file_a | awk '{print $2}') ; val_a=${val_a:--}
        val_b=$(grep "^${key} " file_b | awk '{print $2}') ; val_b=${val_b:--}
        val_c=$(grep "^${key} " file_c | awk '{print $2}') ; val_c=${val_c:--}
        echo ${key} ${val_a} ${val_b} ${val_c}
    done
$ ./qq.sh
    ID1 Value1a Value1b -
    ID2 Value2a - Value2c
    ID3 Value3a Value3b Value3c
    ID4 Value4a - Value4c
    ID5 Value5a - Value5c

This actually works out the keys first then gets the values from each file with that key, or - if it's not in the relevant file.

The grep commands will need to be adjusted if the file is more complex (either if field 1 isn't at the start of the line or is followed by a non-space separator) but this should be a reasonable first-cut solution. The likely grep to use in that case would be:

grep "^[ X]*${key}[ X]"

where X is actually the tab character, as this allows for zero-or-more spaces or tabs before the key and a space or tab to terminate the key.

If the files are particularly large, you may want to look into using the associative arrays within awk but, since there's no indication of the size, I'd start with this one until you get to the point where it's running too slow.

paxdiablo
Thx for the time.
noinflection
+1  A: 

join (Dennis's answer) is better, but just for kicks, here's what I came up with in awk:

awk '{a=$0; getline b <"fileB"; getline c <"fileC"; $0=a" "b" "c; print $1,$2,$4,$6}' <fileA
David Zaslavsky
Thank you for the time. but dennis' answer is really good.
noinflection
Yep, that's what I said...
David Zaslavsky
A: 

Just to add that in order for join to work the input should be sorted. This awk solution should handle any number of input files. You will also loose the original order of the keys (you'll need more code to preserve it).

awk 'END {
  for (K in k) print K, k[K]
    }
{ 
  k[$1] = k[$1] ? k[$1] FS $2 : $2 
  }' file1 file2 [.. filen]
radoulov
Thank you very much! late but sure!
noinflection