tags:
views: 413
answers: 6

I'm trying to merge some data that I have. The input looks like this:

foo bar
foo baz boo
abc def
abc ghi

And I would like the output to look like:

foo bar baz boo
abc def ghi

I have some ideas using arrays in a shell script, but I'm looking for a more elegant or quicker solution.

A: 

If the length of the first field is fixed, you can use uniq with the -w option. Otherwise you might want to use awk (warning: untested code):

awk '
    BEGIN { last = "" }
    {
        if ($1 == last) {
            # same key as the previous line: append the remaining fields
            for (i = 2; i <= NF; i++) printf " %s", $i
        } else {
            # new key: finish the previous output line and start a new one
            if (NR > 1) printf "\n"
            printf "%s", $0
            last = $1
        }
    }
    END { printf "\n" }'
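
Like the uniq approach, this only compares each line's key with the previous line's, so duplicate keys must be adjacent (sort the input first if they aren't). On the sample input above it should print:

foo bar baz boo
abc def ghi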
soulmerge
The first field's length will vary, and I don't think uniq has the function I am looking for (concatenation of the extra fields onto one line).
Kyle
+2  A: 

How about join?

file="file"
join -a1 -a2 <(sort "$file" | sed -n 1~2p) <(sort "$file" | sed -n 2~2p)

The sed commands there are just splitting the file into odd and even lines.
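
For the sample data, the sorted odd lines are "abc def" and "foo bar", the even lines are "abc ghi" and "foo baz boo", and joining them on the first field should give:

abc def ghi
foo bar baz boo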

pixelbeat
Getting close! I will have to sort the data beforehand (it does not work if the records with duplicate first fields aren't next to each other), and it doesn't spit out records that don't have duplicates (again, not really a problem; this little command did the hard part of my situation). Thanks!
Kyle
I already did the sort for you. If you want to show unique items, that's achieved with -a1 -a2, which I've just updated the answer with.
pixelbeat
I'm still getting some duplicates in my output if I don't sort the file beforehand. But like I said, it's not a problem. And the -a1 -a2 fixed my other issue. (Why have I never used join before?!) Thanks again.
Kyle
Oh right, sorry, you need to sort before, not after, so that the odd/even split correctly sends duplicate keys to separate files. Answer updated. Also, the caveat with join is that it works on _pairs_, so if you have more than 2 of a particular key, you'll need to run the above in a loop.
pixelbeat
+1  A: 

While pixelbeat's answer works, I can't say I'm very enthused about it. I think I'd use awk something like this:

{ for (i=2; i<=NF; i++) { lines[$1] = lines[$1] " " $i;} }
END { for (i in lines) printf("%s%s\n", i, lines[i]); }

This shouldn't require pre-sorting the data, and should work fine regardless of the number or length of the fields (short of overflowing memory, of course). Its only obvious shortcoming is that its output is in an arbitrary order. If you need it sorted, you'll need to pipe the output through sort (but getting back to the original order would be something else).
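
A minimal way to run it, assuming the two-line program above is saved as merge.awk (the file names are just placeholders):

awk -f merge.awk input.txt | sort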

Jerry Coffin
I really like this solution; it's a bit more streamlined. Thanks!
Kyle
+1  A: 

An awk solution:

awk '
    {key=$1; $1=""; x[key] = x[key] $0}
    END {for (key in x) {print key x[key]}}
' filename
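
For the sample input this should print the merged lines, though for (key in x) makes no ordering promise, so they may come out in either order:

abc def ghi
foo bar baz boo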
glenn jackman
A: 

Pure Bash, for truly alternating lines:

infile="paste.dat"

toggle=0
while read -a line ; do
  if [ $toggle -eq 0 ] ; then
    echo -n "${line[@]}"
  else
    unset line[0]               # remove first element
    echo  " ${line[@]}"
  fi
  ((toggle=1-toggle))
done < "$infile"
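
Assuming paste.dat holds the four sample lines (which do alternate key line and continuation line in strict pairs), this should print:

foo bar baz boo
abc def ghi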
fgm
A: 

Based on fgm's pure Bash snippet:

text='foo bar
foo baz boo
abc def
abc ghi'

count=0
oneline=""
firstword=""
while IFS=" " read -a line ; do
   let count++
   if [[ $count -eq 1 ]]; then
      firstword="${line[0]}"
      oneline="${line[@]}"
   else
      if [[ "$firstword" == "${line[0]}" ]]; then
         unset line[0] # remove first word of line
         oneline="${oneline} ${line[@]}"
      else
         printf "%s\n" "${oneline}"
         oneline="${line[@]}"
         firstword="${line[0]}"
      fi
   fi
done <<< "$text"
printf "%s\n" "${oneline}"      # print the last merged line
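
To read from a file instead of the inline here-string, the done <<< "$text" line could be swapped for something like done < "./input.dat" (path illustrative); the rest stays the same.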
gustaf