ansaurus

Question

Answer 1

A:

You can do it with `cat ./* > outfile`.

xt.and.r 2010-10-11 06:30:02

No - that does not work. Join finds matching lines in the files based on a key (since no key is specified, on the first column of each file), assuming that the files are all sorted in the same order.

Jonathan Leffler 2010-10-11 06:34:59
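To illustrate the difference described in the comment above, here is a minimal sketch. The two sample files (`a.txt` and `b.txt`) and their contents are made up for this example and live in a temporary directory; both are already sorted on the first field.

```bash
#!/bin/bash
# Hypothetical sample data, created in a throwaway directory.
tmp=$(mktemp -d)
printf '%s\n' 'g1 10' 'g2 20' 'g3 30' > "$tmp/a.txt"
printf '%s\n' 'g1 1.5' 'g3 3.5' 'g4 4.5' > "$tmp/b.txt"

# cat appends the second file below the first: six lines, nothing is matched.
cat_out=$(cat "$tmp/a.txt" "$tmp/b.txt")

# join pairs up lines whose first field is equal. Keys that occur in only
# one of the files (g2 and g4 here) are dropped by default.
join_out=$(join "$tmp/a.txt" "$tmp/b.txt")

echo "$join_out"
rm -rf "$tmp"
```

With this data, join prints one line per shared key (`g1 10 1.5` and `g3 30 3.5`), whereas cat simply stacks the files.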

Answer 2

+2 A:

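#!/bin/bash

# Accumulate the joined result in $data, one file at a time.
data=
for f in "${rpkmDir}"/HS*.chsn.rpkm
do
  if [ -z "$data" ]
  then
    # First file: nothing to join against yet, so just sort it.
    data="$(sort "$f")"
    continue
  fi
  # Join the next sorted file against the result so far, read from stdin.
  data="$(join <(sort "$f") /dev/stdin <<< "$data")"
done
echo "$data"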

Ignacio Vazquez-Abrams 2010-10-11 06:38:26

Do you need to 'echo "$data"' into a pipe into bash? Or explain that you have generated the script and need to execute what you produced as a shell script?

Jonathan Leffler 2010-10-11 06:44:37

It is indeed a script. I had hoped that the shebang line at the top would have made this apparent.

Ignacio Vazquez-Abrams 2010-10-11 06:47:03

This is a script that writes a script - I think. You then have to feed the output of the script shown into the shell. Normally, you just execute a script to ... get the commands executed. Here you have to execute your script and then run bash on the output.

Jonathan Leffler 2010-10-11 06:53:15

Incorrect. It uses command and process substitution to build up the results in `$data`, feeding it back into `join` each iteration.

Ignacio Vazquez-Abrams 2010-10-11 06:55:56
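The accumulation described in that comment can be sketched as a reusable function. This is a variation on the answer, not the answer's exact code: the function name `join_all` is made up, and stdin is passed as the first operand (`-`) so that the output columns follow the order of the file arguments.

```bash
#!/bin/bash
# join_all FILE...: join all files on their first field (hypothetical helper).
join_all() {
  local data= f first=1
  for f in "$@"; do
    if [ "$first" ]; then
      # Command substitution captures the sorted first file in $data.
      data=$(sort "$f")
      first=
      continue
    fi
    # The here-string feeds the result so far to join as "-" (stdin);
    # process substitution supplies the next file, sorted on the fly.
    data=$(join - <(sort "$f") <<< "$data")
  done
  printf '%s\n' "$data"
}
```

It could be called as `join_all "${rpkmDir}"/HS*.chsn.rpkm`. Nothing is ever written to disk; each intermediate result lives only in the shell variable.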

I should have mentioned I didn't need to eliminate unmatched lines, making @ghostdog74's answer most concise for what I need. Still, your answer will be very useful when I need that functionality +1. I wish I could accept two answers.

D W 2010-10-12 12:20:49

*shrug* I merely extended your example to an arbitrary set of files.

Ignacio Vazquez-Abrams 2010-10-12 12:44:35

Answer 3

+1 A:

Since join (in classic UNIX and under POSIX) is defined to work on exactly two files at a time, you are going to have to do the iteration yourself, somehow.

While your notation is marvellously minimal, it is also inscrutable. Chances are that you can use pipes, together with the fact that '-' as a file name denotes standard input, to alter the sequencing, I think. But the hard part is connecting everything together without creating any explicit temporary files. You may be best off simply writing a script that writes your script notation and feeds that into bash.

Maybe (untested script):

cd "${rpkmDir}" || exit 1
ls HS*.chsn.rpkm |
{
read file
script="sort $file"
while read file
do
    script="$script | join - <(sort $file)"
done
# Emit the generated pipeline so that bash has something to execute
echo "$script"
} | bash

Jonathan Leffler 2010-10-11 06:42:44

I wasn't aware of the '-' trick. Thanks +1

D W 2010-10-12 12:23:00
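For a small, fixed number of files, the '-' convention can be used directly in a pipeline, without generating a script first. A minimal sketch, with three made-up files (`one`, `two`, `three`) created in a temporary directory:

```bash
#!/bin/bash
# Hypothetical sample data; the files are deliberately not all sorted.
tmp=$(mktemp -d)
printf '%s\n' 'k2 b' 'k1 a' > "$tmp/one"
printf '%s\n' 'k1 A' 'k2 B' > "$tmp/two"
printf '%s\n' 'k2 y' 'k1 x' > "$tmp/three"

# Each join reads the previous stage from stdin ("-") and the next file,
# sorted on the fly, through process substitution. The intermediate
# results never touch the disk.
result=$(sort "$tmp/one" | join - <(sort "$tmp/two") | join - <(sort "$tmp/three"))

echo "$result"
rm -rf "$tmp"
```

With this data the pipeline prints `k1 a A x` and `k2 b B y`. The generated-script approach above builds the same kind of pipeline, just for an arbitrary number of files.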

Answer 4

+1 A:

Use awk. Say you want to join on the first field:

awk '{a[$1]=a[$1] FS $0}END{for(i in a) print i,a[i]}' file*

ghostdog74 2010-10-11 06:48:36

That doesn't eliminate lines where file1 contains the key and file2 does not - whereas the join command (without options) does eliminate unmatched lines.

Jonathan Leffler 2010-10-11 06:50:54

Correct me if I am wrong, but I don't see the OP stating that requirement. And I already stated in my post my assumption, based on an example, of joining on the first field. Until the OP elaborates on his data format, all solutions will be based on wild guesses and assumptions. BTW, it's also not that difficult to include code to do what you are assuming.

ghostdog74 2010-10-11 06:56:53
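One way such code might look is sketched below. This is not ghostdog74's code: the function name `inner_join_awk` is made up, and the sketch assumes whitespace-separated fields and that each key occurs at most once per file. A key is printed only if it was seen in every input file, which mimics join's default of dropping unmatched lines.

```bash
#!/bin/bash
# inner_join_awk FILE...: awk-only join on the first field that keeps a key
# only when it occurs in every input file (hypothetical helper).
inner_join_awk() {
  awk '
    {
      key = $1
      $1 = ""                    # blank the key; $0 is rebuilt with a leading space
      row[key] = row[key] $0     # append the remaining fields for this key
      seen[key]++                # count the lines (one per file) carrying this key
    }
    END {
      nfiles = ARGC - 1          # number of file operands on the command line
      for (key in row)
        if (seen[key] == nfiles)
          print key row[key]
    }
  ' "$@" | sort                  # awk iterates arrays in unspecified order
}
```

Unlike join, this does not need sorted input; the final sort only makes the output order predictable.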

Using paste in some way would seem to be better for this application.

D W 2010-10-13 20:48:03

You will have to sort as well when using paste.

ghostdog74 2010-10-13 23:36:35

That's true. Anyway, I guess I have to choose Ignacio Vazquez-Abrams' answer because it answers the original question, even though your solution is useful to me.

D W 2010-10-18 20:24:11

Join all files in a directory
