




How can I use shell one-liners and common GNU tools to concatenate lines from two files as in a Cartesian product? What is the most succinct, beautiful and "linuxy" way?

For example, if I have two files:

$ cat file1
a
b
$ cat file2
c
d
e

The result should be:

a, c
a, d
a, e
b, c
b, d
b, e
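To try the answers below, the sample input can be recreated from the shell. This is a minimal sketch. It assumes one item per line and that it is fine to write `file1` and `file2` in the current directory:

```shell
# Recreate the two example files from the question (one item per line).
printf '%s\n' a b > file1
printf '%s\n' c d e > file2
```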
Edit: Oops... Sorry, I thought this was tagged Python...

If you have Python 2.6 or later:

from itertools import product
print('\n'.join((', '.join(elt) for elt in (product(*((line.strip() for line in fh) for fh in (open('file1','r'), open('file2','r'))))))))

a, c
a, d
a, e
b, c
b, d
b, e

If you have Python older than 2.6:

def product(*args, **kwds):
    # Source: http://docs.python.org/library/itertools.html#itertools.product
    # product('ABCD', 'xy') --> Ax Ay Bx By Cx Cy Dx Dy
    # product(range(2), repeat=3) --> 000 001 010 011 100 101 110 111
    pools = map(tuple, args) * kwds.get('repeat', 1)
    result = [[]]
    for pool in pools:
        result = [x+[y] for x in result for y in pool]
    for prod in result:
        yield tuple(prod)
print('\n'.join((', '.join(elt) for elt in (product(*((line.strip() for line in fh) for fh in (open('file1','r'), open('file2','r'))))))))
That would work, but Python is not what I've been asking for.
Solution 1:

perl -e 'use File::Slurp; @f1 = read_file("file1"); @f2 = read_file("file2"); chomp(@f1, @f2); for $v1 (@f1) { print "$v1, $_\n" for @f2 }'

Here's a shell script to do it:

while read a; do while read b; do echo "$a, $b"; done < file2; done < file1

That will be quite slow, though. I can't think of any precompiled logic to accomplish this; the next step for speed would be to do the above in awk or Perl.

awk 'NR==FNR { a[$0]; next } { for (i in a) print i",", $0 }' file1 file2
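In the awk one-liner above, `for (i in a)` visits keys in an unspecified order, and duplicate lines of `file1` collapse into a single key. Here is a sketch that keeps both files in input order and keeps duplicates. It assumes both files fit in memory; `cartesian_awk` is a hypothetical helper name, not part of any tool:

```shell
# Hypothetical helper: print "x, y" for every line x of $1 and every line y of $2,
# in file order, keeping duplicate lines.
cartesian_awk() {
    awk 'NR == FNR { left[++n] = $0; next }   # first file: store its lines in order
         { right[++m] = $0 }                  # second file: store its lines in order
         END {
             for (i = 1; i <= n; i++)
                 for (j = 1; j <= m; j++)
                     print left[i] ", " right[j]
         }' "$1" "$2"
}
```

Called as `cartesian_awk file1 file2`, it prints the six pairs in the order shown in the question. Storing everything and looping in `END` trades memory for a predictable order.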

Hmm, how about this hacky solution to use precompiled logic?

paste -d, <(sed -n "$(yes 'p;' | head -n $(wc -l < file2))" file1) \
          <(cat $(yes 'file2' | head -n $(wc -l < file1)))
@Pixelbeat: your first version needs to reverse the order of `file1` and `file2`. (That is, it should be `done < file2; done < file1` to get the desired result.)
@Telemachus, the order is irrelevant: if I say "Cartesian product", I really *mean it*.
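The `paste` trick above works because it builds two streams of equal length. `sed -n` with the script `p;` repeated once per line of `file2` prints each line of `file1` that many times, while `file2` is replayed once per line of `file1`. Below is a sketch of the same idea with quoted file names. It assumes GNU `xargs` (for `-d` and `-r`) and that every line, including the last, ends with a newline, since `wc -l` counts newlines. `paste_product` is a hypothetical name:

```shell
# Hypothetical helper wrapping the paste/sed trick; output pairs are "x,y".
paste_product() {
    local n1 n2
    n1=$(wc -l < "$1")   # number of lines in the first file
    n2=$(wc -l < "$2")   # number of lines in the second file
    # Left stream: each line of $1 printed n2 times (sed script "p;" repeated n2 times).
    # Right stream: the whole of $2 concatenated n1 times (-r: skip cat if n1 is 0).
    paste -d, <(sed -n "$(yes 'p;' | head -n "$n2")" "$1") \
              <(yes "$2" | head -n "$n1" | xargs -d '\n' -r cat)
}
```

For the sample files, `paste_product file1 file2` prints `a,c` through `b,e`, comma-separated without a space, like the original.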
Pavel Shved
The mechanical way to do it in shell, not using Perl or Python, is:

while read line1
do
    while read line2
    do echo "$line1, $line2"
    done < file2
done < file1
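Plain `read` trims leading and trailing blanks and treats backslashes as escape characters, and `echo` may treat a line such as `-n` as an option. Here is a slightly more defensive sketch of the same double loop. It assumes bash; `cartesian_loop` is a hypothetical name. It is just as slow, because the second file is reread once per line of the first:

```shell
# Hypothetical helper: the double read loop with raw, blank-preserving reads.
cartesian_loop() {
    local left right
    # IFS= keeps surrounding blanks, -r keeps backslashes literal, and the
    # [ -n ... ] test also handles a final line that lacks a trailing newline.
    while IFS= read -r left || [ -n "$left" ]; do
        while IFS= read -r right || [ -n "$right" ]; do
            printf '%s, %s\n' "$left" "$right"
        done < "$2"
    done < "$1"
}
```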

The join command can sometimes be used for these operations; however, I'm not sure it can do a Cartesian product as a degenerate case.

One step up from the double loop would be:

while read line1
do
    sed "s/^/$line1, /" file2
done < file1
I'd go for the first solution because it doesn't make the files look like they're substantially different.
It (the first solution) would likely be substantially slower, but it would also be immune to odd characters (such as slashes) in the data. Making the sed version robust against them is a bit fiddlier, and at that point you start thinking about using Perl or Python after all.
@Pavel - thanks for the editorial assist.
Nice, but I sure would not want to maintain this script. :)
Truly delightful, but unmaintainable. :)
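One way to make the `sed` variant above tolerate slashes, ampersands and backslashes in `file1` is to escape them before building the substitution. This is a sketch. It assumes GNU sed and no embedded newlines; `cartesian_sed` is a hypothetical name:

```shell
# Hypothetical helper: the "one sed per line of file1" approach with escaping.
cartesian_sed() {
    local line esc
    while IFS= read -r line || [ -n "$line" ]; do
        # Backslash-escape \, / and & so they are literal in the replacement;
        # "|" is the delimiter of this escaping command, so "/" needs no special care here.
        esc=$(printf '%s\n' "$line" | sed 's|[\/&]|\\&|g')
        sed "s/^/$esc, /" "$2"
    done < "$1"
}
```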
awk 'FNR==NR { a[++d] = $0; next }   # first argument (file2): store its lines in order
     { for (i = 1; i <= d; i++) print $0 ", " a[i] }' file2 file1

OK, this is a derivation of Dennis Williamson's solution above, since he noted that his does not read from a file:

$ echo {`cat a | tr "\012" ","`}\,\ {`cat b | tr "\012" ","`}$'\n'
a, c
 a, d
 a, e
 b, c
 b, d
 b, e
This is what that gives me: `{a,b,}, {c,d,e,}` as a literal string, since brace expansion is performed before command substitution.
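Since brace expansion runs before command substitution, the file contents have to be parsed a second time, for example with `eval`. This sketch rests on strong assumptions. Each file must have at least two lines, because a one-element `{a}` is not expanded. No line may contain commas, braces, quotes or other shell syntax, because `eval` executes the data as code. `brace_product` is a hypothetical name:

```shell
# Hypothetical helper: build {a,b}', '{c,d,e} from the files and let eval expand it.
brace_product() {
    local s1 s2
    s1=$(paste -sd, "$1")   # e.g. "a,b" for the sample file1
    s2=$(paste -sd, "$2")   # e.g. "c,d,e" for the sample file2
    # WARNING: eval executes the file contents as shell code; use only on trusted input.
    eval "printf '%s\n' {$s1}', '{$s2}"
}
```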
Dennis Williamson
A solution using join, awk and process substitution:

join <(xargs -I_ echo 1 _ < setA) <(xargs -I_ echo 1 _ < setB) |
  awk '{ printf("%s, %s\n", $2, $3) }'
What are the contents of the file "a"? Should one of them be a different file? The AWK could probably be replaced by `cut -f2- -d' '`.
Dennis Williamson
The "a" file contains the set. They may be different if wanted. I'll correct it!
@Dennis, `cut` is probably better, since it works even if `setB` contains lines with whitespace.
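Along the lines of these comments, here is a variant of the `join` answer. It adds the constant key with `sed` instead of `xargs` (which interprets quotes and backslashes in its input), joins on a tab so that spaces inside lines survive, and drops the key with `cut -f2-`. It is a sketch assuming GNU coreutils and no tabs inside lines. `join_product` is a hypothetical name, and pairs come out tab-separated rather than as `a, c`:

```shell
# Hypothetical helper: Cartesian product via join on a constant key.
join_product() {
    local tab=$'\t'
    # Every line gets the same key "1", so join pairs each line of $1 with
    # each line of $2; identical keys also satisfy join's sorted-input requirement.
    join -t "$tab" <(sed "s/^/1$tab/" "$1") <(sed "s/^/1$tab/" "$2") | cut -f2-
}
```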
