How about:
cat file1 file2 |
    awk '{print $1" "$2" "$3}' |
    sort |
    uniq -c |
    grep -v '^ *1 ' |
    awk '{print $2" "$3" "$4}'
This assumes you're not too worried about the exact whitespace between fields (in other words, three tabs and a space are treated the same as a space and seven tabs). That's usually the case when you're talking about fields within a text file.
What it does is output both files, stripping off the last field (since you don't care about that one for the comparison). It then sorts the result so that identical lines are adjacent, then uniquifies them (replaces each group of adjacent identical lines with one copy and a count).
It then gets rid of all the lines that had a count of one (no duplicates) and prints each remaining line with the count stripped off. That gives you the "keys" of the duplicate lines, and you can then use another awk pass to locate those keys in the original files if you wish.
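To make that concrete, here's a tiny worked example with made-up file contents. Suppose file1 contains:
a b c 1
d e f 2
and file2 contains:
a b c 9
x y z 3
After the first awk strips the last field, the combined stream is "a b c", "d e f", "a b c" and "x y z". Sorting brings the two "a b c" lines together, uniq -c gives them a count of 2 (and the others a count of 1), the grep throws away the count-1 lines, and the final awk prints just:
a b c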
This won't work as expected if two identical keys are only in one file, since the files are combined early on. In other words, if you have duplicate keys in file1 but not in file2, that will be a false positive: two copies of an "a b c" key in file1 alone still produce a count of 2, even though file2 never mentions that key.
Then, the only real solution I can think of is one that checks file2 for each line in file1, although I'm sure others may come up with cleverer solutions.
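For what it's worth, one cleverer option along those lines is a single awk invocation that remembers the keys from file1 and then checks each line of file2 against them. This is only a sketch, assuming as before that the first three whitespace-separated fields form the key:
awk 'NR==FNR { seen[$1" "$2" "$3] = 1; next }      # first file: remember each key
     ($1" "$2" "$3) in seen { print $1, $2, $3 }   # second file: print keys found in both
    ' file1 file2
If file2 repeats a key, it gets printed once per occurrence; an extra !dup[$1" "$2" "$3]++ condition would print each key only once.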
And, for those who enjoy a little bit of sado-masochism, here's the aforementioned not-overly-efficient solution:
cat file1 |
    sed -e 's/ [^ ]*$/ "/' \
        -e 's/ / */g' \
        -e 's/^/grep "^/' \
        -e 's/$/ file2 | awk "{print \\$1\\" \\"\\$2\\" \\"\\$3}"/' \
        > xx99
bash xx99
rm xx99
This one constructs a separate script file to do the work. For each line in file1, it creates a line in the script that looks for that line's key in file2. If you want to see how it works, just have a look at xx99 before you delete it.
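To illustrate, a hypothetical file1 line like
a b c 42
would become this line in xx99:
grep "^a *b *c *" file2 | awk "{print \$1\" \"\$2\" \"\$3}"
The sed pass strips the trailing field ("42"), turns each space into the looser pattern " *", and wraps the result in a grep that searches file2, with the awk at the end printing just the three key fields of any match.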
And, in this one, the spaces do matter, so don't be surprised if it doesn't work for lines where the spacing differs between file1 and file2 (though, as with most "hideous" scripts, that can be fixed with just another link in the pipeline; see the sketch below). It's more here as an example of the ghastly things you can create for quick'n'dirty jobs.
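If you do want that extra link, one candidate is squeezing runs of spaces and tabs down to single spaces in both files up front; a rough sketch (the .norm names are just my own invention):
tr -s ' \t' ' ' < file1 > file1.norm
tr -s ' \t' ' ' < file2 > file2.norm
and then point the script above at file1.norm and file2.norm instead.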
This is not what I would do for production-quality code but it's fine for a once-off, provided you destroy all evidence of it before The Daily WTF finds out about it :-)