I have 4 files, A, B, C, and D, each sorted alphabetically and each containing a single string per line. Any line that appears in B needs to be deleted from A. The result of that then needs to be stripped of anything in C, and the result of that stripped of anything in D.

Is there a way to do this using Linux commands?

A: 
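grep -F -x -v -f B A | grep -F -x -v -f C | grep -F -x -v -f D

The -v switch is an inverse match (i.e. match all except). The -f switch takes a file with a list of patterns to match. The -x switch forces it to match whole lines (so that lines that are substrings of other lines don't cause the longer lines to be removed). The -F switch treats each pattern as a fixed string rather than a regular expression, so characters such as . or * in B, C, or D are matched literally.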
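
A small worked run of this pipeline may help. It is only a sketch on made-up sample data (the contents of A, B, C, and D below are hypothetical), and it assumes the lines should be compared as literal strings, which is why -F is used. The second pipeline omits -F to show what can go wrong when a line such as a.c is read as a regular expression.

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory; only the file names come from the question.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' a.c abc apple banana cherry > A
printf '%s\n' a.c > B
printf '%s\n' banana > C
printf '%s\n' cherry > D

# -F fixed strings, -x whole-line match, -v invert, -f read patterns from a file
literal=$(grep -F -x -v -f B A | grep -F -x -v -f C | grep -F -x -v -f D)
printf 'with -F:\n%s\n' "$literal"
# prints abc and apple

# Without -F the pattern "a.c" is a regex, so it also removes "abc"
regex=$(grep -x -v -f B A | grep -x -v -f C | grep -x -v -f D)
printf 'without -F:\n%s\n' "$regex"
# prints apple only
```

If the lines in B, C, and D never contain regular-expression metacharacters, both forms give the same result.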

Tyler McHenry
+1  A: 

Look at the join command. Read its man page and you should find what you seek.

Michael E
A: 
join A B | join - C | join - D
biznez
Doesn't that do pretty much the opposite of what you want? That would give you lines that exist in all four files. Plus, it doesn't work if any of your lines have spaces in them.
Tyler McHenry
Yea. Sorry a straight join should do it.
biznez
But still... I'm not an expert on join but from reading the man page, join A B will give you all the lines in both A and B, not the lines in A but not B, which is what you asked about. From what I can tell the join-based answer to your original question would be something like: `join -t \n -v 1 A B | join -t \n -v 1 - C | join -t \n -v 1 - D`
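
As a rough sketch of the join -v 1 idea: the run below assumes every line is a single word with no whitespace, so the default field separator is harmless and no -t option is needed, and it assumes the files are sorted in the C locale. The sample data is made up. Note that an unquoted \n is passed by the shell to join as the plain letter n, so the separator in the command above would need quoting to mean a newline.

```shell
#!/bin/sh
# join expects input sorted in the current collation order; pin it to the C locale.
export LC_ALL=C
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# Hypothetical sample data: one whitespace-free word per line, already sorted.
printf '%s\n' apple banana cherry date fig > A
printf '%s\n' banana > B
printf '%s\n' date > C
printf '%s\n' fig > D

# -v 1 prints only the lines of the first input that have no match in the second.
# "-" makes join read the previous stage's output from standard input.
joined=$(join -v 1 A B | join -v 1 - C | join -v 1 - D)
printf '%s\n' "$joined"
# prints apple and cherry
```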
Tyler McHenry
+2  A: 

comm is good for this, either:

cat B C D | sort | comm -2 -3 A -

or:

comm -2 -3 A B | comm -2 -3 - C | comm -2 -3 - D

depending on what's easier/clearer for your script.
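
To check that the two forms agree, here is a minimal sketch on made-up data (the file contents are hypothetical). comm needs its inputs sorted in the current locale's order, so the example pins the locale to C; that is an assumption of this sketch, not something the question states.

```shell
#!/bin/sh
# comm requires sorted input; fix the collation order so sort and comm agree.
export LC_ALL=C
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# Hypothetical sample data, already sorted.
printf '%s\n' apple banana cherry date fig grape > A
printf '%s\n' banana > B
printf '%s\n' date > C
printf '%s\n' grape > D

# -2 drops lines unique to the second input, -3 drops lines common to both,
# leaving only the lines unique to the first input.
merged=$(cat B C D | sort | comm -2 -3 A -)
chained=$(comm -2 -3 A B | comm -2 -3 - C | comm -2 -3 - D)

printf 'merged:\n%s\n' "$merged"
printf 'chained:\n%s\n' "$chained"
# both print apple, cherry, fig
```

The merged form reads A once and runs a single comm; the chained form avoids re-sorting B, C, and D when they are already sorted.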

caf
I'd say this is easily the simplest of the answers that have been given so far.
Tyler McHenry