I have two lists of files that I want to diff. The second list has more files in it, and because both lists are in alphabetical order, when I diff them I get files (lines) that exist in both lists but in different places.

I want to diff these two lists while ignoring each line's position in the list, so that I get only the new or missing lines.

Thank you.

A: 

If the lines are sorted, diff should catch the insertions and deletions just fine and only report the differences.

Sparr
Let me explain. I have two lists. The first one:

a.txt
b.txt
c.txt

The second:

a.txt
a1.txt
b.txt
b2.txt

The wanted diff would be:

a1.txt
b2.txt
c.txt

How can I sort the lists if they are already sorted alphabetically?
Nir
A: 

Sorting the two lists before you diff them will give you more useful diff output.

pyfunc
The lists are already sorted in alphabetical order, and that is the root of the problem.
Nir
A: 

For the example you quoted to @Sparr:

a contains

a.txt
b.txt
c.txt

b contains

a.txt
a1.txt
b.txt
b2.txt

diff a b gives

1a2
> a1.txt
3c4
< c.txt
---
> b2.txt

What is it about this output that does not meet your needs?
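
If the goal is only the bare list of added and removed names, the change markers can be stripped from the diff output. This is a minimal sketch, assuming the two sample lists above are saved as files `a` and `b`; the scratch directory and the variable names are illustrative and not part of the original example.

```shell
# Recreate the sample lists in a scratch directory (illustrative paths).
workdir=$(mktemp -d)
printf '%s\n' a.txt b.txt c.txt > "$workdir/a"
printf '%s\n' a.txt a1.txt b.txt b2.txt > "$workdir/b"

# diff exits with status 1 when the files differ; only status 2 signals trouble.
diff "$workdir/a" "$workdir/b" > "$workdir/diff.out" || [ $? -eq 1 ]

# Keep only the lines marked "< " or "> " and drop the marker itself.
changed_names=$(sed -n 's/^[<>] //p' "$workdir/diff.out")
printf '%s\n' "$changed_names"
```

The names come out in the order diff reports them, so pipe the result through `sort` if an alphabetical list is wanted.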

Beano
I have two lists. First:

a
b
c
1

Second:

1
a
b
c

These lists should be identical, but according to diff they are not.
Nir
But your question says that they are both in alphabetical order, whereas in the example you have just given they are not. Which is it? If you need them in alphabetical order, then sort them using `sort`.
Beano
You are correct. Once I ran the sort command on both files, it worked as I expected. Thanks.
Nir
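
The fix that resolved this thread (sort both lists, then diff them) might look roughly like the sketch below. The sample data, file names, and the `sort_diff_status` variable are hypothetical; `LC_ALL=C` is an assumption made here so that both lists are sorted with the same byte-wise collation.

```shell
# Hypothetical sample data: the same four names in two different orders.
workdir=$(mktemp -d)
printf '%s\n' a b c 1 > "$workdir/list1"
printf '%s\n' 1 a b c > "$workdir/list2"

# Sort both lists with the same collation so that equal lines line up.
LC_ALL=C sort "$workdir/list1" > "$workdir/list1.sorted"
LC_ALL=C sort "$workdir/list2" > "$workdir/list2.sorted"

# diff exits 0 when the sorted lists are identical and 1 when they differ.
if diff "$workdir/list1.sorted" "$workdir/list2.sorted" > "$workdir/diff.out"; then
    sort_diff_status=identical
else
    sort_diff_status=different
fi
echo "$sort_diff_status"
```

Once sorted, the two lists compare as identical, and `diff.out` stays empty.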
A: 

You can try this approach which involves "subtracting" the two lists as follows:

$ cat file1
a.txt
b.txt
c.txt

$ cat file2
a.txt
a1.txt
b.txt
b2.txt

1) Print everything in file2 that is not in file1, i.e. file2 - file1:

$ grep -vxFf file1 file2
a1.txt
b2.txt

2) Print everything in file1 that is not in file2, i.e. file1 - file2:

$ grep -vxFf file2 file1
c.txt

(You can then do what you want with these diffs, e.g. write them to a file, sort them, etc.)

grep options descriptions:

  -v, --invert-match        select non-matching lines
  -x, --line-regexp         force PATTERN to match only whole lines
  -F, --fixed-strings       PATTERN is a set of newline-separated strings
  -f, --file=FILE           obtain PATTERN from FILE
dogbane
This works, but using cat and grep takes a very long time. I have many files, and it can take up to an hour.
Nir
You don't need to use `cat`, just `grep`. The `cat` was only to illustrate the contents of the files.
dogbane
This would not work in the case where some filenames are substrings of others. Also, as the filenames will be treated as regular expressions, the `.` character will match any character, so `a1.txt` would match `a1ttxt`.
Beano
I added the -F flag to treat them as fixed strings instead of regular expressions.
dogbane
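
If the `grep` approach proves slow on large lists, one alternative is `comm`, which is a different tool from the one used in the answer above. It reads two sorted files in a single pass and prints lines unique to the first, lines unique to the second, and lines common to both in three columns. The sketch below assumes the same `file1` and `file2` contents; the scratch directory and variable names are illustrative.

```shell
# Recreate the sample lists in a scratch directory (illustrative paths).
workdir=$(mktemp -d)
printf '%s\n' a.txt b.txt c.txt > "$workdir/file1"
printf '%s\n' a.txt a1.txt b.txt b2.txt > "$workdir/file2"

# comm requires both inputs to be sorted in the collation order it uses itself.
LC_ALL=C sort "$workdir/file1" > "$workdir/file1.sorted"
LC_ALL=C sort "$workdir/file2" > "$workdir/file2.sorted"

# -13 suppresses columns 1 and 3, leaving lines found only in file2.
only_in_file2=$(LC_ALL=C comm -13 "$workdir/file1.sorted" "$workdir/file2.sorted")

# -23 suppresses columns 2 and 3, leaving lines found only in file1.
only_in_file1=$(LC_ALL=C comm -23 "$workdir/file1.sorted" "$workdir/file2.sorted")

printf 'only in file2:\n%s\n' "$only_in_file2"
printf 'only in file1:\n%s\n' "$only_in_file1"
```

Whether this is actually faster depends on the size of the lists and the cost of sorting them, so it is worth timing both approaches on real data.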
A: 

Do the following:

cat file1 file2 | sort | uniq -u

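This will give you a list of the files that are unique (i.e., not duplicated). That means the files that appear in only one of the two lists, provided neither list contains duplicates of its own.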

Explanation:
1) cat file1 file2 will put all of the entries into one list
2) sort will sort the combined list
3) uniq -u will only output the entries which don't have duplicates
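
The steps above can be sketched on the sample lists from the question. The scratch directory and the `only_once` variable are illustrative, and `LC_ALL=C` is an assumption added here to make the sort order predictable.

```shell
# Recreate the sample lists in a scratch directory (illustrative paths).
workdir=$(mktemp -d)
printf '%s\n' a.txt b.txt c.txt > "$workdir/file1"
printf '%s\n' a.txt a1.txt b.txt b2.txt > "$workdir/file2"

# Combine both lists, sort them so equal names become adjacent,
# then keep only the names that occur exactly once.
only_once=$(cat "$workdir/file1" "$workdir/file2" | LC_ALL=C sort | uniq -u)
printf '%s\n' "$only_once"
```

The result does not say which list each name came from, and a name repeated within a single list would be dropped as if it were shared by both.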

No One in Particular