I am trying to write a program that compares two files. For example:

file1

1
2
3
4
5

file2

1
2
@
3
4
5

If I do it line by line, I get:

1 == 1; 
2 == 2;
3 != @;
4 != 3;
5 != 4;
  != 5;

But the only real difference between the files is the @. I want to get something like this:

1 == 1;
2 == 2;
  != @;
3 == 3;
4 == 4;
5 == 5;

What is the best way to do this without using any external application, such as diff, fc, etc.?

+5  A: 

Maybe this helps: http://en.wikipedia.org/wiki/Diff#Algorithm
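The approach described there is based on the longest common subsequence (LCS) of the two files' lines. As a rough sketch of that idea, and not a production diff, the following builds an LCS table and walks it to align the lines. The function name `lcs_diff` and the tuple layout `(left, op, right)` are made up for this illustration, with `None` standing for a missing line.

```python
def lcs_diff(a, b):
    """Align two lists of lines using a longest-common-subsequence table."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    rows = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            # Line is part of the common subsequence.
            rows.append((a[i], "==", b[j]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            # Skipping a[i] keeps the LCS at least as long: line only in a.
            rows.append((a[i], "!=", None))
            i += 1
        else:
            # Otherwise skip b[j]: line only in b.
            rows.append((None, "!=", b[j]))
            j += 1
    # Whatever is left over exists in only one of the files.
    rows.extend((line, "!=", None) for line in a[i:])
    rows.extend((None, "!=", line) for line in b[j:])
    return rows


file1 = ["1", "2", "3", "4", "5"]
file2 = ["1", "2", "@", "3", "4", "5"]
for left, op, right in lcs_diff(file1, file2):
    print(f"{left or ' '} {op} {right or ' '};")
```

The table needs memory proportional to the product of the two line counts, so for large files a real diff implementation would use something more economical.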

Konamiman
+2  A: 

I wonder if Levenshtein distance would help you in this situation. It would tell you how similar the two files are, but I don't know whether you could zero in on the @. Something to look at nonetheless.
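To make that concrete, here is a minimal sketch of the standard dynamic-programming Levenshtein distance, applied to lists of lines instead of characters. The function name `levenshtein` is just a label for this example. It returns only the number of edits, so on its own it does not say where the @ is; for that you would have to keep the whole table and backtrack through it.

```python
def levenshtein(a, b):
    """Return the minimum number of insertions, deletions and
    substitutions needed to turn sequence a into sequence b."""
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            current.append(min(
                previous[j] + 1,         # delete x
                current[j - 1] + 1,      # insert y
                previous[j - 1] + cost,  # keep or substitute
            ))
        previous = current
    return previous[-1]


file1 = ["1", "2", "3", "4", "5"]
file2 = ["1", "2", "@", "3", "4", "5"]
print(levenshtein(file1, file2))  # 1: a single inserted line
```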

Sam152
+1  A: 

I believe what you're looking for is the distance between two strings; maybe this can help you.

Soufiane Hassou
+1  A: 

Python has a very handy library for comparing sequences called difflib. The underlying SequenceMatcher class takes two Python sequences and gives you (among other things) a sequence of opcodes telling you how to get from the first sequence to the second (i.e. the differences). These are of the form:

  • Replace this block with that one
  • Insert a block
  • Delete a block
  • Copy a block (called 'equal')

The opcodes reference blocks by giving indices into the original sequences. This can be applied to lines in a file, characters in a string, or anything else you can turn into a sequence in Python.
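As a small illustration using the files from the question, the sketch below turns the opcodes from `SequenceMatcher.get_opcodes()` into rows of the requested form. The helper name `compare_lines` and the `(left, op, right)` tuples are invented for this example, and `None` marks a line missing on one side.

```python
import difflib


def compare_lines(a, b):
    """Turn SequenceMatcher opcodes into (left, op, right) rows."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    rows = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Equal blocks have the same length on both sides.
            rows.extend((x, "==", y) for x, y in zip(a[i1:i2], b[j1:j2]))
        else:
            # 'delete' has an empty b range, 'insert' an empty a range,
            # 'replace' has both; list each side's lines separately.
            rows.extend((x, "!=", None) for x in a[i1:i2])
            rows.extend((None, "!=", y) for y in b[j1:j2])
    return rows


file1 = ["1", "2", "3", "4", "5"]
file2 = ["1", "2", "@", "3", "4", "5"]
for left, op, right in compare_lines(file1, file2):
    print(f"{left or ' '} {op} {right or ' '};")
```

For real files you could read the lines with `readlines()` and pass the two lists in directly. How to pair up lines inside a 'replace' block is a design choice; this sketch simply lists the old lines first and then the new ones.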

Ned
A: 

If you are not writing the program to learn something about diff algorithms but are simply looking for a solution, you should try diff-match-patch. It contains implementations of diff and patch algorithms in several programming languages (C++, C#, Java, JavaScript, Python).

I tried its Java version and it worked like a charm.

tangens