Is there an 'out-of-the-box' way in Python to generate a list of differences between two texts, and then later apply this diff to one file to obtain the other?

I want to keep the revision history of a text, but I don't want to save the entire text for each revision if there is just a single edited line. I looked at difflib, but I couldn't see how to generate a list of just the edited lines that can still be used to modify one text to obtain the other.

+2  A: 

Does it have to be a Python solution?
My first thought would be to use either a version control system (Subversion, Git, etc.) or the diff / patch utilities that are standard on a Unix system and are available through Cygwin on a Windows-based system.

Simon Callan
It would have to be a pure Python solution because I'd like to deploy it on AppEngine. `diff`/`patch` would be ideal, but in Python.
Noio
A: 

AFAIK most diff algorithms use a simple Longest Common Subsequence match to find the part the two texts have in common; whatever is left over is considered the difference. It shouldn't be too difficult to code up your own dynamic programming algorithm to do that in Python, and the Wikipedia page on the longest common subsequence problem describes the algorithm too.
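
To make the idea concrete, here is a minimal sketch of a line-based diff built on an LCS table. The function names `lcs_table` and `line_diff` are made up for illustration, and the quadratic table is only practical for fairly small texts; real diff tools use more refined algorithms.

```python
def lcs_table(a, b):
    """table[i][j] holds the LCS length of a[i:] and b[j:]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def line_diff(old_lines, new_lines):
    """Return a list of (tag, line) pairs: ' ' kept, '-' removed, '+' added."""
    table = lcs_table(old_lines, new_lines)
    ops = []
    i = j = 0
    while i < len(old_lines) and j < len(new_lines):
        if old_lines[i] == new_lines[j]:
            # Line is part of the common subsequence.
            ops.append((" ", old_lines[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            # Dropping the old line loses nothing from the LCS.
            ops.append(("-", old_lines[i]))
            i += 1
        else:
            ops.append(("+", new_lines[j]))
            j += 1
    # Whatever remains on either side is pure deletion or insertion.
    ops.extend(("-", line) for line in old_lines[i:])
    ops.extend(("+", line) for line in new_lines[j:])
    return ops


for tag, line in line_diff(["a", "b", "c"], ["a", "x", "c"]):
    print(tag, line)
```

Keeping only the `' '` and `'+'` entries gives back the new text, and keeping only `' '` and `'-'` gives back the old one.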

jai
+4  A: 

Did you have a look at diff-match-patch from Google? Apparently Google Docs uses this set of algorithms. It includes not only a diff module but also a patch module, so you can generate the newest file from older files and diffs.

A Python version is included.

http://code.google.com/p/google-diff-match-patch/

Jasper
Exactly what I was looking for! I tried googling for different combinations of "python", "diff", "patch", "revision", but hadn't found this yet.
Noio
A: 

Does difflib.unified_diff do what you want? There is an example here.
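
As a rough sketch of how this could look with only the standard library: `difflib.unified_diff` produces a readable diff, but `difflib` has no function to apply one, so the example below stores the changed regions from `SequenceMatcher.get_opcodes()` instead. The helpers `make_delta` and `apply_delta` are hypothetical names for illustration, not part of `difflib`.

```python
import difflib


def make_delta(old_lines, new_lines):
    """Keep only the changed regions needed to turn old_lines into new_lines."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Replace old_lines[i1:i2] with these new lines (may be empty).
            delta.append((i1, i2, new_lines[j1:j2]))
    return delta


def apply_delta(old_lines, delta):
    """Rebuild the new revision from the old one plus a stored delta."""
    result = list(old_lines)
    # Work backwards so earlier indices are not shifted by later edits.
    for i1, i2, replacement in reversed(delta):
        result[i1:i2] = replacement
    return result


old = ["first line\n", "second line\n", "third line\n"]
new = ["first line\n", "second line, edited\n", "third line\n", "fourth line\n"]

delta = make_delta(old, new)
assert apply_delta(old, delta) == new

# Human-readable view of the same change.
print("".join(difflib.unified_diff(old, new, fromfile="rev1", tofile="rev2")))
```

Only the edited and added lines end up in `delta`, so each revision costs roughly the size of its changes rather than the whole text.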

pwdyson