Does anybody know of an open-source Java library that can robustly diff the text parts of PDF files?
Ideally I would like something that produces the diff in the form of a patch.
Extract the PDF text with PDFBox (http://incubator.apache.org/pdfbox/), then create a diff of the extracted text with google-diff-match-patch (http://code.google.com/p/google-diff-match-patch).
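Once both PDFs have been reduced to plain strings (for example with PDFBox's `PDFTextStripper.getText(document)`), the diff step is an ordinary text diff. The sketch below is a dependency-free illustration of that step, not the diff-match-patch algorithm: it is a plain longest-common-subsequence line diff, and the class name `PdfTextDiff` and method `diffLines` are hypothetical. It assumes the extracted texts are already in memory and small enough for an O(n*m) table.

```java
import java.util.ArrayList;
import java.util.List;

public class PdfTextDiff {

    /**
     * Hypothetical helper: line-based diff of two already-extracted texts.
     * Each output line is prefixed with "  " (in both), "- " (only in oldText)
     * or "+ " (only in newText). Uses an LCS table: O(n*m) time and memory.
     */
    public static List<String> diffLines(String oldText, String newText) {
        // \R matches any line break (\n, \r\n, ...); -1 keeps trailing empty lines
        String[] a = oldText.split("\\R", -1);
        String[] b = newText.split("\\R", -1);
        int n = a.length, m = b.length;

        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a[i].equals(b[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        // Walk the table, emitting kept, deleted and inserted lines in order
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (a[i].equals(b[j])) {
                out.add("  " + a[i]);
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("- " + a[i]);
                i++;
            } else {
                out.add("+ " + b[j]);
                j++;
            }
        }
        while (i < n) out.add("- " + a[i++]);
        while (j < m) out.add("+ " + b[j++]);
        return out;
    }

    public static void main(String[] args) {
        String oldText = "Title\nFirst paragraph.\nSecond paragraph.";
        String newText = "Title\nFirst paragraph, edited.\nSecond paragraph.";
        diffLines(oldText, newText).forEach(System.out::println);
    }
}
```

Running `main` prints `  Title`, `- First paragraph.`, `+ First paragraph, edited.`, `  Second paragraph.`. This is a readable diff listing rather than a real patch with hunk headers; if you need an actual patch, google-diff-match-patch's Java port provides `patch_make(text1, text2)` and `patch_toText(patches)`, which is the more robust route the answer above suggests. Note that PDF text extraction can reorder or re-wrap lines between otherwise similar documents, so a line-level diff may report more changes than a human would see.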