ansaurus

Question

Answer 1

+11 A:

A partial answer would be to use pdftotext and compare the text contained.
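As a rough illustration of that idea, here is a minimal sketch. It assumes pdftotext (from Xpdf or Poppler) is on the PATH; the function name pdf_text_equal is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when both PDFs yield identical
# extracted text, returns 1 when the text differs or extraction fails.
# Assumes pdftotext (Xpdf or Poppler) is installed and on PATH.
pdf_text_equal() {
    a_txt=$(mktemp) || return 2
    b_txt=$(mktemp) || { rm -f "$a_txt"; return 2; }
    # Extract the text of each PDF, then compare the two text files byte by byte.
    if pdftotext "$1" "$a_txt" && pdftotext "$2" "$b_txt" && cmp -s "$a_txt" "$b_txt"; then
        status=0
    else
        status=1
    fi
    rm -f "$a_txt" "$b_txt"
    return "$status"
}
```

In a unit test this could be called as `pdf_text_equal expected.pdf actual.pdf || echo "text differs"`, where both file names are placeholders.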

Sklivvz 2008-09-28 11:05:17

But this ignores all non-text information such as lines, boxes, pictures, and charts. I also think it does not reflect the visual position of the text, only its structural position.

Horcrux7 2008-09-28 11:30:21

I agree, it is not a sufficient criterion. On the other hand, it is a necessary criterion, so it is adequate as a unit test.

Sklivvz 2008-09-28 11:35:33

You can always add a better unit test later!

Sklivvz 2008-09-28 11:36:03

If there are images on pages, and you want a human-like evaluation for those, there's not much you can do but have a human compare those pages, unless you want to work on a whole new project, just as big as your current one, to try it out.

Chris Charabaruk 2008-09-28 11:52:11

Answer 2

+1 A:

Never actually been in your situation before, but I've tried ExamDiff Pro to compare PDFs and it worked for me.

cubex 2008-09-28 11:35:47

Answer 3

A:

I think your best approach would be to convert the PDFs to images at a decent resolution and then do an image compare.

To generate images from PDF you can use Adobe PDF Library or the solution suggested at http://stackoverflow.com/questions/75500/best-way-to-convert-pdf-files-to-tiff-files.

To compare the generated TIFF files, I found that GNU tiffcmp (on Windows, part of GnuWin32 tiff) and tiffinfo did a good job. Use tiffcmp -l and count the number of lines of output to find any differences. If you can tolerate a small amount of content change (e.g. anti-aliasing differences), use tiffinfo to count the total number of pixels, and you can then generate a percentage difference value.
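The percentage calculation can be sketched as below. This is only an illustration: percent_diff is a hypothetical helper, the file names in the comments are placeholders, and the tiffinfo parsing assumes its usual "Image Width: W Image Length: H" line. Note also that tiffcmp -l reports differing bytes, which correspond one-to-one to pixels only for 8-bit single-channel images.

```shell
#!/bin/sh
# Hypothetical helper: percentage of differing units, to two decimal places.
#   $1 = number of differences (e.g. lines printed by `tiffcmp -l`)
#   $2 = total number of units (e.g. width * height taken from tiffinfo)
# Returns a non-zero status when the total is zero or negative.
percent_diff() {
    awk -v d="$1" -v t="$2" 'BEGIN {
        if (t <= 0) exit 1
        printf "%.2f\n", 100 * d / t
    }'
}

# Example wiring (old.tif and new.tif are placeholder names):
#   differing=$(tiffcmp -l old.tif new.tif | wc -l)
#   total=$(tiffinfo old.tif | awk '/Image Width:/ { n += $3 * $6 } END { print n }')
#   percent_diff "$differing" "$total"
```

A threshold check on the result (say, failing above some small percentage) is then a one-line comparison in whatever test harness you use.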

By the way, for anyone doing a simple PDF comparison where the structure hasn't changed, it is possible to use command-line diff and ignore certain patterns, e.g. with GNU diff 2.7 (first.pdf and second.pdf are placeholder file names):

diff --brief -I xap: -I xapMM: -I /CreationDate -I /BaseFont -I /ID --binary --text first.pdf second.pdf

This still has the problem that it doesn't always catch changes in generated font names.

danio 2008-09-29 15:04:11

I think comparing two images is more complex than comparing the PDF files themselves.

Horcrux7 2010-02-16 08:37:54

Comparing images can be done with GnuWin32 tiffcmp. I will update my answer to elaborate on this.

danio 2010-02-16 09:07:28

Answer 4

+1 A:

I think a bitmap check should work in your case. I use an automation tool to compare two images using a bitmap checkpoint.

Chanakya 2008-09-29 17:57:06

Answer 5

+4 A:

I've used a home-baked script which

converts all pages on two PDFs to bitmaps
colors pages of PDF 1 to red-on-white
changes white to transparent on pages of PDF 2
overlays each page from PDF 2 on top of the corresponding page from PDF 1
runs conversion/coloring and overlaying in parallel on multiple cores

Software used:

GhostScript for PDF-to-bitmap conversion
ImageMagick for coloring, transparency and overlay
inotify for synchronizing parallel processes
any PNG-capable image viewer for reviewing the result

Pros:

simple implementation
all tools used are open source
great for finding small differences in layout

Cons:

the conversion is slow
major differences between PDFs (e.g. pagination) result in a mess
bitmaps are not zoomable
only works well for black-and-white text and diagrams
no easy-to-use GUI

I've been looking for a tool which would do the same at the PDF/PostScript level.

Here's how our script invokes the utilities (note that ImageMagick uses GhostScript behind the scenes to do the PDF->PNG conversion):

$ convert -density 150x150 -fill red -opaque black +antialias 1.pdf back%02d.png
$ convert -density 150x150 -transparent white +antialias 2.pdf front%02d.png
$ composite front01.png back01.png result01.png # do this for all pairs of images
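The "do this for all pairs" step could be scripted roughly as follows. This is a sketch, not the original script: it assumes the frontNN.png/backNN.png naming produced by the two convert commands above and that ImageMagick's composite is on the PATH; overlay_pages is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: overlay every frontNN.png on its backNN.png and
# write resultNN.png into the same directory (default: current directory).
# Assumes ImageMagick's composite is installed and on PATH.
overlay_pages() {
    dir=${1:-.}
    for front in "$dir"/front*.png; do
        [ -e "$front" ] || continue      # no matches: the glob stays literal
        page=${front##*/front}           # page suffix, e.g. "01.png"
        back="$dir/back$page"
        if [ -e "$back" ]; then
            composite "$front" "$back" "$dir/result$page"
        else
            # Page counts differ between the two PDFs; report and move on.
            echo "no matching back page for $front" >&2
        fi
    done
}
```

Running `overlay_pages .` after the two convert commands leaves one result image per page pair. Pages that exist in only one PDF are reported on stderr rather than overlaid.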

akaihola 2010-02-10 08:59:38

Answer 6

+1 A:

We've also used pdftotext (see Sklivvz's answer) to generate ASCII versions of PDFs and wdiff to compare them.

Use pdftotext's -layout switch to enhance readability and get some idea of changes in the layout.
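Put together, the two steps might look like the sketch below. It assumes pdftotext and GNU wdiff are on the PATH and that wdiff follows the diff convention of exiting with 1 when differences are found; pdf_wdiff is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: word-level diff of the layout-preserving text of
# two PDFs. Assumes pdftotext and GNU wdiff are installed and on PATH.
# Passes through wdiff's exit status; returns 2 if text extraction fails.
pdf_wdiff() {
    old_txt=$(mktemp) || return 2
    new_txt=$(mktemp) || { rm -f "$old_txt"; return 2; }
    if pdftotext -layout "$1" "$old_txt" && pdftotext -layout "$2" "$new_txt"; then
        status=0
        wdiff "$old_txt" "$new_txt" || status=$?
    else
        status=2
    fi
    rm -f "$old_txt" "$new_txt"
    return "$status"
}
```

Usage would be `pdf_wdiff old.pdf new.pdf | less`, with both file names as placeholders.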

To get nice colored output from wdiff, use this wrapper script:

#!/bin/bash
# $'...' quoting is a bash feature, so this needs bash rather than plain sh.
RED=$'\e[1;31m'
GREEN=$'\e[1;32m'
RESET=$'\e[0m'
wdiff -w "$RED" -x "$RESET" -y "$GREEN" -z "$RESET" -n "$1" "$2"

akaihola 2010-02-10 09:08:33

Answer 7

+2 A:

Because no such tool was available, we wrote one. You can download the i-net PDF content comparer and use it. I hope it helps others with the same problem. If you run into problems or have feedback for us, you can contact our support.

Horcrux7 2010-02-16 08:34:47

The advantage of this tool is that it's neither a pure text comparer nor an image comparer. It compares by structure and checks whether the contained elements are "the same", so your compared PDFs do not have to match 100% but only fall within a definable similarity. And it's free.

gamma 2010-10-14 05:22:47

Answer 8

A:

Bluebeam PDF software will do this for you.

M Jenkins 2010-03-23 13:55:31

Answer 9

A:

Tarkware Pdf Comparer may suit your needs, but it's not free and requires Adobe Acrobat.

erks 2010-03-28 21:13:48

Answer 10

A:

Our product, PDF Comparator (http://www.premediasystems.com/pdfc.html), will do this quite elegantly and efficiently. It's not free, and it is a Mac OS X only application.

Peter Truskier 2010-08-03 00:09:37

This tool compares pixel by pixel, which is very simple. The question asked for a comparison like the one a human would make.

Horcrux7 2010-08-05 09:07:27

How to compare two PDF files?
