I am looking for a library in any language--preferably PHP though--that will display the difference between two web pages. The differences can be displayed side-by-side, all in one document, or in any other creative way.

Examples of what this would look like:

I am NOT looking for raw code diffing, like this: http://thinkingphp.org/img/code_coverage_html_diff_view.png. I do NOT want to show the difference between two sets of HTML. I want to show differences in rendered, WYSIWYG form.

Every solution I tried suffered from one or more of the following problems:

  • If I change an attribute of an element (e.g. change [table border="1"] to [table border="2"]), I get an extra table tag in the output (e.g. [table border="1"][table border="1"][tr][td]...). One table tag is wrapped in a del tag and the other in an ins tag, which obviously causes problems because the page now contains two opening table tags.
  • If I change [html][body][b]some content here[/b][/body][/html] to [html][body][i]some other content here[/i][/body][/html], the output looks like [html][body][b][del]original[/del][i][ins]new[/ins] content here[/b][/i][/body][/html], with the [b] and [i] tags improperly nested.
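
To make the first problem concrete, here is a minimal Python sketch of a naive token-level diff. It is an illustration only, not the code of any library mentioned in this thread; the tokenizer and the `naive_diff` name are assumptions made for the example. It treats tags and words as equal-rank tokens, so a changed attribute shows up as a deleted tag plus an inserted tag.

```python
import difflib
import re


def tokenize(html):
    # Split markup into whole tags and whitespace-separated words.
    return re.findall(r"<[^>]+>|[^<\s]+", html)


def naive_diff(old, new):
    # Diff the token lists and wrap removed/added tokens in <del>/<ins>,
    # without caring whether a token is a tag or a word.
    a, b = tokenize(old), tokenize(new)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.extend(a[i1:i2])
        if op in ("delete", "replace"):
            out.append("<del>" + " ".join(a[i1:i2]) + "</del>")
        if op in ("insert", "replace"):
            out.append("<ins>" + " ".join(b[j1:j2]) + "</ins>")
    return " ".join(out)


old = '<table border="1"><tr><td>cell</td></tr></table>'
new = '<table border="2"><tr><td>cell</td></tr></table>'
print(naive_diff(old, new))
```

The printed markup contains two opening table tags, one inside del and one inside ins, but only one closing tag, which is exactly the broken structure described in the first bullet.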

I'm looking for out-of-the-box ideas. Any ideas are welcome.

+1  A: 

The relation between HTML code and the rendered page is not sufficiently well defined for this to work in the generic case. You need to be more specific for the problem to be solvable:

  • CSS changes influence how the page renders.
  • Do you want to handle invalid HTML?
  • It is easier to solve for a specific browser (and version).
Stephan Eggermont
I don't want to worry about invalid HTML. If we have to select a browser, let's go with Firefox, but I'd prefer things to be done on the server side, unless there is some fancy JavaScript library or approach.
Chad Johnson
+1  A: 

One of the setups you liked is a screenshot from a Wikipedia page. If that is the sort of diff'ing mechanism you are looking for, and it needs to be in PHP, then why not download MediaWiki and look at the portion of their code responsible for generating the diff?

It's probably the closest thing you are going to find to a generic, no-setup-needed, out-of-the-box solution. (At least, it's the closest thing I know of.)

Sean Vieira
I actually downloaded and tried the libraries used to produce those screenshots, and they suffer from one or more of the problems I described. As for using the MediaWiki engine, MediaWiki only displays a diff between raw wiki text, not rendered WYSIWYG HTML.
Chad Johnson
@Chad, so if I understand you correctly, you have raw HTML code **now** and you want to be able to display the difference between two versions of it *as it is rendered in the browser*.
Sean Vieira
@Sean That's correct.
Chad Johnson
+2  A: 

Here's a list of visual HTML diff products, discussed and compared.

luvieere
+1  A: 

Daisy Diff is a great diff program written in Java that does a very decent job of comparing HTML.

It even lets you step left and right through the changes that have been made. Daisy Diff can handle attribute changes inside tags, and it will tell you if an image has been changed or if a link was removed or updated.

It is an open source project and can be downloaded from Google Code: http://code.google.com/p/daisydiff/

I know it is not PHP, but it may be your best chance at getting a decent HTML diff. Use PHP's system or shell_exec functions to execute a command like this:


java -jar daisydiff.jar http://myPageOld.html http://myPageNew.html \
  --file=result.html \
  --output=html \
  --type=html

The result of the diff is written to the file result.html. I recommend you use it!
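
Since the question allows any language, here is a minimal sketch of wrapping that same command from a script, written in Python for illustration (in PHP the equivalent would go through shell_exec, with escapeshellarg on each URL). The function names, the default jar path, and the default output file are assumptions made for this example; the flags are only the ones from the command above.

```python
import subprocess


def daisydiff_command(old_url, new_url, result_file="result.html",
                      jar="daisydiff.jar"):
    # Build the argument list for the Daisy Diff command shown above.
    # The jar path and result file are placeholders; adjust to your setup.
    return [
        "java", "-jar", jar,
        old_url, new_url,
        f"--file={result_file}",
        "--output=html",
        "--type=html",
    ]


def run_daisydiff(old_url, new_url, result_file="result.html",
                  jar="daisydiff.jar"):
    # Passing a list (no shell) avoids quoting problems with URLs.
    # check=True raises CalledProcessError if java exits with an error.
    cmd = daisydiff_command(old_url, new_url, result_file, jar)
    subprocess.run(cmd, check=True)
    return result_file
```

Calling run_daisydiff("http://myPageOld.html", "http://myPageNew.html") would then leave the rendered diff in result.html, assuming Java and the Daisy Diff jar are available on the server.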

Onema
I think this is exactly what I was looking for! Great find. Thanks.
Chad Johnson