I'm interested in seeing a good diff algorithm, possibly in Javascript, for rendering a side-by-side diff of two HTML pages. The idea would be that the diff would show the differences of the rendered HTML.

To clarify, I want to be able to see the side-by-side diffs as rendered output. So if I delete a paragraph, the side by side view would know to space things correctly.

+1  A: 

I believe a good way to do this is to render the HTML to an image and then use some diff tool that can compare images to spot the differences.

Any misalignment between the two images will, of course, produce massive differences even where the actual difference is minuscule, such as a table sitting one pixel higher on one of the two pages.
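A minimal sketch of the comparison step, assuming both pages have already been rendered to RGBA pixel buffers of identical dimensions (how you capture them is up to you). The function name `countDifferentPixels` and its `tolerance` parameter are made up for illustration and do not come from any library.

```javascript
// Hypothetical helper: counts pixels that differ between two RGBA buffers.
// Assumes both buffers hold width * height * 4 bytes (R, G, B, A per pixel).
function countDifferentPixels(a, b, tolerance = 0) {
  if (a.length !== b.length || a.length % 4 !== 0) {
    throw new Error("Buffers must be RGBA data of identical size");
  }
  let different = 0;
  for (let i = 0; i < a.length; i += 4) {
    // A pixel counts as different if any channel differs by more than tolerance.
    for (let c = 0; c < 4; c++) {
      if (Math.abs(a[i + c] - b[i + c]) > tolerance) {
        different++;
        break;
      }
    }
  }
  return different;
}

// A 1x4 image of alternating rows, and the same image shifted down one row.
const white = [255, 255, 255, 255];
const black = [0, 0, 0, 255];
const page = Uint8ClampedArray.from([...white, ...black, ...white, ...black]);
const shifted = Uint8ClampedArray.from([...black, ...white, ...black, ...white]);
console.log(countDifferentPixels(page, shifted)); // 4: every pixel differs
```

The usage lines illustrate the misalignment problem: a one-row shift makes every pixel register as changed.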
Lasse V. Karlsen
+2  A: 

So, you expect

<font face="Arial">Hi Mom</font>

and

<span style="font-family:Arial;">Hi Mom</span>

to be considered the same?

The output depends very much on the User Agent. Like Ionut Anghelcovici suggests, make an image. Do one for every browser you care about.

Josh
A: 

Okay, so you want

<body style="background-color:white">
<p>Hi Mom
<p>Hi Dad
</body>

diffed against

<body style="background-color:white">
<p>Hi Dad
</body>

to generate this HTML:

<body style="background-color:white">
<p style="color:white">Hi Mom
<p>Hi Dad
</body>

Right?

Josh
A: 

@Josh exactly. Though maybe it would show the deleted text in red or something. The idea is that if I use a WYSIWYG editor for my HTML content, I don't want to have to switch to HTML to do diffs. I want to do it with two WYSIWYG editors side by side, maybe. Or at least display diffs side-by-side in an end-user friendly manner.

Haacked
+8  A: 

There's another nice trick you can use to significantly improve the look of a rendered HTML diff. Although this doesn't fully solve the initial problem, it will make a significant difference in the appearance of your rendered HTML diffs.

Side-by-side rendered HTML makes it very difficult for your diff to line up vertically, and vertical alignment is crucial for comparing side-by-side diffs. To improve it, you can insert invisible HTML elements in each version of the diff at "checkpoints" where the two sides should be vertically aligned. Then you can use a bit of client-side JavaScript to add vertical spacing around each checkpoint until the sides line up.

Explained in a little more detail:

If you want to use this technique, run your diff algorithm and insert a bunch of visibility:hidden elements or tiny placeholder elements wherever your side-by-side versions should match up, according to the diff. Then run JavaScript that finds each checkpoint (and its side-by-side neighbor) and adds vertical spacing to whichever of the two is higher up (shallower) on the page. Now your rendered HTML diff will be vertically aligned up to that checkpoint, and you can continue repairing vertical alignment down the rest of your side-by-side page.
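Roughly, the alignment pass could look like the sketch below. It is only an illustration under some assumptions: the checkpoints appear in the same order on both sides, they are block-level elements in normal flow, and both panes start at the same height on the page. The names `alignCheckpoints`, `alignPanes`, and the class `diff-checkpoint` are hypothetical.

```javascript
// Hypothetical core: walks matching checkpoints top to bottom and pads
// whichever one of each pair sits higher on the page.
// measureTop(mark) returns the current vertical offset of a checkpoint;
// addSpaceAbove(mark, px) pushes that checkpoint (and what follows) down.
function alignCheckpoints(leftMarks, rightMarks, measureTop, addSpaceAbove) {
  const count = Math.min(leftMarks.length, rightMarks.length);
  for (let i = 0; i < count; i++) {
    // Re-measure on every step: spacing added earlier moves later checkpoints.
    const gap = measureTop(rightMarks[i]) - measureTop(leftMarks[i]);
    if (gap > 0) {
      addSpaceAbove(leftMarks[i], gap); // left checkpoint is higher up
    } else if (gap < 0) {
      addSpaceAbove(rightMarks[i], -gap); // right checkpoint is higher up
    }
  }
}

// Browser wiring (not called here). Assumes checkpoints carry the
// hypothetical class "diff-checkpoint" and that margin-top is enough to
// move them; margin collapsing in real layouts may need extra care.
function alignPanes(leftPane, rightPane) {
  alignCheckpoints(
    [...leftPane.querySelectorAll(".diff-checkpoint")],
    [...rightPane.querySelectorAll(".diff-checkpoint")],
    (el) => el.getBoundingClientRect().top,
    (el, px) => { el.style.marginTop = px + "px"; }
  );
}
```

Keeping the measuring and spacing steps as callbacks means the same loop can be exercised with fake layout data outside a browser.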

kamens
A: 

If it is XHTML (which assumes a lot on my part) would the Xml Diff Patch Toolkit help? http://msdn.microsoft.com/en-us/library/aa302294.aspx

MotoWilliams
A: 

For smaller differences you might be able to do a normal text-diff, and then analyse the missing or inserted pieces to see how to resolve it, but for any larger differences you're going to have a very tough time doing this.

For instance, how would you detect, and show, that a left-aligned image (floating left of a paragraph of text) has suddenly become right-aligned?

Lasse V. Karlsen
+2  A: 

I ended up needing something similar a while back. To get the HTML to line up side by side, you could use two iframes, but you'd then have to tie their scrolling together via JavaScript as you scroll (if you allow scrolling).
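Tying the scrolling together might look something like this sketch. It assumes same-origin iframes (so each frame's window is scriptable) and instant rather than smooth scrolling; `linkScrolling` is a hypothetical name.

```javascript
// Hypothetical helper: mirrors vertical scrolling between two windows,
// e.g. the contentWindow of each iframe.
function linkScrolling(winA, winB) {
  // The window whose next scroll event was caused by us, not the user.
  let ignore = null;
  const mirror = (from, to) => () => {
    if (ignore === from) {
      ignore = null; // swallow the echo of our own scrollTo call
      return;
    }
    const before = to.scrollY;
    to.scrollTo(to.scrollX, from.scrollY);
    // Only expect an echo event if the other window actually moved
    // (it may already be clamped at the end of a shorter page).
    if (to.scrollY !== before) ignore = to;
  };
  winA.addEventListener("scroll", mirror(winA, winB));
  winB.addEventListener("scroll", mirror(winB, winA));
}
```

With two iframe elements in hand, the call would be along the lines of `linkScrolling(leftFrame.contentWindow, rightFrame.contentWindow)`, where `leftFrame` and `rightFrame` are placeholders for your own elements.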

To see the diff, however, you will more than likely want to use someone else's library. I used DaisyDiff, a Java library, for a similar project where my client was happy with seeing a single HTML rendering of the content with MS Word "track changes"-like markup.

HTH

kooshmoose
A: 

Using a text differ will break on non-trivial documents. Depending on what you think is intuitive, XML differs will probably generate diffs that aren't very good for text with markup. AFAIK, DaisyDiff is the only library specialized in HTML. It works great for a subset of HTML.

A: 

If you were working with Java and XHTML, XMLUnit allows you to compare two XML documents via the org.custommonkey.xmlunit.DetailedDiff class:

Compares and describes all the differences between two XML documents. The document comparison does not stop once the first unrecoverable difference is found, unlike the Diff class.

Ates Goral
+2  A: 

Consider using the output of links or lynx to render a text-only version of the html, and then diff that.
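Once you have the two text dumps (for example from `lynx -dump`), the diff itself can be a plain line comparison. Below is a rough sketch using a longest-common-subsequence table; `diffLines` is a made-up name, and the quadratic table is only reasonable for small documents.

```javascript
// Hypothetical helper: line-based diff of two text dumps.
// Returns [marker, line] pairs, where marker is " " (unchanged),
// "-" (only in the old text), or "+" (only in the new text).
function diffLines(oldText, newText) {
  const a = oldText.split("\n");
  const b = newText.split("\n");
  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
  const lcs = Array.from({ length: a.length + 1 }, () =>
    new Array(b.length + 1).fill(0)
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  // Walk the table from the top, emitting the edit script.
  const out = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      out.push([" ", a[i]]);
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      out.push(["-", a[i]]);
      i++;
    } else {
      out.push(["+", b[j]]);
      j++;
    }
  }
  while (i < a.length) out.push(["-", a[i++]]);
  while (j < b.length) out.push(["+", b[j++]]);
  return out;
}
```

For the "Hi Mom" / "Hi Dad" example earlier in this thread, the first line would come back marked "-" and the second marked " ".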

Arafangion
+1  A: 

Use the markup mode of Pretty Diff for HTML. It is written entirely in JavaScript.

http://mailmarkup.org/prettydiff/prettydiff.html

A: 

What about DaisyDiff (Java and PHP versions available)?

The following features are really nice:

  • Works with badly formed HTML that can be found "in the wild".
  • The diffing is more specialized in HTML than XML tree differs. Changing part of a text node will not cause the entire node to be changed.
  • In addition to the default visual diff, HTML source can be diffed coherently.
  • Provides easy to understand descriptions of the changes.
  • The default GUI allows easy browsing of the modifications through keyboard shortcuts and links.
elhoim
+2  A: 

Over the weekend I posted a new project on CodePlex that implements an HTML diff algorithm in C#. The original algorithm was written in Ruby. I understand you were looking for a JavaScript implementation, but perhaps having one available in C# with source code could help you port the algorithm. Here is the link if you are interested: htmldiff.codeplex.com. You can read more about it here.

Rohland