Hi,

Does anyone know of a good Java library for checking whether two pieces of HTML are equivalent?

For example, <td class="one two three" name="goat"> should be considered equivalent to <td name="goat" class="three two one">. I would like to compare entire multi-line strings of HTML in this manner from Java.

Any suggestions?

UPDATE:

I tried XMLUnit's Diff.similar() and found that it reported these two documents as similar:

<html three="3" two="2" one="1"></html> and <html one="one" two="two"></html>

This is not the behavior I want. Are there any other options?

+2  A: 

You could use an HTML parser such as NekoHTML or JTidy to turn each input into well-formed XML, and then use the Diff class of XMLUnit to compare the two resulting XML documents.
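To make the kind of comparison being asked for concrete, here is a minimal sketch that does not use XMLUnit at all: it swaps in a hand-written comparison built on the JDK's own DOM parser. It assumes both inputs are already well-formed XML (for example, the output of one of the HTML parsers above), that attribute order never matters, and that the class attribute should be compared as an unordered set of tokens, as in the <td> example from the question. The class name HtmlEquivalence and its methods are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of XMLUnit or any other library.
public class HtmlEquivalence {

    /** Parses both strings as XML and compares their root elements. */
    public static boolean equivalent(String left, String right) {
        return sameElement(parse(left), parse(right));
    }

    // Assumption: the markup is well-formed XML; raw "tag soup" HTML
    // would first need to go through a parser such as NekoHTML or JTidy.
    private static Element parse(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(markup)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    // Same tag name (case-sensitive), same attributes, same children in order.
    private static boolean sameElement(Element a, Element b) {
        if (!a.getTagName().equals(b.getTagName())) return false;
        if (!attributes(a).equals(attributes(b))) return false;
        List<Node> left = children(a);
        List<Node> right = children(b);
        if (left.size() != right.size()) return false;
        for (int i = 0; i < left.size(); i++) {
            if (!sameNode(left.get(i), right.get(i))) return false;
        }
        return true;
    }

    private static boolean sameNode(Node a, Node b) {
        if (a.getNodeType() != b.getNodeType()) return false;
        if (a.getNodeType() == Node.ELEMENT_NODE) {
            return sameElement((Element) a, (Element) b);
        }
        // Text nodes: compare content with surrounding whitespace removed.
        return a.getTextContent().trim().equals(b.getTextContent().trim());
    }

    // Attributes keyed by name, so their order in the source is irrelevant.
    // The value of "class" is stored as a set of whitespace-separated tokens.
    private static Map<String, Object> attributes(Element e) {
        Map<String, Object> result = new TreeMap<>();
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            String value = attr.getNodeValue();
            if ("class".equals(attr.getNodeName())) {
                result.put("class",
                        new HashSet<>(Arrays.asList(value.trim().split("\\s+"))));
            } else {
                result.put(attr.getNodeName(), value);
            }
        }
        return result;
    }

    // Keeps child elements and non-blank text; everything else
    // (comments, CDATA sections, whitespace-only text) is ignored.
    private static List<Node> children(Element e) {
        List<Node> kept = new ArrayList<>();
        NodeList nodes = e.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            boolean isElement = n.getNodeType() == Node.ELEMENT_NODE;
            boolean isText = n.getNodeType() == Node.TEXT_NODE
                    && !n.getTextContent().trim().isEmpty();
            if (isElement || isText) kept.add(n);
        }
        return kept;
    }
}
```

With this sketch, HtmlEquivalence.equivalent(...) returns true for the two <td> elements from the question (once each is given a closing tag) and false for two <html> elements whose attribute sets differ. Child elements are still compared in document order, which seems like the right default for HTML, but that is a design choice rather than anything the question specifies.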

Valentin Rocher
Diff.similar() sounds like what I'm looking for. Thanks.
Alex Baranosky
I tried .similar() and found that it reported these two as similar: <html three="c" two="b" one="a"></html> and <html one="a" two="b" ></html>, which is not the desired behavior.
Alex Baranosky
I just tried comparing the two documents you specified using Diff.similar() and it returned false. How did you do it?
Valentin Rocher