I'm looking for a tool that can take an HTML document and produce a selector-based representation of it.
For example:
<div>
Some text
<ul class="foo">
<li>First</li>
<li>Second</li>
</ul>
</div>
And output a flat text file in the spirit of:
div
div #text Some text
div ul.foo li First
div ul.foo li Second
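To make the target format concrete, here is a rough Python sketch of the kind of flattening I mean, using only the standard library's `html.parser`. The names `SelectorFlattener` and `flatten` are made up for illustration. It is a little more verbose than the example above: it emits one line per element and a separate `#text` line per text node. It is a sketch under those assumptions, not a finished tool.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SelectorFlattener(HTMLParser):
    """Collects one line per element and one per non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []  # (tag, selector) pairs for the currently open elements
        self.lines = []  # flattened output, in document order

    def _path(self):
        return [selector for _, selector in self.stack]

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        selector = tag
        if attrs.get("id"):
            selector += "#" + attrs["id"]
        if attrs.get("class"):
            selector += "".join("." + name for name in attrs["class"].split())
        self.lines.append(" ".join(self._path() + [selector]))
        if tag not in VOID_TAGS:
            self.stack.append((tag, selector))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; this tolerates
        # children that were never closed explicitly.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if text:
            self.lines.append(" ".join(self._path() + ["#text", text]))


def flatten(html):
    """Return the selector-style lines for an HTML string."""
    parser = SelectorFlattener()
    parser.feed(html)
    parser.close()
    return parser.lines
```

Writing `"\n".join(flatten(page_source))` to a file would give the flat text representation, one file per page.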
The goal is to build a predicate function that compares two HTML pages and reports to what degree they match, and that can report separately how much of the content and how much of the layout differs.
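As an illustration of the comparison I have in mind, here is a minimal Python sketch built on the standard library's `difflib.SequenceMatcher`. It assumes each page has already been flattened to a list of lines, and that text nodes are marked with ` #text ` as in the example output. The function name `compare_flattened` and the returned keys are hypothetical.

```python
from difflib import SequenceMatcher


def _split(lines):
    """Separate structural paths from text content."""
    layout, content = [], []
    for line in lines:
        # Assumes text lines look like "<path> #text <text>".
        _path, marker, text = line.partition(" #text ")
        if marker:
            content.append(text)
        else:
            layout.append(line)
    return layout, content


def _ratio(a, b):
    # ratio() is 2*M/T: M matched items, T total items in both lists.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def compare_flattened(old_lines, new_lines):
    """Return similarity scores in [0, 1] for layout and content."""
    old_layout, old_content = _split(old_lines)
    new_layout, new_content = _split(new_lines)
    return {
        "layout": _ratio(old_layout, new_layout),
        "content": _ratio(old_content, new_content),
    }
```

A pass/fail predicate could then be as simple as checking both scores against a threshold chosen for the migration.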
(For the curious, this is for the QA phase of a relatively large data migration project.)