Is there an elegant way to get the computed style for each DOM node in a web page, for a large number of files, in order to compare style data for similar nodes across those files?

I'm working on a large number of HTML files (> 500) containing pretty broken HTML from MS FrontPage, trying to extract style data and convert it to semantic markup. I managed to do this using regex up to a certain point, but now it's become too complex. I learned that it's a bad idea to parse HTML using regex in the first place, so I'm trying to find a way to have the browser parse the HTML and give me the computed style for each node on the page.

I know I can access the DOM and get the computed style for each node using JavaScript, but I can only do this for one file at a time, and I don't see an easy way to compare this data across several files. Is there one? If I'm not mistaken, JavaScript running in a web page can't write data to a file on disk. What alternatives would there be?
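For reference, the per-file part looks roughly like the sketch below. It is only an illustration of what I mean by getting the computed style for each node: the function name `collectComputedStyles`, the injected `getStyle` callback, the path format, and the list of properties are all made up for this example. In a browser, `getStyle` would wrap `window.getComputedStyle`.

```javascript
// Walk an element tree and record selected computed-style properties per node.
// `getStyle` is passed in so the same function works in a browser
// (el => window.getComputedStyle(el)) or against stub objects.
function collectComputedStyles(root, getStyle, props) {
  const rows = [];

  function visit(el, path) {
    const style = getStyle(el);
    const values = {};
    for (const prop of props) {
      values[prop] = style.getPropertyValue(prop);
    }
    rows.push({ path, values });

    // el.children contains element nodes only (no text or comment nodes).
    Array.from(el.children).forEach((child, i) => {
      visit(child, `${path} > ${child.tagName.toLowerCase()}[${i}]`);
    });
  }

  visit(root, root.tagName.toLowerCase());
  return rows;
}

// Assumed usage in a browser console, one page at a time:
// const rows = collectComputedStyles(
//   document.documentElement,
//   (el) => window.getComputedStyle(el),
//   ["font-family", "font-size", "font-weight", "color"]
// );
// console.log(JSON.stringify(rows, null, 2));
```

This gives me one list of path/style records per page, but only for the page that is currently loaded, which is exactly the limitation I'm asking about.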

(BTW, I've tried to use HTMLTidy, but the HTML is so borked that Tidy crashes on it.)