ansaurus

Question

Trouble Scraping Web Page With Malformed Content

Answer 1

+2 A:

Run the content through HTML Tidy before parsing it.

http://tidy.sourceforge.net/
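
As an illustration only, here is a minimal Python sketch of the "tidy first, parse second" idea. It assumes the `tidy` command-line tool is installed and on the PATH. The helper names `tidy_command` and `tidy_html` are made up for this example and are not part of HTML Tidy itself.

```python
import shutil
import subprocess


def tidy_command(executable="tidy"):
    """Build the command line used to clean a page (hypothetical helper)."""
    return [
        executable,
        "-quiet",                  # suppress the banner and summary text
        "-asxhtml",                # emit well-formed XHTML
        "-utf8",                   # treat input and output as UTF-8
        "--force-output", "yes",   # still write output when errors are found
        "--show-warnings", "no",   # keep stderr short
    ]


def tidy_html(raw_html, executable="tidy"):
    """Pipe malformed HTML through the tidy executable and return the result."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable!r} was not found on PATH")
    result = subprocess.run(
        tidy_command(executable),
        input=raw_html.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=False,  # tidy exits with 1 on warnings and 2 on errors
    )
    if not result.stdout:
        raise RuntimeError(result.stderr.decode("utf-8", errors="replace"))
    return result.stdout.decode("utf-8", errors="replace")
```

Calling `tidy_html(page_source)` on the downloaded page should give markup that a strict parser is far more likely to accept, though badly broken pages may still need manual handling.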

Joshua Drake 2009-12-15 16:13:21

Thank you for the responses so far. Do you know if there is an equivalent .NET library? I'd like the application to download an HTML page (not just the one I cite in my question), run it through HTML Tidy or an equivalent, and then process the result.

Joe 2009-12-15 17:32:03

I'm not aware of a native one, but COM Interop should not be too difficult as long as speed is not a major issue. One link on the subject: http://www.devx.com/dotnet/Article/20505/0/page/2

Joshua Drake 2009-12-17 19:38:43

I have found one, but I know almost nothing about it: http://sourceforge.net/projects/tidynet/

Joshua Drake 2009-12-17 19:46:27
