C# read HTML/PHP files code

tags:

c#
html
html-parsing
source-code

+1 Q:

I want to write an application in C# that takes a URL as input, fetches the source code of that page, and extracts some URLs and some text based on given criteria ...

+1 A:

One approach would be to use WebClient or WebRequest to fetch the page, tidyfornet to convert it into XML, and then LINQ to XML to extract the data. Real-world HTML is very lax and often not well-formed, but tidyfornet (which is a wrapper around HTML Tidy) will do the appropriate cleaning up and conversion to XML.
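A minimal sketch of the last step (LINQ to XML), assuming the page has already been fetched and tidied into well-formed XML. The class and method names (`LinkExtractor`, `ExtractLinks`) are hypothetical, and the tidy call itself is left out rather than guessed at:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class LinkExtractor
{
    // Hypothetical helper: returns the href of every <a> element that has one.
    // Assumption: `xhtml` is already well-formed XML (e.g. the output of a tidy pass);
    // raw HTML from the web will usually make XDocument.Parse throw.
    public static List<string> ExtractLinks(string xhtml)
    {
        XDocument doc = XDocument.Parse(xhtml);

        return doc.Descendants()
                  // Compare local names so an XHTML namespace on the root does not matter.
                  .Where(e => e.Name.LocalName == "a")
                  .Select(e => (string)e.Attribute("href"))
                  .Where(href => !string.IsNullOrEmpty(href))
                  .ToList();
    }
}
```

The page source itself could be obtained beforehand with something like `new WebClient().DownloadString(url)`; text extraction would follow the same pattern, filtering `Descendants()` by whatever element names or attributes the criteria call for.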

Jon Skeet 2010-08-15 11:43:01

You should use a combination of WebRequest and regular expressions, or a DOM parser if you have more advanced needs:

WebRequest tutorial:

http://www.java2s.com/Tutorial/CSharp/0580_Network/0380_WebRequest.htm

Regular Expressions tutorial:

http://www.codeproject.com/KB/dotnet/regextutorial.aspx

C# DOM parser:

http://stackoverflow.com/questions/100358/looking-for-c-html-parser
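A rough sketch of the WebRequest plus regular expression combination, under some assumptions: the names `PageScraper`, `FetchSource` and `ExtractLinks` are hypothetical, the page is assumed to be UTF-8, and the pattern is deliberately simple (it only catches quoted `href` values on `<a>` tags):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

public static class PageScraper
{
    // Simple pattern: an <a ...> tag with href="..." or href='...'.
    // Group 1 captures the quote character so the same one closes the value.
    private static readonly Regex HrefPattern = new Regex(
        @"<a\b[^>]*?\shref\s*=\s*([""'])(?<url>.*?)\1",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Downloads the page source as a string.
    // Assumption: the response body is UTF-8 (StreamReader's default).
    public static string FetchSource(string url)
    {
        WebRequest request = WebRequest.Create(url);
        using (WebResponse response = request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (var reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }

    // Returns the href value of every matching <a> tag, in document order.
    public static List<string> ExtractLinks(string html)
    {
        var urls = new List<string>();
        foreach (Match m in HrefPattern.Matches(html))
        {
            urls.Add(m.Groups["url"].Value);
        }
        return urls;
    }
}
```

A call such as `PageScraper.ExtractLinks(PageScraper.FetchSource(url))` ties the two together. Unquoted attribute values, links built by scripts, and markup inside comments are not handled by this pattern, which is where a DOM parser becomes the better fit.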

Pierre 303 2010-08-15 11:47:47
