I want to write an application in C# that takes a URL as a parameter/input, fetches the source code of the page, and then extracts some URLs and some text based on given criteria ...
A:
One approach would be to use WebClient or WebRequest to fetch the page, tidyfornet to convert it into XML, and then LINQ to XML to extract the data. Real-world HTML is often very lax, but tidyfornet (a wrapper around HTML Tidy) will do the appropriate cleanup and conversion to XML.
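A minimal sketch of the fetch and LINQ to XML steps, assuming the page has already been run through HTML Tidy (for example via tidyfornet) so that it parses as XML. The Tidy call itself is omitted because its exact API isn't shown here, and the class and method names (`XhtmlScraper`, `FetchPage`, `ExtractLinks`, `ExtractText`) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Xml.Linq;

public static class XhtmlScraper
{
    // Downloads the raw page source. WebClient is marked obsolete on .NET 6+
    // (HttpClient is preferred there), but it still works.
    public static string FetchPage(string url)
    {
        using (var client = new WebClient())
        {
            return client.DownloadString(url);
        }
    }

    // Assumes 'xhtml' is already well-formed XML, e.g. the output of an
    // HTML Tidy wrapper such as tidyfornet (that step is not shown here).
    public static List<string> ExtractLinks(string xhtml)
    {
        XDocument doc = XDocument.Parse(xhtml);
        // Compare LocalName so the query works whether or not Tidy emitted
        // the XHTML namespace (xmlns="http://www.w3.org/1999/xhtml").
        return doc.Descendants()
                  .Where(e => e.Name.LocalName == "a")
                  .Select(e => (string)e.Attribute("href")) // null if missing
                  .Where(href => href != null)
                  .ToList();
    }

    // Example "criteria": trimmed text of every element with a given local name.
    public static List<string> ExtractText(string xhtml, string elementName)
    {
        XDocument doc = XDocument.Parse(xhtml);
        return doc.Descendants()
                  .Where(e => e.Name.LocalName == elementName)
                  .Select(e => e.Value.Trim())
                  .ToList();
    }
}
```

Matching on `Name.LocalName` rather than a plain `Descendants("a")` matters here: if Tidy adds the XHTML default namespace, `Descendants("a")` would silently return nothing.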
Jon Skeet
2010-08-15 11:43:01
A:
You should use a combination of WebRequest and regular expressions, or a DOM parser if you have more advanced needs:
WebRequest tutorial:
http://www.java2s.com/Tutorial/CSharp/0580_Network/0380_WebRequest.htm
Regular Expressions tutorial:
http://www.codeproject.com/KB/dotnet/regextutorial.aspx
C# DOM parser:
http://stackoverflow.com/questions/100358/looking-for-c-html-parser
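A rough sketch of the WebRequest plus regular-expression route. The pattern and the `RegexScraper` names are illustrative assumptions, not taken from the linked tutorials; a regex like this only handles simple quoted `href` attributes, which is one reason to move to a DOM parser when the markup gets messy.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

public static class RegexScraper
{
    // Simplified, hypothetical pattern for <a ... href="..."> or href='...'.
    // It is not a full HTML grammar: it misses unquoted attributes and can
    // match inside comments or scripts.
    private static readonly Regex HrefPattern = new Regex(
        @"<a\s[^>]*?href\s*=\s*[""'](?<url>[^""']+)[""']",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Fetches the page source with WebRequest (obsolete on .NET 6+, where
    // HttpClient is preferred, but still functional).
    public static string FetchPage(string url)
    {
        WebRequest request = WebRequest.Create(url);
        using (WebResponse response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    // Returns the href value of every <a> tag the pattern recognises,
    // in the order they appear in the source.
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        foreach (Match m in HrefPattern.Matches(html))
        {
            links.Add(m.Groups["url"].Value);
        }
        return links;
    }
}
```

Usage would be along the lines of `RegexScraper.ExtractLinks(RegexScraper.FetchPage(url))`; keeping the extraction separate from the download makes it easy to test against a fixed HTML string.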
Pierre 303
2010-08-15 11:47:47