
I want to write a C# application that takes a URL as input, gets the source code of the page, and extracts some URLs and some text based on given criteria ...

A:

One approach would be to use WebClient or WebRequest to fetch the page, tidyfornet to convert it into XML, and then LINQ to XML to extract the data. Real-world HTML is very lax (unclosed tags, unquoted attributes and so on), but tidyfornet, a wrapper around HTML Tidy, will do the appropriate cleanup and convert it to well-formed XML.
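A minimal sketch of the fetch and LINQ to XML steps follows. The tidyfornet call is deliberately left out because its API is not shown here, so `ExtractLinks` and `ExtractText` assume their input has already been cleaned into well-formed XHTML; raw HTML will usually make `XDocument.Parse` throw. The class name `PageScraper`, the `keep` predicate standing in for "given criteria", and the sample markup are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Xml.Linq;

public static class PageScraper
{
    // Downloads the raw HTML of a page with WebClient, as suggested above.
    // Note: WebClient still works but is marked obsolete in .NET 6+ in favour of HttpClient.
    public static string Download(string url)
    {
        using (var client = new WebClient())
        {
            return client.DownloadString(url);
        }
    }

    // Returns the href of every <a> element that passes the caller's criterion.
    // Assumes the input is already well-formed XHTML (e.g. after a Tidy pass).
    public static List<string> ExtractLinks(string xhtml, Func<string, bool> keep)
    {
        XDocument doc = XDocument.Parse(xhtml);
        return doc.Descendants()
                  .Where(e => e.Name.LocalName == "a")      // match with or without the XHTML namespace
                  .Select(e => (string)e.Attribute("href")) // null when the attribute is missing
                  .Where(href => !string.IsNullOrEmpty(href) && keep(href))
                  .ToList();
    }

    // Returns the trimmed text content of every element with the given local name.
    public static List<string> ExtractText(string xhtml, string elementName)
    {
        return XDocument.Parse(xhtml)
                        .Descendants()
                        .Where(e => e.Name.LocalName == elementName)
                        .Select(e => e.Value.Trim())
                        .ToList();
    }
}
```

For example, `PageScraper.ExtractLinks(PageScraper.Download(url), h => h.StartsWith("http"))` would keep only absolute links, provided the downloaded page happens to be valid XML; in practice the Tidy step belongs between the two calls.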

Jon Skeet
A: 

You can combine WebRequest with regular expressions, or use a DOM parser if your needs are more advanced:

WebRequest tutorial:

">
http://www.java2s.com/Tutorial/CSharp/0580_Network/0380_WebRequest.htm

Regular Expressions tutorial:

http://www.codeproject.com/KB/dotnet/regextutorial.aspx

C# DOM parser:

http://stackoverflow.com/questions/100358/looking-for-c-html-parser
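The WebRequest-plus-regular-expressions approach might look roughly like this sketch. The class name `RegexScraper` and the exact pattern are hypothetical; the pattern only recognizes `href` values in single or double quotes on `<a>` tags.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

public static class RegexScraper
{
    // Fetches the page source with WebRequest.
    // Note: WebRequest is marked obsolete in .NET 6+, where HttpClient is preferred.
    public static string Fetch(string url)
    {
        WebRequest request = WebRequest.Create(url);
        using (WebResponse response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    // Matches <a ... href="..."> or <a ... href='...'>, case-insensitively.
    // The \s after "<a" stops it from matching tags such as <abbr>.
    private static readonly Regex HrefPattern = new Regex(
        "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"']",
        RegexOptions.IgnoreCase);

    // Returns the captured href values in document order.
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        foreach (Match m in HrefPattern.Matches(html))
        {
            links.Add(m.Groups[1].Value);
        }
        return links;
    }
}
```

A pattern like this is quick to write but brittle: unquoted attributes, `href` values inside comments or scripts, and similar quirks will be missed or mismatched, which is where a DOM parser becomes the better choice.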

Pierre 303