Hi,

I am looking for an HTTP library (for a C# program) that will let me download HTML documents from the web. I am aware of the HttpWebRequest class and the other options provided by the .NET library; however, I need a more complete solution that can handle different document encodings (sometimes the encoding is specified in the document itself rather than in the HTTP headers).

Thanks.

+2  A: 

The WCF REST Starter Kit contains an HttpClient class which is quite helpful. It is available today for .NET 3.5 SP1 and can be used right away. Since it seems to be considered a useful class, it might show up in the base class library in a future release such as .NET 4.0.

Definitely also check out the tutorial screencast by Aaron Skonnard featuring HttpClient and other goodies from the WCF REST Starter Kit, along with the other WCF REST Starter Kit resources:

http://msdn.microsoft.com/en-us/netframework/cc950529.aspx

Marc

marc_s
Downvoted because .NET 4.0 is not even RTM yet, so the asker probably can't use it in a production environment for some months.
Tamás Szelei
I was not aware of that. Downvote undone.
Tamás Szelei
Thanks, sztomi!
marc_s
+1  A: 

The WebClient class provides everything you need. To handle the special encoding cases, download the document as a byte array and then decode it yourself once you have worked out the encoding.
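
A minimal sketch of that approach, not a definitive implementation: fetch the raw bytes with WebClient.DownloadData, prefer the charset from the Content-Type header, and otherwise look for a charset declaration near the start of the document. The HtmlDownloader class, its method names, the 4096-byte window, and the UTF-8 fallback are all assumptions made for illustration. Byte order marks and UTF-16 documents are not handled here.

```csharp
using System;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the .NET library.
static class HtmlDownloader
{
    // Matches "charset=..." in a Content-Type header or in a <meta> tag.
    static readonly Regex CharsetPattern =
        new Regex(@"charset\s*=\s*[""']?([A-Za-z0-9_\-:.]+)", RegexOptions.IgnoreCase);

    // Returns the encoding named in the text, or null if none is found or recognized.
    public static Encoding FindCharset(string text)
    {
        if (string.IsNullOrEmpty(text)) return null;
        Match m = CharsetPattern.Match(text);
        if (!m.Success) return null;
        try { return Encoding.GetEncoding(m.Groups[1].Value); }
        catch (ArgumentException) { return null; }      // unknown charset name
        catch (NotSupportedException) { return null; }  // not available on this platform
    }

    // Decodes the body using the header charset, then the in-document charset,
    // then UTF-8 as an assumed fallback.
    public static string Decode(byte[] body, string contentTypeHeader)
    {
        Encoding enc = FindCharset(contentTypeHeader);
        if (enc == null)
        {
            // The declaration itself is ASCII, so an ASCII peek at the head is enough.
            int n = Math.Min(body.Length, 4096);
            enc = FindCharset(Encoding.ASCII.GetString(body, 0, n));
        }
        return (enc ?? Encoding.UTF8).GetString(body);
    }

    public static string Download(string url)
    {
        using (var client = new WebClient())
        {
            byte[] body = client.DownloadData(url);
            string contentType = client.ResponseHeaders[HttpResponseHeader.ContentType];
            return Decode(body, contentType);
        }
    }
}
```

Usage would be something like `string html = HtmlDownloader.Download("http://example.com/");`, where the URL is only a placeholder.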

Tamás Szelei
A: 

sztomi is right: the WebClient class can probably do what you need.

If you need to parse and work with the HTML, check out the HTML Agility Pack (http://www.codeplex.com/htmlagilitypack).

"This is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT (you actually don't HAVE to understand XPATH nor XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant with "real world" malformed HTML. The object model is very similar to what proposes System.Xml, but for HTML documents (or streams)."
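
As a rough illustration of the read-only DOM and XPath usage described in that quote, the sketch below collects the href values of all links in an HTML string. It assumes the HtmlAgilityPack assembly is referenced. The LinkExtractor class and its method name are hypothetical and exist only for this example.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper built on the HTML Agility Pack.
static class LinkExtractor
{
    // Returns the href attribute of every <a> element that has one.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant of malformed markup

        var links = new List<string>();
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null) return links; // SelectNodes returns null when nothing matches

        foreach (HtmlNode a in anchors)
        {
            links.Add(a.GetAttributeValue("href", ""));
        }
        return links;
    }
}
```

The html argument could be the string produced by WebClient after the encoding has been resolved.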

Shane Cusson