views:

9263

answers:

6

I am trying to use WebClient to download a file from web using a Winform application. However, I really only wanted to download HTML file. Any other type I will want to ignore. I checked the WebResponse.contenttype, it always return me NULL.

Anyone have any idea what could be the cause?

A: 

You could issue the first request with the HEAD verb, and check the content-type response header? [edit] It looks like you'll have to use HttpWebRequest for this, though.

Marc Gravell
(obsoleted by OP's follow-up - see my other reply about GetWebRequest)
Marc Gravell
+1  A: 

WebResponse is an abstract class and the ContentType property is defined in inheriting classes. For instance in the HttpWebRequest object this method is overloaded to provide the content-type header. I'm not sure what instance of WebResponse the WebClient is using. If you ONLY want HTML files, your best of using the HttpWebRequest object directly.

Micah
A: 

Your question is a bit confusing: if you're using an instance of the Net.WebClient class, the Net.WebResponse doesn't enter into the equation (apart from the fact that it's indeed an abstract class, and you'd be using a concrete implementation such as HttpWebResponse, as pointed out in another response).

Anyway, when using WebClient, you can achieve what you want by doing something like this:

Dim wc As New Net.WebClient()
Dim LocalFile As String = IO.Path.Combine(Environment.GetEnvironmentVariable("TEMP"), Guid.NewGuid.ToString)
wc.DownloadFile("http://example.com/somefile", LocalFile)
If Not wc.ResponseHeaders("Content-Type") Is Nothing AndAlso wc.ResponseHeaders("Content-Type") <> "text/html" Then
    IO.File.Delete(LocalFile)
Else
    '//Process the file
End If

Note that you do have to check for the existence of the Content-Type header, as the server is not guaranteed to return it (although most modern HTTP servers will always include it). If no Content-Type header is present, you can fall back to another HTML detection method, for example opening the file, reading the first 1K characters or so into a string, and seeing if that contains the substring <html>

Also note that this is a bit wasteful, as you'll always transfer the full file, prior to deciding whether you want it or not. To work around that, switching to the Net.HttpWebRequest/Response classes might help, but whether the extra code is worth it depends on your application...

mdb
A: 

I apologize for not been very clear. I wrote a wrapper class that extends WebClient. In this wrapper class, I added cookie container and exposed the timeout property for the WebRequest.

I was using DownloadDataAsync() from this wrapper class and I wasn't able to retrieve content-type from WebResponse of this wrapper class. My main intention is to intercept the response and determine if its of text/html nature. If it isn't, I will abort this request.

I managed to obtain the content-type after overriding WebClient.GetWebResponse(WebRequest, IAsyncResult) method.

The following is a sample of my wrapper class:

public class MyWebClient : WebClient
{
    private CookieContainer _cookieContainer;
    private string _userAgent;
    private int _timeout;
    private WebReponse _response;

    public MyWebClient()
    {
        this._cookieContainer = new CookieContainer();
        this.SetTimeout(60 * 1000);
    }

    public MyWebClient SetTimeout(int timeout)
    {
        this.Timeout = timeout;
        return this;
    }

    public WebResponse Response
    {
        get { return this._response; }
    }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);

        if (request.GetType() == typeof(HttpWebRequest))
        {
            ((HttpWebRequest)request).CookieContainer = this._cookieContainer;
            ((HttpWebRequest)request).UserAgent = this._userAgent;
            ((HttpWebRequest)request).Timeout = this._timeout;
        }

        this._request = request;
        return request;
    }

    protected override WebResponse GetWebResponse(WebRequest request)
    {
        this._response = base.GetWebResponse(request);
        return this._response;
    }

    protected override WebResponse GetWebResponse(WebRequest request, IAsyncResult result)
    {
        this._response = base.GetWebResponse(request, result);
        return this._response;
    }

    public MyWebClient ServerCertValidation(bool validate)
    {
        if (!validate) ServicePointManager.ServerCertificateValidationCallback += delegate(object sender, X509Certificate certificate, X509Chain chain, SslPolicyErrors sslPolicyErrors) { return true; };
        return this;
    }
}
In that case, change .Method - see my other reply.
Marc Gravell
+4  A: 

Given your update, you can do this by changing the .Method in GetWebRequest:

using System;
using System.Net;
static class Program
{
    static void Main()
    {
        using (MyClient client = new MyClient())
        {
            client.HeadOnly = true;
            string uri = "http://www.google.com";
            byte[] body = client.DownloadData(uri); // note should be 0-length
            string type = client.ResponseHeaders["content-type"];
            client.HeadOnly = false;
            // check 'tis not binary... we'll use text/, but could
            // check for text/html
            if (type.StartsWith(@"text/"))
            {
                string text = client.DownloadString(uri);
                Console.WriteLine(text);
            }
        }
    }

}

class MyClient : WebClient
{
    public bool HeadOnly { get; set; }
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest req = base.GetWebRequest(address);
        if (HeadOnly && req.Method == "GET")
        {
            req.Method = "HEAD";
        }
        return req;
    }
}

Alternatively, you can check the header when overriding GetWebRespons(), perhaps throwing an exception if it isn't what you wanted:

protected override WebResponse GetWebResponse(WebRequest request)
{
    WebResponse resp = base.GetWebResponse(request);
    string type = resp.Headers["content-type"];
    // do something with type
    return resp;
}
Marc Gravell
Don't forget XHTML: http://www.w3.org/TR/xhtml-media-types/#application-xhtml-xml
bzlm
A: 

hi i used: str webclient.downloaddata("http://localhost/");//str is string type

i want storing data without (html,php,asp,aspx) tags,what should i do?