I'm trying to download the source HTML of a website using the WebClient.DownloadData() method.

My method is supposed to give me the source:

public string GetSite(string URL)
{
    Uri Site = new Uri(URL);
    byte[] lol = Client.DownloadData(Site);
    SiteSource = Encoding.ASCII.GetString(lol);
    return SiteSource;
}

I've triple-checked: when I type the exact same URL that I pass to this method into Firefox, I get the page I expect, but my program downloads something else entirely.

Pressing Ctrl+U in Firefox to view the page source shows me what I need to see (again, simple HTML), but my software gets something entirely different.

What gives?

FOR CLARITY:

Imagine you enter www.google.com in Firefox. Viewing the source in Firefox, you see:

<html>
   <head>
   </head>
   <body> 
       <h1>Hello!</h1>
   </body>
</html>

But if I were to use the DownloadData method for the exact same URL, my program would download a source code like this:

<html>
   <head>
   </head>
   <body> 
       <h1>Bonjour!</h1>
   </body>
</html>
+4  A: 

The site may be doing browser detection, and serving up different HTML depending on whether it perceives the client to be Firefox, IE, a Web crawler, etc.

itowlson
Ah...is there a way for me to circumvent this via C#? I suspect this is the case. If it is, I'm fucked. :(
Sergio Tapia
Setting your User-Agent header to match Firefox's would circumvent it.
Will Eddins
Specifically, webClient.Headers.Add("User-Agent", ...). See the WebClient.Headers docs in MSDN.
itowlson
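
To illustrate the suggestion in the comments above, here is a minimal sketch of the question's GetSite method with a browser-like User-Agent header set before the download. The class name SiteFetcher, the CreateClient helper and the specific User-Agent string are assumptions for illustration (copy the real string from your own browser). It also assumes UTF-8 decoding instead of ASCII, which may not match the site's actual encoding.

```csharp
using System;
using System.Net;
using System.Text;

public static class SiteFetcher
{
    // Assumed example value; replace it with the User-Agent your own browser sends.
    public const string BrowserUserAgent =
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0";

    // Hypothetical helper: builds a WebClient that identifies itself like a browser.
    public static WebClient CreateClient()
    {
        var client = new WebClient();
        client.Headers.Add("User-Agent", BrowserUserAgent);
        return client;
    }

    public static string GetSite(string url)
    {
        // A fresh client per call, so the header is always set for this request.
        using (WebClient client = CreateClient())
        {
            byte[] data = client.DownloadData(new Uri(url));
            // Assumption: the page is UTF-8 encoded.
            return Encoding.UTF8.GetString(data);
        }
    }
}
```

If the output still differs from what Firefox shows, the site is probably keying on something other than the User-Agent, such as cookies or other request headers.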
A: 

The site might use cookies that are set in Firefox, the User-Agent header or other HTTP headers to decide what content should be sent to you.

Since your C# program sends different data than Firefox does, the site might send back different content.

sth
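
As a rough sketch of the cookie and header point above: WebClient does not keep cookies on its own, but a small subclass can attach a CookieContainer to each request. The names CookieAwareWebClient and BrowserLikeDownload are hypothetical, and the Accept-Language value is only an assumed example of one of the "other HTTP headers" a site might look at. Whether any of this matters for a given site is a guess until you compare against the headers Firefox actually sends.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical helper: a WebClient that stores cookies and sends them back.
public class CookieAwareWebClient : WebClient
{
    private readonly CookieContainer cookies = new CookieContainer();

    public CookieContainer Cookies
    {
        get { return cookies; }
    }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        HttpWebRequest http = request as HttpWebRequest;
        if (http != null)
        {
            // Share one container so cookies set by the site are reused,
            // including across redirects within a single download.
            http.CookieContainer = cookies;
        }
        return request;
    }
}

public static class BrowserLikeDownload
{
    public static string GetSite(CookieAwareWebClient client, string url)
    {
        // Assumed example value; use whatever your browser sends.
        client.Headers[HttpRequestHeader.AcceptLanguage] = "en-US,en;q=0.5";
        byte[] data = client.DownloadData(new Uri(url));
        // Assumption: the page is UTF-8 encoded.
        return Encoding.UTF8.GetString(data);
    }
}
```

Reusing one CookieAwareWebClient for several GetSite calls lets cookies from an earlier response go out with later requests, which is closer to how a browser session behaves.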