Hello,

When I save a web page's source from IE, it differs from the source downloaded by HttpWebRequest in my C# app.

I have saved both files for reference. The one saved from IE is here and the one from HttpWebRequest is here.

They differ in formatting and in the content itself. The one downloaded by HttpWebRequest seems to be broken and doesn't contain valid data, whereas the one saved from IE is fine.

I don't know why I cannot get the same nicely formatted source that IE gives me.

Regards, Mariusz

+1  A: 

I suspect the one downloaded using IE has got some state associated with it from either cookies or session variables that were set when you visited the site manually. The one downloaded using C# will have the default values for everything, and hence different content.

This looks most likely because the file_web file contains a section called "LastViewedHotels" that contains an entry for the Arora Manchester.

Additionally, it looks like there is dynamic content for displaying adverts, which is different between the two files.
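
If the difference really does come from cookies, one thing to try is letting HttpWebRequest keep them between requests. The following is only a minimal sketch: the `SessionDownloader` class and its method names are made up for illustration, and it assumes the site sets its session cookies on an earlier page that you request first with the same `CookieContainer`.

```csharp
using System;
using System.IO;
using System.Net;

static class SessionDownloader
{
    // Creates a request that sends the cookies already held in the container
    // and stores any new cookies the server sets in its response.
    public static HttpWebRequest CreateRequest(string url, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cookies;
        return request;
    }

    // Downloads the page body as text, sharing cookies through the container.
    public static string Get(string url, CookieContainer cookies)
    {
        HttpWebRequest request = CreateRequest(url, cookies);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Create one `new CookieContainer()`, call `SessionDownloader.Get` on the site's start page, then call it again on the page you actually want, passing the same container both times. Whether that reproduces what IE shows depends on how the site builds its state.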

adrianbanks
Thanks. That may be the reason, because I have also used a WebBrowser control and got exactly the same effect.
aristo
+1  A: 

Usually this happens when the site you are navigating to loads additional content via Ajax or frames.

To overcome this and always fetch the content IE sees, you can use the WebBrowser control to navigate and take the source from there.

Here is an Example

Am
Hi. Thank you for your answer. I have tried to use WebBrowser and got exactly the same effect...
aristo
+1  A: 

Update

From running a KDiff on the sources you gave, it looks like there's 1 major line difference:

<link rel="alternate" type="text/html" hreflang="de"...

And that looks like it has an ID generated from a session (a cookie) so there's not much you can do about that without copying the IE cookie header.

Previous answer

"Under the hood", IE and HttpWebRequest both perform the same simple task, which is to send the following text request on port 80 via a socket to the HTTP server:

GET / HTTP/1.1

(or 1.0, plus a Host header).

If you're on Windows you can try it out. Install the built-in Windows telnet client (Add/Remove Programs -> Windows Features) or PuTTY, connect to the server on port 80, and then type:

GET / HTTP/1.1 (newline)
Host: yahoo.com
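
The same experiment can be sketched in C# with a plain socket instead of telnet. This is an illustration only, not what HttpWebRequest does internally line for line: the `RawHttp` class is a made-up name, and the `Connection: close` header is an addition here so that the server closes the connection and the read finishes.

```csharp
using System;
using System.IO;
using System.Net.Sockets;
using System.Text;

static class RawHttp
{
    // Builds the plain-text request. HTTP uses CRLF line endings, and an
    // empty line marks the end of the headers.
    public static string BuildRequest(string host, string path)
    {
        return "GET " + path + " HTTP/1.1\r\n" +
               "Host: " + host + "\r\n" +
               "Connection: close\r\n" +
               "\r\n";
    }

    // Sends the request on port 80 and returns everything the server sends
    // back: status line, response headers and body.
    public static string Fetch(string host, string path)
    {
        using (var client = new TcpClient(host, 80))
        using (var stream = client.GetStream())
        {
            byte[] request = Encoding.ASCII.GetBytes(BuildRequest(host, path));
            stream.Write(request, 0, request.Length);
            using (var reader = new StreamReader(stream, Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

Calling `RawHttp.Fetch("yahoo.com", "/")` returns the raw response text, so you can see exactly what the server sends when no browser headers or cookies are present.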

The source returned by this, by IE, and by the HttpWebRequest class will be exactly the same. The only differences arise if IE passes cookies to the server, or from the extra headers it sends, which normally include:

  • A user agent (User-Agent)
  • Accept: */*
  • Gzip (Accept-Encoding: gzip)
  • Cookies, including session cookies (those that expire when IE is closed)
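
To make HttpWebRequest look more like IE, these headers can be set on the request before it is sent. The sketch below is hedged: `BrowserLikeRequest` is a hypothetical helper, and the `userAgent` and `cookieHeader` arguments are placeholders that you would have to copy from a request IE actually sends. It also assumes no `CookieContainer` is set on the request, so the raw Cookie header is used as given.

```csharp
using System;
using System.IO;
using System.Net;

static class BrowserLikeRequest
{
    // Builds a GET request carrying browser-style headers.
    // userAgent and cookieHeader are values copied from the browser.
    public static HttpWebRequest Create(string url, string userAgent, string cookieHeader)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = userAgent;
        request.Accept = "*/*";
        // Asks for gzip/deflate responses and decompresses them transparently.
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        if (!string.IsNullOrEmpty(cookieHeader))
        {
            // Raw "name=value; name2=value2" string, sent as the Cookie header.
            request.Headers[HttpRequestHeader.Cookie] = cookieHeader;
        }
        return request;
    }

    // Sends the request and returns the response body as text.
    public static string Download(HttpWebRequest request)
    {
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Pass the result of `Create` to `Download` to get the page. If the server ties the cookie to a session that has since expired, the content will still differ from what IE saved.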

For formatting, IE might turn tabs into spaces, or the other way around. The HttpWebRequest will return the raw results without any formatting.

Chris S
Thanks for your answer. Unfortunately, I don't know much about how to use a cookie with my request to fool the web server...
aristo