Hi all. I'm using C# + HttpWebRequest. I have an HTML page I need to frequently check for updates. Assuming I already have an older version of the HTML page (in a string for example), is there any way to download ONLY the "delta", or modified portion of the page, without downloading the entire page itself and comparing it to the older version?
Only if that functionality is included in the web server, and that's pretty unlikely. So no, sorry.
Not for any given page, no.
But if you wrote a facility to give you the differences based on a timestamp or some kind of ID, then yes. This isn't anything standard. You'd have to create a feed for the page using syndication, or create a web service to satisfy the need. Of course, you have to be in control of the web server you want to monitor, which may not be the case for you.
You have the old version and the server has the new version. How could you download just the delta without knowing what has been changed? How could the server deliver the delta without knowing which old version you have?
So there is no way around downloading the entire page. The only alternative is uploading your old version to the server (assuming the server offers a service for that), but that would only increase the traffic.
The short answer is, no. The long answer is that if the HTML is in version control and you write some server side code that, given a particular version number, gives you the diff between the current version and the specified version, yes. If the HTML isn't in version control and you just want to compare your version to the current version, then either you need to download the current version to do the comparison on the client or upload your version to the server and have it do the comparison -- and send the difference back. Obviously, it's more efficient just to have your client re-download the new version.
As the other answers before me said, there is no way to get around the download.
You can, however, skip parsing the HTML when it is unchanged: store a hash for each page revision and compare the stored hash with the hash of the freshly downloaded page. Only when the hashes differ would you run a diff algorithm to extract the 'delta' information. I think most modern crawlers do something along these lines.
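The hash comparison could look roughly like this. It is only a sketch: the class and method names (PageHash, Compute, HasChanged) are made up for illustration, and SHA-256 is just one reasonable choice of hash.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class PageHash
{
    // Hash the page body so only a short string has to be stored per revision.
    public static string Compute(string html)
    {
        using (var sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(html));
            return Convert.ToBase64String(digest);
        }
    }

    // True when the freshly downloaded page differs from the stored revision.
    public static bool HasChanged(string storedHash, string freshHtml)
    {
        return !string.Equals(storedHash, Compute(freshHtml), StringComparison.Ordinal);
    }
}
```

You would keep the result of PageHash.Compute(oldHtml) next to your stored copy and only run the (more expensive) diff or parse step when HasChanged returns true. The page still has to be downloaded in full to compute the new hash.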
Set the IfModifiedSince property of HttpWebRequest.
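This won't give you a 'delta', but the server will reply with 304 (Not Modified) instead of the page body if the page has not been modified since that date. Note that HttpWebRequest reports a 304 as a WebException, so you have to catch it.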
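A minimal sketch of such a conditional request, assuming you remember the time of your last successful download. The names ConditionalFetch, CreateConditionalRequest and DownloadIfModified are invented for this example. As far as I know, HttpWebRequest surfaces the Not Modified status (304) as a WebException, so the code checks for it in the catch block as well as on the normal path.

```csharp
using System;
using System.IO;
using System.Net;

static class ConditionalFetch
{
    // Builds a request carrying an If-Modified-Since header.
    public static HttpWebRequest CreateConditionalRequest(string url, DateTime lastFetchedUtc)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.IfModifiedSince = lastFetchedUtc;
        return request;
    }

    // Returns the page body, or null if the server says it has not changed.
    public static string DownloadIfModified(string url, DateTime lastFetchedUtc)
    {
        var request = CreateConditionalRequest(url, lastFetchedUtc);
        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                if (response.StatusCode == HttpStatusCode.NotModified)
                    return null;
                using (var reader = new StreamReader(response.GetResponseStream()))
                    return reader.ReadToEnd();
            }
        }
        catch (WebException ex)
        {
            // A 304 normally ends up here rather than in the try block.
            var response = ex.Response as HttpWebResponse;
            if (response != null && response.StatusCode == HttpStatusCode.NotModified)
            {
                response.Close();
                return null;
            }
            throw;
        }
    }
}
```

A null return means your cached copy is still current. This only helps if the server actually honours If-Modified-Since; many dynamically generated pages are always sent in full.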
If the older versions were kept on the web server, and when you requested the delta, you sent a 'version number' or a modified date for the version that you have, theoretically the server could diff the page and send only the difference. But both copies have to be on one machine for anybody to know what the difference is.
You could use the AddRange method of the HttpWebRequest class. With it you can specify a byte range of the resource you want to download. This is also how interrupted HTTP downloads are resumed.
This is not a delta, but you can decrease traffic if you only load the parts that change. It only works if the server supports range requests and you know in advance which byte ranges change.
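A rough sketch of a range request, assuming you already know which byte offsets you are interested in. RangeFetch, CreateRangeRequest and DownloadRange are hypothetical names chosen for this example.

```csharp
using System;
using System.IO;
using System.Net;

static class RangeFetch
{
    // Builds a request for bytes [from, to] of the resource (inclusive).
    public static HttpWebRequest CreateRangeRequest(string url, int from, int to)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.AddRange(from, to);
        return request;
    }

    public static byte[] DownloadRange(string url, int from, int to)
    {
        var request = CreateRangeRequest(url, from, to);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var stream = response.GetResponseStream())
        using (var buffer = new MemoryStream())
        {
            // 206 Partial Content means the range was honoured;
            // a plain 200 means the server sent the whole page anyway.
            if (response.StatusCode != HttpStatusCode.PartialContent)
                Console.Error.WriteLine("Server ignored the Range header.");
            stream.CopyTo(buffer);
            return buffer.ToArray();
        }
    }
}
```

For example, RangeFetch.DownloadRange(url, 0, 1023) asks for the first 1024 bytes. If the changing part of the page shifts position between versions, fixed offsets will not line up and you are back to downloading everything.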