Do most browsers (IE, FF, Safari, Chrome, Opera) make multiple HTTP requests for a PDF file when displaying the PDF in the browser? I am working on an issue integrating with WebTrends web analytics software, and the statistics around PDFs appear to be incorrect. Support told me that because WebTrends parses the web server's access logs to determine traffic, downloads, etc., it has a difficult time determining accurate PDF download counts because:
When a user clicks on a PDF and the PDF opens in the user's browser via the Acrobat Reader browser plug-in, each page is downloaded one at a time. It does this to conserve bandwidth: if a user only views the first 2 pages of a 50-page PDF, only the first 2 pages are downloaded.

This sounds fishy to me (how could an HTTP request be made to serve out only a portion of a binary file?). I've been searching Google, but haven't found anything that speaks to this.

I will try to find some software for IE that lets me sniff the HTTP traffic tomorrow to see if I can observe this phenomenon.

Any info/thoughts are appreciated though.

A: 

My thoughts are that you are spot on: your plug-in cannot (and should not) split PDFs into multiple requests.

I have a web application which serves each PDF file in response to a single request and displays it in a plug-in. It displays the entire PDF without requesting any more data.

Also, if you are looking for an HTTP sniffer you could try Fiddler. I have found it useful for web site debugging.

Russell
I checked it out in HttpWatch using IE (the company's official "supported" browser) with the latest Adobe Acrobat Reader plugin, and it was pulling down entire PDFs. I did not see anything in the headers about byte ranges.
empire29
A: 

See RFC 2616, Section 3.12 (Range Units).

Julian Reschke
A: 

If your site returns an HTTP response header like this:

Accept-Ranges: bytes

the PDF reader will close the initial connection after reading just a few KB of the document. It then requests sections of the document as required using the Range request header, e.g.:

Range: bytes=242107-244329, 8060-76128

An example of a URL that does this is http://www.ovationguitars.com/img/OVmanual.pdf .

If you don't return the Accept-Ranges header then the PDF document will be downloaded in a single request (e.g. http://manuals.info.apple.com/en/iphone_user_guide.pdf )
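To make the "portion of a binary file" part concrete, here is a rough sketch of how a Range value like the one above maps onto slices of a file. This is not how any particular web server or PDF reader implements it; the function names `parse_range_header` and `slice_ranges` are made up for illustration, and only the `bytes` unit is handled.

```python
def parse_range_header(value, file_size):
    """Parse a 'bytes=...' Range value into inclusive (start, end) pairs."""
    unit, _, spec = value.partition("=")
    if unit.strip() != "bytes":
        raise ValueError("only the 'bytes' range unit is handled here")
    ranges = []
    for part in spec.split(","):
        first, _, last = part.strip().partition("-")
        if first == "":
            # Suffix form "-N": the last N bytes of the file.
            length = int(last)
            start, end = max(file_size - length, 0), file_size - 1
        else:
            start = int(first)
            # Open-ended form "N-": from byte N to the end of the file.
            end = int(last) if last else file_size - 1
            end = min(end, file_size - 1)
        if start > end:
            raise ValueError(f"unsatisfiable range: {part.strip()}")
        ranges.append((start, end))
    return ranges


def slice_ranges(data, value):
    """Return the byte slices a server would send back for this Range value."""
    return [data[s:e + 1] for s, e in parse_range_header(value, len(data))]
```

For example, `slice_ranges(b"0123456789", "bytes=0-3, 8-")` gives the first four bytes and the last two. A real server would wrap these in a 206 Partial Content response rather than return them directly.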

You can see the behavior of the PDF reader in IE using HttpWatch.
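Since the original problem is about log-based statistics, the same pattern should also be visible in the web server's access log: a 206 status marks a partial response, so several log lines can belong to one viewing of one PDF. A small sketch for counting statuses per PDF follows. It assumes Common Log Format lines, and the sample entries and the name `pdf_status_counts` are invented for illustration, not taken from WebTrends or any real log.

```python
import re
from collections import Counter, defaultdict

# Matches the request and status fields of a Common Log Format line (assumed format).
REQUEST = re.compile(r'"GET (\S+) HTTP/[\d.]+" (\d{3}) ')


def pdf_status_counts(lines):
    """Count HTTP status codes per requested PDF path."""
    counts = defaultdict(Counter)
    for line in lines:
        m = REQUEST.search(line)
        if m is None:
            continue
        path, status = m.group(1), int(m.group(2))
        # Ignore any query string when checking the file extension.
        if path.split("?", 1)[0].lower().endswith(".pdf"):
            counts[path][status] += 1
    return counts


# Hypothetical sample: one viewing of manual.pdf produces three log lines.
SAMPLE = [
    '10.0.0.1 - - [01/Jan/2010:10:00:00 +0000] "GET /docs/manual.pdf HTTP/1.1" 200 4096',
    '10.0.0.1 - - [01/Jan/2010:10:00:01 +0000] "GET /docs/manual.pdf HTTP/1.1" 206 2223',
    '10.0.0.1 - - [01/Jan/2010:10:00:02 +0000] "GET /docs/manual.pdf HTTP/1.1" 206 68069',
    '10.0.0.2 - - [01/Jan/2010:10:05:00 +0000] "GET /docs/guide.pdf HTTP/1.1" 200 512000',
    '10.0.0.2 - - [01/Jan/2010:10:05:01 +0000] "GET /index.html HTTP/1.1" 200 1024',
]
```

If a PDF shows many 206 entries per client, counting every log line as a download would overstate the number of downloads.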

** Disclaimer: This answer was posted by Simtec Limited, the makers of HttpWatch **

HttpWatchSupport
Very interesting, thank you! So it appears this is possible. However, after further investigation (watching the HTTP requests/responses), it does not appear that the Adobe Acrobat Reader plugin for IE creates requests in this fashion (and possibly the web application serving the PDFs does not support them either, though I haven't sent it any synthetic requests with byte ranges).
empire29
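For the synthetic request mentioned in the last comment, a minimal sketch using only the Python standard library might look like the following. The helper names (`build_range_request`, `interpret`, `probe`) and the default 0-1023 range are assumptions for illustration; the URL would be whatever PDF the application serves.

```python
import urllib.request


def build_range_request(url, first=0, last=1023):
    """Build a GET request asking only for bytes first..last of the resource."""
    return urllib.request.Request(url, headers={"Range": f"bytes={first}-{last}"})


def interpret(status, headers):
    """Describe what the status code says about byte-range support."""
    if status == 206:
        return "partial content: " + headers.get("Content-Range", "no Content-Range header")
    if status == 200:
        return "range ignored: server sent the whole file"
    return f"unexpected status {status}"


def probe(url):
    """Send the ranged request and report how the server answered."""
    with urllib.request.urlopen(build_range_request(url), timeout=10) as resp:
        return interpret(resp.status, resp.headers)
```

Calling `probe("http://example.com/some.pdf")` (a placeholder URL) returns a short description. A 206 answer means the server honours byte ranges; a 200 answer means it ignored the Range header and sent the full document.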