I am curious to know how the Google Docs PDF viewer works. It's not Flash like scribd.com; it looks like pure HTML. Any idea how they did it?

Sample link to view the PDF

+6  A: 

Google is simply serving up an image (right click -> save as), with an overlay to highlight text.

You should check out this SO question where others go into more detail.

You should also look through the source of your PDF link. It appears Google is passing the PDF link through to be converted into an image.

Example:

<script type="text/javascript"> 
        var gviewElement = document.getElementById('gview');
        var config = {

          'api': false,
          'chrome': true,
          'csi': true,
          'ddUrl': "http://www.idfcmf.com/downloads/monthly_fund/2009/IDFC-Premier-Equityfund-jan10.pdf",
          'element': gviewElement,
          'embedded': false,
          'initialQuery': "",
          'oivUrl': "http://docs.google.com/viewer?url\x3dhttp%3A%2F%2Fwww.idfcmf.com%2Fdownloads%2Fmonthly_fund%2F2009%2FIDFC-Premier-Equityfund-jan10.pdf",
          'sdm': 200,
          'userAuthenticated': true
        };

        var gviewApp = _createGView(config);
        gviewApp.setProgress(50);


          window.jstiming.load.name = 'view';

          window.jstiming.load.tick('_dt');

      </script> 
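The `oivUrl` value in that config suggests the viewer link is just the PDF's address percent-encoded into a `url` query parameter (the `\x3d` is an escaped `=`). A minimal sketch of building such a link follows; the helper name `buildViewerUrl` is made up for illustration and is not part of Google's code.

```javascript
// Hypothetical helper: not Google's code, only the pattern visible in oivUrl.
function buildViewerUrl(pdfUrl) {
  // encodeURIComponent escapes ':' and '/' so the whole address fits in one parameter.
  return "http://docs.google.com/viewer?url=" + encodeURIComponent(pdfUrl);
}

var pdfUrl = "http://www.idfcmf.com/downloads/monthly_fund/2009/IDFC-Premier-Equityfund-jan10.pdf";
var viewerUrl = buildViewerUrl(pdfUrl);
console.log(viewerUrl);
```

The result matches the `oivUrl` string above once `\x3d` is read as `=`.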

Edit

Also, if you view the PDF viewer in Firefox with Firebug, you will notice that 'highlighting' text really only shows a set of divs. I'm guessing Google scans the document using OCR, detects where the text is, and provides a matrix of coordinates on which to base the div placement. When you click and drag, it interrogates the mouse pointer location to determine which divs to display.
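To make the guess above concrete, here is a rough sketch of how a drag could be matched against per-word coordinates. Everything in it is an assumption for illustration: the `wordBoxes` data, the field names, and the simple rectangle test. Google's real selection logic is not known and probably follows reading order rather than a plain rectangle.

```javascript
// Hypothetical data: one box per word, in page pixels, as a server-side
// text-detection step might produce. The field names are assumptions.
var wordBoxes = [
  { text: "Monthly", x: 10, y: 10, w: 60, h: 12 },
  { text: "Fund",    x: 75, y: 10, w: 35, h: 12 },
  { text: "Report",  x: 10, y: 30, w: 50, h: 12 }
];

// Normalise a drag from (x0, y0) to (x1, y1) into a rectangle,
// so dragging up or to the left works too.
function dragRect(x0, y0, x1, y1) {
  return {
    x: Math.min(x0, x1),
    y: Math.min(y0, y1),
    w: Math.abs(x1 - x0),
    h: Math.abs(y1 - y0)
  };
}

// True when two axis-aligned rectangles overlap.
function intersects(a, b) {
  return a.x < b.x + b.w && b.x < a.x + a.w &&
         a.y < b.y + b.h && b.y < a.y + a.h;
}

// Return the boxes whose highlight divs should be shown for this drag.
function boxesToHighlight(boxes, rect) {
  return boxes.filter(function (box) { return intersects(box, rect); });
}

// Drag from the lower right (80, 25) up to the upper left (5, 5).
var selected = boxesToHighlight(wordBoxes, dragRect(80, 25, 5, 5));
console.log(selected.map(function (b) { return b.text; }).join(" "));
```

In a browser, the coordinates would come from mouse events, and each selected box would toggle the visibility of its div.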

ILMV
No, it is not converting the entire thing into an image, because it allows you to select and copy the text inside it. I don't think we can do that with an image.
Jeeva S
No... it IS converting it to an image...you can tell this because I downloaded it as a PNG! How it managed to provide an overlay for highlighting / copying text is something that I can't explain, but it is converting it to an image. Have you looked at the other SO post I linked to?
ILMV
Bro, I am not denying that it's rendering some images, but overall it renders like an HTML page with images and text. My question is: how does the PDF viewer work? (The complete flow, with accurate info; no guessed answers.)
Jeeva S
As far as I'm concerned, I've answered your question as best I can without actually calling Google and getting the info directly. Could you please be specific about what you want to know? I've told you how the PDF itself is rendered as an image, and I've told you how the text highlighting works. I'm confused as to what you want that I haven't explained. And apart from the text coordinates (which I'm pretty sure about) I've guessed nothing; use Firebug and see for yourself.
ILMV
+1  A: 

The whole thing is an image. The text highlight overlay is easy to figure out, but when you press Ctrl+C and it copies to the clipboard, that part has me totally stumped: it's not possible to write to the clipboard using JavaScript in Firefox, yet Ctrl+C on the image works fine in Firefox. http://www.google.com/support/forum/p/Google+Docs/thread?tid=67dcf21ef8579b4c&hl=en&fid=67dcf21ef8579b4c00047e4a2a9fcb12

foo
I assume they are not using JavaScript to put the text in the clipboard, but rather selectable text in the browser. When you hit Ctrl+C, you are actually using the browser's normal Copy feature.
BastiBense
A: 

Be aware that Google Docs may be blocked by some corporate firewalls. It is by mine. #fail

Tim
What's the point?
Uncle
A: 

I agree with some of the other answers: the PDF is rendered as a PNG, and very likely the text areas are layered on top, probably using absolute/relative positioning. You can extract the text information from the PDF itself. The PDF format is open, so anyone could do it (granted, it might not be easy). There are also some open source tools out there (xPDF, for example) that enable export of PDF contents, for instance to XML. It's possible that the exports include information such as the coordinates where text and images should display on the page.
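As a rough illustration of the layering idea, the sketch below builds the markup for one page: the rendered image plus one absolutely positioned span per word. The `words` data, its field names, and the `page1.png` file name are all placeholders, and whether Google does anything like this is a guess. Making the span text transparent is one assumed way to keep it selectable, so the browser's normal copy would work.

```javascript
// Hypothetical input: words with page coordinates, e.g. parsed from an
// XML export of the PDF. The field names are assumptions.
var words = [
  { text: "Monthly", x: 10, y: 10, w: 60, h: 12 },
  { text: "Fund <Jan>", x: 75, y: 10, w: 70, h: 12 }
];

// Escape characters that are special in HTML.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;")
          .replace(/</g, "&lt;")
          .replace(/>/g, "&gt;")
          .replace(/"/g, "&quot;");
}

// Build one page: the rendered PNG plus an absolutely positioned span per word.
function buildPageHtml(imageUrl, words) {
  var spans = words.map(function (w) {
    return '<span style="position:absolute;left:' + w.x + 'px;top:' + w.y +
           'px;width:' + w.w + 'px;height:' + w.h + 'px;color:transparent">' +
           escapeHtml(w.text) + '</span>';
  });
  // The relatively positioned wrapper makes the spans' coordinates
  // relative to the page image rather than the browser window.
  return '<div style="position:relative">' +
         '<img src="' + escapeHtml(imageUrl) + '">' +
         spans.join("") + '</div>';
}

var pageHtml = buildPageHtml("page1.png", words);
console.log(pageHtml);
```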

Jonny