
Does anyone think it is possible to build a Google Docs style PDF document viewer, which will convert a document to a format that doesn't require Adobe Reader on the client machine?

If so, any references to point to? Either a place that had done it, or an explanation of how to do it.

A: 

Try converting them from PDF to TIFF. TIFF supports multiple pages and is widely supported.
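A minimal sketch of the PDF-to-TIFF conversion, assuming Ghostscript is installed (the answer names no tool, so Ghostscript is a substitution). Ghostscript's TIFF devices write all pages into a single multi-page file when the output name contains no `%d`. The function names are hypothetical, and on Windows the executable is usually `gswin64c` rather than `gs`.

```python
import shutil
import subprocess

def ghostscript_tiff_command(pdf_path: str, tiff_path: str,
                             dpi: int = 150, device: str = "tiff24nc") -> list[str]:
    """Build a gs command that renders every page of pdf_path into one TIFF.

    With no %d in tiff_path, Ghostscript's TIFF devices append all pages
    to a single multi-page file.
    """
    return [
        "gs",
        f"-sDEVICE={device}",  # tiff24nc: 24-bit RGB, uncompressed; tiffg4: 1-bit fax (G4)
        f"-r{dpi}",            # output resolution in dots per inch
        "-o", tiff_path,       # -o implies -dBATCH -dNOPAUSE
        pdf_path,
    ]

def pdf_to_tiff(pdf_path: str, tiff_path: str, dpi: int = 150) -> None:
    """Run the conversion; raises if gs is missing or exits with an error."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript 'gs' not found on PATH")
    subprocess.run(ghostscript_tiff_command(pdf_path, tiff_path, dpi), check=True)
```

Usage: `pdf_to_tiff("report.pdf", "report.tif")`. For scanned or black-and-white documents, `device="tiffg4"` gives much smaller files than the uncompressed colour default.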

If formatting isn't that important and your PDFs are structured right (i.e. they actually contain text, not images of text), an alternative is to convert them to HTML. The tools from Aspose are pretty good.
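If the PDFs do contain real text, a crude HTML conversion can be sketched with `pdftotext` from Poppler/Xpdf instead of Aspose (a swapped-in open-source tool, not what the answer used). It keeps the words but discards fonts and layout, which fits the "formatting isn't that important" caveat. `pdftotext` separates pages with form-feed characters, so the sketch splits on `\f` and wraps each page in a `<pre>` block; the helper names and the markup are hypothetical.

```python
import html
import subprocess

def pdf_text(pdf_path: str) -> str:
    """Extract text with pdftotext; '-' as the output file means stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True, encoding="utf-8", check=True,
    )
    return result.stdout

def text_to_html(text: str, title: str = "Document") -> str:
    """Wrap form-feed-separated pages in minimal, escaped HTML."""
    pages = text.split("\f")
    # pdftotext normally ends each page with \f, leaving an empty final chunk
    if pages and not pages[-1].strip():
        pages.pop()
    body = "\n".join(
        f'<div class="page" id="page-{i}"><pre>{html.escape(page)}</pre></div>'
        for i, page in enumerate(pages, start=1)
    )
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body></html>\n"
    )
```

Usage: `text_to_html(pdf_text("report.pdf"), title="report.pdf")`. For PDFs that are only scanned images this produces empty pages, which is exactly the "structured right" caveat above.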

Robert Wagner
A:

If I understand you correctly you only want to view these files and not edit them.

Google already makes a best-effort attempt to offer PDF files found in its search results as HTML. This doesn't always work. You can try it out by setting up a Gmail account, mailing all your PDF files to it, and then using the "View attachment as HTML" links in the messages.

Your other options are to take the source material and convert it to HTML, as, say, LaTeX2HTML does for LaTeX documents, or to convert the PDF into either a raster image (TIFF, DjVu, etc.) or a vector image (PostScript, SVG, SWF).

If the input to this process starts with the PDF files, you have very limited options, especially if the contents of the PDFs are just raster images (say scanned pages).

Personally, I'd advocate creating the PDFs from their source and also using Flash Paper to create an SWF from that source, since Flash Paper pretends to be a printer. The payoff is reach: some 98% of browsers have Flash 9 or greater.

Have you seen Scribd?

dlamblin
A: 

I'm wondering why you would want to do that. PDF is such a general and widely supported format that if you try to avoid it you're limited to:

  • A more obscure or less well-supported format (DVI, or SVG until it gets better support)
  • Converting to text/HTML like Google does with less than perfect results
  • Converting to an image format like TIFF which bumps up file sizes and removes all the niceties of PDF like real, selectable text and hyperlinks
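To put a number on "bumps up file sizes": an uncompressed 24-bit raster page costs width × height × 3 bytes. A small worked example, assuming a US Letter page (8.5 × 11 in) rendered at 150 dpi; the page size and resolution are illustrative choices, not from the answer.

```python
def uncompressed_raster_bytes(width_in: float, height_in: float,
                              dpi: int, bytes_per_pixel: int = 3) -> int:
    """Size of one page as an uncompressed raster (3 bytes per pixel = 24-bit RGB)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel

letter_150 = uncompressed_raster_bytes(8.5, 11, 150)
print(letter_150, f"{letter_150 / 2**20:.1f} MiB")  # 6311250 6.0 MiB
```

That is 1275 × 1650 pixels, about 6 MiB per page before compression, and doubling the resolution quadruples it. TIFF compression (LZW, or Group 4 for black-and-white) shrinks this considerably, but a text-based PDF page is usually smaller still, and the text stays selectable.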

If you don't want your users to have to install Adobe Reader (understandable), there are many free, lightweight PDF viewers available, such as Foxit Reader, and I'm sure many of these can be embedded in the browser.

Mark Pim
A:

Am I missing something here? Google Docs DOES support PDF. Simply upload the PDF file.

A:

I've done a lot of research regarding this matter and I hope I can help.

Good old Macromedia used to market Flash Paper, which was pitched as an Adobe Reader killer because it let any webmaster embed and display PDF documents online using Flash. But that was before they sold out to Adobe, after which Flash Paper was soon shelved and forgotten in favor of Adobe's own priorities.

However, today there are so many ground-breaking alternatives...

As a user has mentioned above you can use Scribd.com (the wanna-be YouTube for documents). But they're not the only service (and certainly not the ones most ahead of the curve).

Here are my two favorites:

  1. Issuu (http://www.issuu.com), or simply issuu.com
  2. Mygazines (http://www.mygazines.com/)

I enjoy Mygazines' Flash user interface the most (it's also faster), but it costs $99. It's pretty impressive, and depending on what you want to do, that price tag can be worth it.

Issuu however, has won me over recently with their Smartlook Platform: http://issuu.com/smartlook

Here's a sample of Smartlook setup on a website:

http://www.ismartlook.com/

Plus it's completely free, which is nice.

A third alternative, which I've considered using myself, is free, open-source code by a developer named samurajdata. He calls it psview (PostScript Viewer). Anyone can download the source code and see it in action here:

http://view.samurajdata.se/

The converted PDFs lose some quality because the pages are converted to image files, but it's fast and easy to set up.

I hope this helps!

Helper
A:

The Google Docs PDF viewer serves a PNG rendition of a PDF page when you click it. On load, it fetches the first two pages (as PNGs) and thumbnails for the first five. Whenever a thumbnail is clicked, it loads that page plus the pages immediately before and after it. This is what I want to build for one of my clients. Any idea how I can achieve the same?

More specifically, we have a PDF in Documentum CM and need to serve it to a customer over Webtop, but we cannot let the customer download the whole PDF because of bandwidth limitations. So this page-by-page approach would be ideal for me.
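The loading behaviour described above can be sketched as two small planning functions: what to fetch when the viewer opens, and what to fetch after a thumbnail click. The names are hypothetical, and the numbers (2 pages, 5 thumbnails, a ±1 page window) are taken from the observed Google behaviour, not from any documented spec.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InitialLoad:
    pages: tuple[int, ...]        # full-size page images (1-based page numbers)
    thumbnails: tuple[int, ...]   # thumbnail images (1-based page numbers)

def initial_load(page_count: int, first_pages: int = 2,
                 first_thumbs: int = 5) -> InitialLoad:
    """What to fetch when the viewer opens: first 2 pages, first 5 thumbnails."""
    return InitialLoad(
        pages=tuple(range(1, min(first_pages, page_count) + 1)),
        thumbnails=tuple(range(1, min(first_thumbs, page_count) + 1)),
    )

def pages_to_fetch(clicked: int, page_count: int, loaded: set[int]) -> list[int]:
    """Pages to request after a thumbnail click: the page and its neighbours
    (clicked - 1 and clicked + 1), clamped to the document, minus cached ones."""
    if not 1 <= clicked <= page_count:
        raise ValueError(f"page {clicked} outside 1..{page_count}")
    window = (clicked - 1, clicked, clicked + 1)
    return [p for p in window if 1 <= p <= page_count and p not in loaded]
```

With this plan at most three page images cross the wire per click, and never the whole PDF, which is the property a bandwidth-limited setup needs.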

What libraries does Google use to produce a PNG from a PDF? The viewer sends a GET request for each page the user asks for, like this:

GET /gview?thid=120bcde295b57ec0&attid=0.1&a=bi&docid=b4f04dfc6cae59212c255322e4fa27a4%7C51cc6754817db14a75401d5c739dff04&chan=EgAAAHQUu8qVqHSEgVrnc5ZWHfBktx3Om%2BA1kNyD7fdMgL64&pagenumber=60&w=1024 HTTP/1.1

I want to be able to do exactly that. Any leads would be a real help. Thanks, varun
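Which library Google uses isn't visible from the request, but the server side of such a per-page request can be sketched with Poppler's `pdftoppm` as a stand-in renderer. The sketch reads the `pagenumber` and `w` parameters seen in the captured request and renders just that page as a PNG scaled to the requested width. The default width, the `MAX_WIDTH` cap, and the function names are hypothetical, and fetching the PDF out of Documentum is left out.

```python
import subprocess
from urllib.parse import parse_qs

MAX_WIDTH = 2048  # hypothetical cap so clients can't request huge renders

def parse_page_request(query: str) -> tuple[int, int]:
    """Return (page, width) from a query string like 'pagenumber=60&w=1024'."""
    params = parse_qs(query)
    page = int(params["pagenumber"][0])
    width = int(params.get("w", ["1024"])[0])  # hypothetical default width
    if page < 1 or not 1 <= width <= MAX_WIDTH:
        raise ValueError("page or width out of range")
    return page, width

def pdftoppm_page_command(pdf_path: str, page: int, width: int,
                          out_prefix: str) -> list[str]:
    """Render a single page to <out_prefix>.png at the given pixel width."""
    return [
        "pdftoppm", "-png",
        "-f", str(page), "-l", str(page),  # first and last page: only this one
        "-scale-to-x", str(width),         # target width in pixels
        "-scale-to-y", "-1",               # -1 keeps the page's aspect ratio
        "-singlefile",                     # no page-number suffix in the file name
        pdf_path, out_prefix,
    ]

def render_page_png(pdf_path: str, page: int, width: int, out_prefix: str) -> str:
    subprocess.run(pdftoppm_page_command(pdf_path, page, width, out_prefix),
                   check=True)
    return out_prefix + ".png"
```

A real handler should also check `page` against the document's page count and cache rendered PNGs, so repeated requests for the same page don't re-render it.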

varun
A: 

I need to display PDF documents on an embedded platform where no PDF reader exists, so the PDFs must be converted to HTML.

KL
A:

You can just use the Google Docs Viewer which also supports PDF documents. It allows you to embed it in your web page and point to the URL where the PDF is located (which doesn't have to be on the Google servers).

Example:

http://docs.google.com/viewer?embedded=true&url=http%3A%2F%2Fwww.domain.com%2Fdocument.pdf
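The example URL is just the viewer address plus two query parameters, `embedded=true` and the percent-encoded `url` of the PDF. A small sketch that builds it for any PDF URL and wraps it in an `<iframe>`; the iframe size and function names are arbitrary assumptions, and the viewer endpoint may have changed since this answer was written.

```python
import html
from urllib.parse import urlencode

VIEWER = "http://docs.google.com/viewer"  # endpoint as given in the answer

def viewer_url(pdf_url: str) -> str:
    """Viewer URL for a publicly reachable PDF; urlencode escapes ':' and '/'."""
    return VIEWER + "?" + urlencode({"embedded": "true", "url": pdf_url})

def viewer_iframe(pdf_url: str, width: int = 600, height: int = 780) -> str:
    """HTML snippet embedding the viewer; '&' becomes '&amp;' inside the attribute."""
    src = html.escape(viewer_url(pdf_url), quote=True)
    return (f'<iframe src="{src}" width="{width}" height="{height}" '
            'style="border: none;"></iframe>')

print(viewer_url("http://www.domain.com/document.pdf"))
# http://docs.google.com/viewer?embedded=true&url=http%3A%2F%2Fwww.domain.com%2Fdocument.pdf
```

The PDF must be reachable by Google's servers, so documents behind a login or on an intranet won't render this way.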
tomlog
A: 

Some other alternatives depending upon what you're looking to do:

  • RAD PDF - ASP.NET component for displaying PDF documents, forms, etc. Also allows PDF searching, bookmarks, text selection, and basic editing.
  • Atalasoft - ASP.NET component for image viewing, but also allows PDF use as an image. Doesn't support any PDF features beyond simple viewing.
userx
(I work at Atalasoft) -- actually we support annotations, PDF page reordering/removing/adding, bookmarks, embedded links and more
Lou Franco
A: 

There is the Internet Archive BookReader. It's a nice book viewer implemented in JavaScript (jQuery), so the client needs neither a PDF reader nor Flash. It does need images of the book pages, but you can easily connect it to your own image server, so you could convert a PDF to images via ASP.NET (or any other tool, such as XPDF). I found this simpler than implementing an image viewer myself.
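Since BookReader only needs one URL per page image, the PDF side can be sketched as a one-off batch conversion plus an index-to-URL mapping. This assumes Poppler's `pdfinfo` and `pdftoppm` (XPDF ships similarly named tools, but options such as `-singlefile` may differ there), a hypothetical `/pages/page-NNNN.png` URL layout, and a viewer that asks for pages by zero-based index; that last point is an assumption, so check how your BookReader release numbers pages.

```python
import subprocess
from pathlib import Path

def parse_pdfinfo_pages(pdfinfo_output: str) -> int:
    """Pull the page count from pdfinfo's 'Pages:' line."""
    for line in pdfinfo_output.splitlines():
        if line.startswith("Pages:"):
            return int(line.split(":", 1)[1])
    raise ValueError("no 'Pages:' line in pdfinfo output")

def page_count(pdf_path: str) -> int:
    out = subprocess.run(["pdfinfo", pdf_path], capture_output=True,
                         text=True, check=True).stdout
    return parse_pdfinfo_pages(out)

def page_image_commands(pdf_path: str, pages: int, out_dir: str,
                        dpi: int = 100) -> list[list[str]]:
    """One pdftoppm call per page; -singlefile writes <prefix>.png with no suffix."""
    out = Path(out_dir)
    return [
        ["pdftoppm", "-png", "-r", str(dpi), "-f", str(n), "-l", str(n),
         "-singlefile", pdf_path, str(out / f"page-{n:04d}")]
        for n in range(1, pages + 1)
    ]

def page_image_url(index: int, base_url: str = "/pages") -> str:
    """Map a zero-based viewer index to the 1-based file name written above."""
    return f"{base_url}/page-{index + 1:04d}.png"

def convert_all(pdf_path: str, out_dir: str, dpi: int = 100) -> None:
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in page_image_commands(pdf_path, page_count(pdf_path), out_dir, dpi):
        subprocess.run(cmd, check=True)
```

Pre-converting every page once keeps the image server a plain static file host; rendering on demand (and caching) is the alternative for large archives.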

Also, it seems to support search highlighting (try it here), but I haven't investigated exactly which metadata are needed and in what format.

The last release file contains a simple example on how to use it. More details and examples can be found in the first link.

kepler