
Are there any classes, COM objects, command-line utilities, or anything else I can wrap in an API to convert a PDF to an HTML document? Obviously the conversion might be a little rough, since PDFs can contain a lot more than HTML can describe. I found a utility called pdftohtml on SourceForge, but quite honestly it does a horrible job with the conversion. I don't care whether the software is free or commercial; is there anything out there at all that I can incorporate into my own software to do this sort of conversion at least decently? I know Google has developed its own method for this, since you can click "View as HTML" on a PDF attached to an email in Gmail, but I was hoping there was something available to the public.

Remember, PDF to HTML. I'm NOT worried about HTML to PDF.

A: 

Well, one solution I can think of is to write a small program that reads the PDF's text using a library called iText and then generates HTML files from it.
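A minimal sketch of the "generate HTML" half of that idea follows. It assumes the text of each page has already been pulled out with a library such as iText (for example, iText 5's `PdfTextExtractor.getTextFromPage`); that step is left out so the example runs without third-party jars. The class name `PdfTextToHtml` and the one-`<div>`-per-page layout are illustrative choices, not part of any library. As the follow-up comment points out, plain text extraction like this loses colors, formatting, and images.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: wraps already-extracted PDF page text in basic HTML.
// Assumption: the page strings come from a text extractor such as iText;
// this class only does the HTML generation step.
public class PdfTextToHtml {

    // Escape the characters that are significant in HTML text content.
    public static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Emit one <div> per page and one <p> per non-blank line of text.
    public static String toHtml(String title, List<String> pages) {
        StringBuilder html = new StringBuilder();
        html.append("<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\"><title>")
            .append(escapeHtml(title))
            .append("</title></head><body>\n");
        for (int i = 0; i < pages.size(); i++) {
            html.append("<div class=\"page\" id=\"page-").append(i + 1).append("\">\n");
            // "\\R" matches any line break (\n, \r\n, ...), Java 8+.
            for (String line : pages.get(i).split("\\R")) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    html.append("<p>").append(escapeHtml(trimmed)).append("</p>\n");
                }
            }
            html.append("</div>\n");
        }
        html.append("</body></html>\n");
        return html.toString();
    }

    public static void main(String[] args) {
        // Stand-in for text that an extractor would return, one string per page.
        List<String> pages = Arrays.asList("Hello & welcome\nPage <one>", "Second page");
        System.out.print(toHtml("Sample", pages));
    }
}
```

Escaping is done by hand here to keep the sketch dependency-free; in real code a library escaper would be a reasonable substitute.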

narup
I was hoping for something that handles the conversion well, though, e.g. one that preserves colors, basic formatting, and images.
SoaperGEM
A: 

Well, for Java-based PDF solutions, I guess we still don't have a clean way. All the solutions are primitive, more or less workarounds. There is no easy way to 1. design a PDF template, and then 2. at runtime, use Java to populate that template with data, either from XML or from other data sources.

Such a simple requirement, and NONE has a good open-source and free solution yet!

Eclipse BIRT comes close, but it does not handle barcode elements out of the box.

Samant