I've searched without any luck for open source software that can convert DOC, PPT, and PDF files to HTML5, which is exactly what Scribd does. Are there open source equivalents to the type of conversion Scribd does?

If anyone knows of a paid service, that would also work. Scribd has an API, but it is meant for use with the Flash viewer. I would also like to host my own content, since I need more control over it.

A: 

http://wvware.sourceforge.net/

wvHtml: converts your Word document into HTML 4.0.

Possibly http://www.abisource.com/ as well, but in this case it looks like you have to open the document and export it to HTML manually; maybe plugins help. I'm not sure what you mean by "source software that can convert".

Or this: http://www.zope.org/Members/sf/NuxDocument

PF4Public
+4  A: 

You're unlikely to find a single offering that does all this, especially in the open source world. It's more likely that you'll end up relying on a mishmash of tools, and you may even need to chain some converters in order to get to HTML (e.g. PDF -> PS -> HTML).

OpenOffice supports conversion to HTML, and can be called from the command line.
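A minimal sketch of driving that conversion from a script. It assumes a LibreOffice-style `soffice` binary on the PATH that accepts `--headless`, `--convert-to`, and `--outdir`; older OpenOffice builds may need a different invocation. The function names and the expected output path are illustrative, not part of any official API.

```python
import shutil
import subprocess
from pathlib import Path


def build_soffice_command(src, outdir, target="html", binary="soffice"):
    """Return the argv list for a headless conversion of src into outdir."""
    return [
        binary,
        "--headless",
        "--convert-to", target,
        "--outdir", str(outdir),
        str(src),
    ]


def convert_to_html(src, outdir, binary="soffice"):
    """Run the conversion; return the path where the HTML is expected to land."""
    # Assumption: the office binary is installed and reachable via PATH.
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary} not found on PATH")
    Path(outdir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_soffice_command(src, outdir, binary=binary), check=True)
    return Path(outdir) / (Path(src).stem + ".html")
```

Calling `convert_to_html("slides.ppt", "out")` should leave the result at `out/slides.html` if the converter behaves as assumed. The same command works for DOC input, so one wrapper can cover both formats.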

http://pdftohtml.sourceforge.net/ looks reasonably good at converting PDF to HTML.
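For the PDF side, here is a hedged sketch of calling `pdftohtml` from Python. It assumes the binary is installed and that your build supports the `-noframes` option (the poppler-utils version does); the helper name `build_pdftohtml_command` is made up for illustration.

```python
import subprocess


def build_pdftohtml_command(pdf_path, out_base, single_file=True, binary="pdftohtml"):
    """Return the argv list for converting pdf_path to HTML named after out_base."""
    cmd = [binary]
    if single_file:
        # Assumption: -noframes asks for one HTML file instead of a frameset.
        cmd.append("-noframes")
    cmd += [str(pdf_path), str(out_base)]
    return cmd


def pdf_to_html(pdf_path, out_base):
    """Run pdftohtml and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_pdftohtml_command(pdf_path, out_base), check=True)
```

Keeping the command construction separate from the `subprocess.run` call makes it easy to swap in another converter later if you end up chaining tools.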

For Word documents in WordML or OpenXML format, it's conceivable that you could use XSLT transforms, since both the input and output formats are XML. I've seen some stylesheets floating around the net that do this, but YMMV.
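To illustrate why XML input makes this tractable, here is a rough sketch that is not XSLT: it walks the main XML part of an OpenXML (.docx) file with Python's standard library and emits one `<p>` per non-empty paragraph. It assumes the body lives in `word/document.xml` under the usual WordprocessingML namespace, and it ignores styling, tables, and images entirely. `docx_to_html` is a hypothetical helper name.

```python
import zipfile
import xml.etree.ElementTree as ET
from html import escape

# WordprocessingML namespace used by elements such as w:p and w:t.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_html(docx_file):
    """Return bare HTML paragraphs for the text in a .docx path or file object."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is spread across w:t elements inside its runs.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        if text:
            paragraphs.append(f"<p>{escape(text)}</p>")
    return "\n".join(paragraphs)
```

A real stylesheet would do the same kind of element-to-element mapping declaratively and could carry over formatting that this sketch drops.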

Incidentally, why is there a specific requirement for open source? MS PowerPoint, for example, already supports save-as-HTML.

imoatama
+1 for OpenOffice
vladr