We need to convert all of our MS Office documents to PDF, TIF, or a similar image format with no loss of formatting (these are official documents that must not be tampered with). Is there any way to do this without installing Office on the machine that performs the conversion? Ideally, this would run on a server, multi-threaded, without the overhead of Office Automation.

Thanks in advance!

+2  A: 

You could use a third-party library such as Aspose.NET for document conversion, but I'm afraid that, if high-fidelity rendering is critical, there is no way around using the original application.

Microsoft Office provides a converter API which allows conversions without Office being installed. However, not only might you face licensing issues (IANAL), but this API also only supports conversions between word-processing formats that don't require rendering the document (e.g. RTF -> DOC, DOC -> DOCX), so it is not really an option for you.

Update: Probably the best option would be to have a look at the SharePoint 2010 conversion engine, which is made exactly for automated (server-side) document conversions. It's quite heavy though (in both hardware and pricing), so it may be overkill for your use case.

0xA3
It's funny that you mention Aspose, because their tool appears to have the exact same problem I'm running into with Office Open XML (the proximate error is always an attempt by .NET to set the max size of a Stream, triggered by something an XmlWriter is doing).
MusiGenesis
A: 

If this application will be run on a dedicated machine (i.e. the machine's only job is to convert a gigantic collection of Office documents), your safest bet is probably to use Office automation in a single-threaded manner and let the app happily convert one file at a time. A multi-threaded Office Automation app would probably convert documents at a faster overall rate (especially on a multi-core processor), up to the point where the server crashes.
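As a rough illustration of the one-file-at-a-time approach, here is a minimal Python sketch. It does not touch Office at all: `convert_one` is a hypothetical callable standing in for whatever actually performs the conversion (an Office Automation wrapper, for example), and the function name `convert_sequentially` is likewise made up for this example. The only point is the shape of the loop: strictly sequential, with a failure on one document recorded rather than allowed to stop the batch.

```python
from pathlib import Path
from typing import Callable, Iterable


def convert_sequentially(
    sources: Iterable[Path],
    convert_one: Callable[[Path], Path],
) -> tuple[list[Path], list[tuple[Path, str]]]:
    """Convert files strictly one at a time.

    `convert_one` is a placeholder (an assumption of this sketch) for the
    real conversion call; it takes a source path and returns the output path.
    """
    converted: list[Path] = []
    failed: list[tuple[Path, str]] = []
    for source in sources:
        try:
            converted.append(convert_one(source))
        except Exception as exc:
            # One bad document should not abort the whole batch;
            # keep the error message so the file can be retried later.
            failed.append((source, str(exc)))
    return converted, failed


if __name__ == "__main__":
    # Stand-in converter that only renames the path, to show the call shape.
    def fake_convert(path: Path) -> Path:
        if path.suffix == ".bad":
            raise ValueError("unsupported format")
        return path.with_suffix(".pdf")

    done, errors = convert_sequentially(
        [Path("a.doc"), Path("b.bad"), Path("c.xls")], fake_convert
    )
    print("converted:", done)
    print("failed:", errors)
```

Keeping the failures in a list makes it easy to re-run only the documents that did not convert, which matters when the batch is large.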

Office Open XML is a non-Office-Automation alternative, but since I'm currently battling its tendency to produce OutOfMemoryException errors when exporting to relatively small Excel files (~1MB), I can't really recommend it.

MusiGenesis
A: 

You can convert a DOC to PDF using OpenOffice with Java (it also supports XLS, PPT, and HTML), but you must have OpenOffice installed either on that machine or on another machine on the network.
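For comparison, a similar conversion can be driven from a script by shelling out to the office suite in headless mode. The sketch below is in Python rather than Java and assumes a `soffice` binary on the PATH that accepts the `--headless --convert-to pdf --outdir` options as LibreOffice does; older OpenOffice builds may need different flags or a different approach entirely. The function names and the timeout value are illustrative, not part of any library.

```python
import subprocess
from pathlib import Path


def build_convert_command(source: Path, out_dir: Path, binary: str = "soffice") -> list[str]:
    """Build the headless conversion command line.

    Assumes LibreOffice-style flags; check your installed version.
    """
    return [
        binary,
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(source),
    ]


def convert_to_pdf(
    source: Path,
    out_dir: Path,
    binary: str = "soffice",
    timeout_s: float = 120.0,  # arbitrary limit chosen for this sketch
) -> Path:
    """Run the conversion and return the expected output path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError on a non-zero exit code;
    # the timeout guards against a hung office process.
    subprocess.run(
        build_convert_command(source, out_dir, binary),
        check=True,
        timeout=timeout_s,
    )
    # The output file is expected to keep the source's base name.
    return out_dir / (source.stem + ".pdf")


if __name__ == "__main__":
    # Printing the command does not require the office suite to be installed.
    print(build_convert_command(Path("report.doc"), Path("out")))
```

Separating command construction from execution keeps the flag list easy to inspect and adjust for whichever office suite version is actually installed.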

You need to have Office installed to use the Microsoft.Office.NET API. You also need to have Office installed to use any of the print-to-PDF libraries, such as PDFCreator or PrimoPDF.

Sephrial
A: 

There is a free tool at www.youbindit.co.uk that converts multiple MS Office documents of different formats into a single PDF.

It can also handle numbering, tabs, and similar features.

Echidna2000