Hi, does anyone know of a good solution for converting PDF files to Word .doc files (not .docx) programmatically? I've tried SautinSoft's solution, but although it does the job, the output quality isn't the best.

Any suggestions would be welcome.

Many thanks.

+1  A: 

As for a "solution": there is probably a way to do it, but you'd have to dig into it yourself:

The PDF file format is... quite hard to understand. First of all, it can't really be compared to the Word format. PDF is designed to produce a consistent look on all platforms and printers; Word, by contrast, is a little less strict.

Editing PDF files is quite hard too, because you don't have "text" like in Word; it's more like chunks of letters, each positioned individually.
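To make the "chunks of letters" point concrete, here is a minimal sketch of what a converter has to do just to get plain lines of text back. It assumes some PDF library has already given you each chunk's position and width; the `Chunk` type, the `rebuild_lines` function and the two thresholds are hypothetical names and values chosen for illustration, not part of any real library.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    x: float      # left edge of the chunk, in points
    y: float      # baseline position, in points (PDF y axis points up)
    width: float  # advance width of the chunk, in points
    text: str


def rebuild_lines(chunks, y_tolerance=2.0, space_gap=1.5):
    """Group positioned chunks into lines of text, top to bottom."""
    lines = []  # each entry is [baseline_y, [chunks on that line]]
    # Sort by descending y so the topmost chunks come first.
    for chunk in sorted(chunks, key=lambda c: -c.y):
        if lines and abs(lines[-1][0] - chunk.y) <= y_tolerance:
            lines[-1][1].append(chunk)
        else:
            lines.append([chunk.y, [chunk]])

    result = []
    for _, members in lines:
        members.sort(key=lambda c: c.x)
        parts = [members[0].text]
        for prev, cur in zip(members, members[1:]):
            # A visible gap between two chunks is treated as a word space.
            gap = cur.x - (prev.x + prev.width)
            if gap > space_gap:
                parts.append(" ")
            parts.append(cur.text)
        result.append("".join(parts))
    return result


chunks = [
    Chunk(100, 700.0, 25, "world"),
    Chunk(72, 700.0, 10, "He"),
    Chunk(82, 700.5, 12, "llo"),
    Chunk(72, 686.0, 20, "next"),
]
lines = rebuild_lines(chunks)
```

Even this toy version has to guess at line membership and word spacing. Paragraphs, columns, tables and reading order are all further guesses on top of that, which is where converters tend to lose quality.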

The only doable solution I see is the following:

  1. Render the PDF to an image. (This requires a PDF rendering library!)
  2. Insert this image into a .doc. (This requires a .doc writing library!)
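A rough sketch of step 2, under some loud assumptions: the pages have already been rendered to PNG by a PDF library of your choice (not shown), and instead of writing the binary .doc format it writes RTF, which Word generally opens and can re-save as .doc. The function names `png_size`, `pages_to_rtf` and `tiny_png` are made up for this example; `tiny_png` only fabricates a blank image to stand in for a rendered page.

```python
import struct
import zlib


def png_size(png_bytes):
    """Read pixel width and height from the PNG IHDR chunk."""
    if png_bytes[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("not a PNG file")
    return struct.unpack(">II", png_bytes[16:24])


def pages_to_rtf(page_pngs, dpi=96):
    """Wrap one PNG per page in an RTF document, one picture per page."""
    parts = [r"{\rtf1\ansi"]
    for index, png in enumerate(page_pngs):
        width, height = png_size(png)
        # RTF sizes pictures in twips: 1440 twips per inch.
        goal_w = width * 1440 // dpi
        goal_h = height * 1440 // dpi
        if index:
            parts.append(r"\page")
        header = r"{\pict\pngblip\picw%d\pich%d\picwgoal%d\pichgoal%d " % (
            width, height, goal_w, goal_h)
        parts.append(header + png.hex() + "}")
    parts.append("}")
    return "\n".join(parts)


def tiny_png(width, height):
    """Build a blank white grayscale PNG, standing in for a rendered page."""
    def chunk(kind, data):
        body = kind + data
        return (struct.pack(">I", len(data)) + body
                + struct.pack(">I", zlib.crc32(body)))
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    rows = (b"\x00" + b"\xff" * width) * height
    return (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", zlib.compress(rows)) + chunk(b"IEND", b""))


page = tiny_png(96, 48)
rtf = pages_to_rtf([page, page])
# To try it in Word, write the string out, e.g.:
# open("converted.rtf", "w", encoding="ascii").write(rtf)
```

Note how the image bytes are embedded as hex text, so the file is at least twice the size of the PNGs themselves, and none of the text in it is selectable or editable.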

I think that's what SautinSoft is doing too, and that's the reason for its poor quality. Images can get quite large if you want good quality (i.e. you lose the optimizations PDF files give you, such as shared fonts and repeated graphics).

Pindatjuh
A: 

PDF is an 'end file' display format, so it throws away a lot of detail you would need in a Word file (such as text flow). There are tools out there, but you are not likely to be totally happy with the results.

There is a blog post explaining the issues better at http://pdf.jpedal.org/java-pdf-blog/bid/12670/PDF-text

mark stephens
A: 

We offer a solution called EasyConverter SDK that you may wish to try:

http://www.pdfonline.com/easyconverter/sdk/index.htm

If you want to get a quick idea of what the results would look like before trying the evaluation version, you can use the online converter here first:

http://www.pdfonline.com/pdf2word/index.asp

There are indeed many considerations when converting a mostly static format like PDF to Word. EasyConverter SDK works nicely for most business documents, while marketing documents (which typically use fancier layouts) are usually more challenging.

yu-chen-pdfonline-com
A: 

Convert the PDF to SVG and embed the SVG in the Word document.

Charles Stewart