Convert PDF to anything which can be opened by Word

+2 Q:

I want to do this in C#, entirely in-process (no Process.Start()), and with a free library. The output could be RTF, HTML, or anything else, as long as Word can open it. I can then save it from Word as RTF and load that into a RichTextBox.

I'm aware similar questions have flooded this forum over the years, but none seems to address what I am asking.

EDIT:

Looks like it can be done here: http://www.itextpdf.com/examples/iia.php?id=275

+1 A:

Use a PDF library, such as iTextSharp, to parse the PDF. You will be able to access all text and images from the PDF and convert them to whatever representation you want.

There are other solutions, such as installing xpdf and shelling out to it; it will convert to HTML if the right command-line arguments are passed in.
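As a rough illustration of that approach, here is a minimal sketch that pulls the plain text out of each page and wraps it in a bare-bones RTF document. It assumes iTextSharp 5.x, where `PdfTextExtractor.GetTextFromPage` is available as a static method in `iTextSharp.text.pdf.parser`; the class name `PdfToRtfSketch` and the helpers `ToRtf` and `ExtractText` are made up for this example. It recovers text only, so fonts, layout, and images are lost.

```csharp
using System;
using System.IO;
using System.Text;
using iTextSharp.text.pdf;          // PdfReader
using iTextSharp.text.pdf.parser;   // PdfTextExtractor (assumes iTextSharp 5.x)

// Hypothetical helper class; the names are illustrative, not part of iTextSharp.
static class PdfToRtfSketch
{
    // Wraps plain text in a minimal RTF document, escaping RTF control characters.
    public static string ToRtf(string text)
    {
        var sb = new StringBuilder(@"{\rtf1\ansi\uc1 ");
        foreach (char c in text)
        {
            switch (c)
            {
                case '\\': sb.Append(@"\\"); break;
                case '{': sb.Append(@"\{"); break;
                case '}': sb.Append(@"\}"); break;
                case '\r': break;                        // drop CR, handle LF below
                case '\n': sb.Append("\\par\n"); break;  // paragraph break
                default:
                    if (c < 128)
                        sb.Append(c);
                    else
                        // RTF \uN takes a signed 16-bit decimal value, then a fallback char.
                        sb.Append(@"\u").Append((int)unchecked((short)c)).Append('?');
                    break;
            }
        }
        return sb.Append('}').ToString();
    }

    // Concatenates the extracted text of every page, one page per line break.
    public static string ExtractText(string pdfPath)
    {
        var reader = new PdfReader(pdfPath);
        try
        {
            var sb = new StringBuilder();
            for (int page = 1; page <= reader.NumberOfPages; page++)
                sb.Append(PdfTextExtractor.GetTextFromPage(reader, page)).Append('\n');
            return sb.ToString();
        }
        finally
        {
            reader.Close();
        }
    }

    // Usage: PdfToRtfSketch.exe input.pdf output.rtf
    static void Main(string[] args)
    {
        File.WriteAllText(args[1], ToRtf(ExtractText(args[0])), Encoding.ASCII);
    }
}
```

The resulting file should open in Word, and it can also be loaded directly with `RichTextBox.LoadFile`, which would skip the round trip through Word.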

Oded 2010-09-10 20:17:50

I keep hearing talk of parsing via iTextSharp to get text/images, fair enough. Where can I find samples of doing this, other than by making use of the PRTokeniser within iTextSharp?

Aaron 2010-09-10 21:25:15

I am not sure Word could open a PDF unless you created the PDF from a Word document.

I think the only quick solution would be to purchase or find a third-party library that does PDF handling, then use its API to pull out the text you need. The text in any case would be extremely badly formatted at that point, I am sure. Also be aware that some PDFs that show text actually store it as an image, so there would be no text to extract.

aceinthehole 2010-09-10 20:18:23
