How do I parse a PDF file and write its content to a Word file using Java?

+3  A: 

For parsing a PDF file in Java, you can use Apache PDFBox: http://incubator.apache.org/pdfbox/

For reading/writing Word (or other Office) file formats in Java, try POI: http://poi.apache.org/

Both are free.

breakingobstacles
+1  A: 

You might want to try any of these:

Once you are reading the contents of the PDF file, you can just as well store them in an ODT file or a text file. For ODT files, try http://odftoolkit.openoffice.org.
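For the plain text route, here is a rough sketch using only the JDK. It assumes the text has already been pulled out of the PDF as a single String (for example via PDFBox's PDFTextStripper); the class and method names below are made up for illustration. It splits the text into paragraphs on blank lines, which also gives you a list that is convenient to feed into an ODT or Word writer later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the class and method names are illustrative only.
final class ExtractedTextSaver {

    // Splits extracted text into paragraphs on blank lines and collapses
    // the whitespace inside each paragraph into single spaces.
    static List<String> toParagraphs(String extracted) {
        // Normalize Windows and old Mac line endings to "\n" first.
        String normalized = extracted.replace("\r\n", "\n").replace('\r', '\n');
        List<String> paragraphs = new ArrayList<>();
        for (String block : normalized.split("\n\\s*\n")) {
            String cleaned = block.replaceAll("\\s+", " ").trim();
            if (!cleaned.isEmpty()) {
                paragraphs.add(cleaned);
            }
        }
        return paragraphs;
    }

    // Writes the paragraphs to a UTF-8 text file, separated by one blank line.
    static void saveAsText(String extracted, Path target) throws IOException {
        String separator = System.lineSeparator() + System.lineSeparator();
        String body = String.join(separator, toParagraphs(extracted));
        Files.write(target, body.getBytes(StandardCharsets.UTF_8));
    }
}
```

Usage would be something like ExtractedTextSaver.saveAsText(pdfText, Paths.get("out.txt")), where pdfText is whatever your PDF library returned. Joining wrapped lines into one paragraph is a guess about what you want; PDF extraction often breaks lines at the page width, so this may or may not suit your documents.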

Best!

Amit
+1  A: 

Try the iText Java library:

iText is an ideal library for developers looking to enhance web- and other applications with dynamic PDF document generation and/or manipulation.

It can be used for your parsing step.

As for generating Word documents, the OpenOffice Java API might be able to generate Word-compatible docs (I have no personal experience with this API).

gimel
A: 

You could use iText if the source PDF is mostly text. Images and such are quite hard to handle while parsing. If it's text only, it's as easy as 10 lines of code. See the iText manual for examples.

For writing Word files there's only Apache POI. It can be a little tricky to figure out, but for such a simple task it shouldn't be a problem.
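To give an idea of what a text-only Word file actually contains, here is a sketch that builds a bare-bones .docx by hand with java.util.zip. To be clear, this is not POI's API; it is a swapped-in, JDK-only illustration, and the class and method names are made up. It assumes you already have the paragraphs as strings and only want plain text with no styles, images or tables.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of POI: writes a minimal .docx package.
final class MinimalDocx {

    private static final String XML_HEADER =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>";

    // Declares the content type of each part in the package.
    private static final String CONTENT_TYPES = XML_HEADER
        + "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">"
        + "<Default Extension=\"rels\" ContentType=\"application/vnd.openxmlformats-package.relationships+xml\"/>"
        + "<Default Extension=\"xml\" ContentType=\"application/xml\"/>"
        + "<Override PartName=\"/word/document.xml\" ContentType=\"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml\"/>"
        + "</Types>";

    // Points the package at its main document part.
    private static final String RELS = XML_HEADER
        + "<Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">"
        + "<Relationship Id=\"rId1\" Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument\" Target=\"word/document.xml\"/>"
        + "</Relationships>";

    // Drops control characters that XML 1.0 forbids, then escapes markup.
    static String escapeXml(String s) {
        return s.replaceAll("[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]", "")
                .replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // One <w:p> paragraph with a single text run per input string.
    static String documentXml(List<String> paragraphs) {
        StringBuilder xml = new StringBuilder(XML_HEADER)
            .append("<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\"><w:body>");
        for (String p : paragraphs) {
            xml.append("<w:p><w:r><w:t xml:space=\"preserve\">")
               .append(escapeXml(p))
               .append("</w:t></w:r></w:p>");
        }
        return xml.append("</w:body></w:document>").toString();
    }

    // Zips the three required parts into an in-memory .docx.
    static byte[] toDocxBytes(List<String> paragraphs) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            addEntry(zip, "[Content_Types].xml", CONTENT_TYPES);
            addEntry(zip, "_rels/.rels", RELS);
            addEntry(zip, "word/document.xml", documentXml(paragraphs));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    private static void addEntry(ZipOutputStream zip, String name, String content) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(content.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }
}
```

You would then save the result with Files.write(Paths.get("out.docx"), MinimalDocx.toDocxBytes(paragraphs)). For anything beyond plain paragraphs, a library is the saner choice; with POI the rough equivalent is an XWPFDocument on which you call createParagraph(), createRun() and setText(...) and then write(...) to an output stream. Note that this produces the newer .docx format, not the old binary .doc.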

Jes