Hi all,

I have a problem: my application must convert MS Word documents (imported from another system) into RTF documents, so that they can be manipulated with the OOo APIs without the errors caused by format incompatibilities.

My question: how can I manipulate MS Word documents directly from my Java application? Are there APIs (like POI or OOo) that let me do this without any format incompatibilities?

Thanks in advance. Best regards,

-Paolo

A:

When I was asked to provide a way to reliably convert a DOC to a TIFF, I did some research. There are a number of libraries out there, both free and commercial, that claim to be able to render MS Word documents. None of them provides 100% accurate rendering.

In the end I had to run MS Word in a wrapper and drive it through OLE Automation to do what I needed. Running Word in the background has quite a few gotchas of its own, but with thoughtful design you can make it work.

Your case is even easier than mine, because all you need to do is open the document and then "Save As" RTF.

Edit

@Paolo: There you go. I've been through the same thing, evaluating various packages, OO included, and finding that they are, mmmm... less than precise. Of course, it all depends on how strict your customers are about document formatting. Mine were extremely picky, down to margin sizes and picture positioning.

Another option would be to give your customers a list of known imprecisions (and get it approved). Unfortunately, with every new document you run the risk of hitting a new one.

mfeingold
Thanks, but as I said in my previous answer, I don't have MS Office installed on the Linux server, so I'm looking for a Java library that lets me programmatically transform MS Word documents into RTF, so that I can then work on the RTF in compatibility mode through the OOo API. Thanks for the answer. Best regards, -Paolo P.S.: does anyone know of such a Java library (i.e. one more powerful than POI)?
hailpam
A: 

If you have Word installed on your target computer, you can use the Microsoft Office Automation Interfaces. This is a set of COM components you can use to open, create, and save Word documents. (Essentially, you control an installation of Microsoft Word programmatically.)

If you have Word installed, you should have a registered type library called "Microsoft Word XX.X Object Library". I'm not sure how to access COM from Java, though.

Matt Brunell
Why the downvote?
Matt Brunell
A: 

Thanks Matt and mfeingold, but my system runs on Linux servers (like all our public production systems), and only OOo is installed.

Using the OOo Java APIs I can open, manipulate and save the documents, but lately I've been seeing a lot of problems caused by the incompatibility between the closed MS Word format and the OOo OpenDocument format (I mean swriter). In many cases (lists with particular bullets, e.g. '-', or nested lists; page numbering in "1 of x" format; and many other formatting options) the output document shows many errors, due, I think, to the incompatibility between the two formats.

Now I'm studying Apache POI's capabilities to understand whether I can open MS Word documents with it and save them as RTF, an interchange format that should reduce the incompatibilities to a minimum.
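For the "open it and save it as RTF" step, the office suite already on the server may be able to do the conversion without Word or POI. A minimal sketch, assuming a LibreOffice-style `soffice` binary on the PATH that supports `--headless --convert-to` (OpenOffice.org builds of that era may lack this switch, in which case the UNO API is the route). The class and method names (`DocToRtf`, `convert`, `looksLikeRtf`) are hypothetical. The Java side only builds the command, waits with a timeout, and checks that the result actually looks like RTF:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: drives a headless office suite to re-save a Word file as RTF.
final class DocToRtf {

    // Assumes a LibreOffice-style soffice that understands --convert-to.
    static List<String> buildCommand(String soffice, Path input, Path outDir) {
        return Arrays.asList(soffice, "--headless", "--convert-to", "rtf",
                "--outdir", outDir.toString(), input.toString());
    }

    // Assumption: the converter names the result <input base name>.rtf inside outDir.
    static Path expectedOutput(Path input, Path outDir) {
        String name = input.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return outDir.resolve(base + ".rtf");
    }

    // Cheap sanity check: an RTF file begins with the ASCII text "{\rtf".
    static boolean looksLikeRtf(Path file) throws IOException {
        byte[] head = new byte[5];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) break; // file is shorter than 5 bytes
                n += r;
            }
        }
        return n == head.length
                && new String(head, StandardCharsets.US_ASCII).equals("{\\rtf");
    }

    // Runs the conversion; checks the output file instead of trusting the exit code alone.
    static Path convert(String soffice, Path input, Path outDir)
            throws IOException, InterruptedException {
        Path out = expectedOutput(input, outDir);
        Files.deleteIfExists(out); // don't mistake a stale file from an earlier run for success
        Process p = new ProcessBuilder(buildCommand(soffice, input, outDir))
                .inheritIO() // let soffice's messages reach our console
                .start();
        if (!p.waitFor(120, TimeUnit.SECONDS)) {
            p.destroyForcibly();
            throw new IOException("conversion timed out: " + input);
        }
        if (p.exitValue() != 0 || !Files.isRegularFile(out) || !looksLikeRtf(out)) {
            throw new IOException("conversion failed for " + input + " (exit " + p.exitValue() + ")");
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java DocToRtf <input.doc> <output-dir>");
            System.exit(2);
        }
        System.out.println("wrote " + convert("soffice", Paths.get(args[0]), Paths.get(args[1])));
    }
}
```

Because this reuses the suite's own Word import filter, it would not by itself remove the bullet and page-numbering imprecisions described above; it only turns the conversion into a scriptable batch step. One practical gotcha: if another soffice instance is already running under the same user profile, the headless call can return without converting anything, which is why the sketch checks for the output file rather than relying on the exit code.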

Have you had the same problem? Can you point me to an open-source Java library more powerful than POI? Or can you suggest a combined approach, such as POI + iText, for the MS Word to RTF conversion step?

Thanks in advance. Best regards,

-Paolo

hailpam
A: 

Docvert lets you set up a web service to convert Word documents to OpenOffice format. It craps out on OLE objects, though.

pocketfullofcheese
Hi pocket, first of all, thanks for your reply! I need a suite of Java APIs that lets me automate the manipulation of MS Word documents. At the moment, as an alternative, we get the PDF version of the document and work with iText and its watermarking capability; but, as you can understand, this is a very limited solution. Regards, - Paolo
hailpam