What solutions are there? I only know of solutions for replacing bookmarks in Word (.doc) files with Apache POI.

Are there also ways to change images, layouts, and text styles in .doc and .ppt documents?

I am thinking of replacing areas in Word and PowerPoint documents for bulk processing.

Platform: MS-Office 2003

+2  A: 

If you include using other Office suites as an option, here's a list of possible solutions:

With POI you can't edit the .pptx file format, but you don't depend on the apps installed on the system. The other two options, by contrast, rely on other apps, but they are definitely better for dealing with presentations. OpenOffice has better compatibility with older formats, by the way. Also, if you use UNO, you have a wide choice of languages: UNO bindings exist for Java, C++, Python, and others.

Malcolm
Is POI the only solution? What about format changes (.ppt => .pptx)? I heard that POI has problems with UTF-8, formatting, etc.
Martin K.
Well, I have seen various libraries on the Web for working with .ppt files, but when it comes to changing the documents, they are useless. Another solution is probably to use the OpenOffice or PowerPoint APIs to change the documents; I don't know whether that suits you or not.
Malcolm
+3  A: 

What are your platform limitations?

Obviously Apache POI will get you at least part of the way there.

Microsoft's own COM APIs are fairly powerful and are documented here. I would recommend using them if a) you are not running in a server (many users, multithreaded) environment; b) you can have a proper version of PowerPoint installed on the production machine; and c) you can code against a COM object model.

Peter Stephens
I'm flexible about platform limitations. The best option would be a license-free framework that works for every document. ;) Web services that abstract the COM object model might be a way.
Martin K.
You could also make use of OpenOffice, I've added information to my post.
Malcolm
+1  A: 

My experience is not directly with PowerPoint, but I've actually rolled my own WordML (XML) generator. It a) removed all dependencies on Word, b) was very fast, and c) let me build up documents from scratch.

But it was a lot of work to create, and I was only creating a write-only implementation.
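To give a rough idea of what such a write-only generator can look like, here is a minimal sketch in Python. It is not the generator described above: the function name `build_wordml` is made up for illustration, and it assumes the Word 2003 XML (WordprocessingML) layout of `w:wordDocument` / `w:body` / `w:p` / `w:r` / `w:t`, with no styles, tables, or images.

```python
import xml.etree.ElementTree as ET

# Namespace of the Word 2003 XML format (WordprocessingML 2003).
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)


def build_wordml(paragraphs):
    """Return a Word 2003 XML document as UTF-8 bytes.

    Hypothetical helper: each string in `paragraphs` becomes one
    paragraph (w:p) holding a single run (w:r) with one text node (w:t).
    """
    root = ET.Element(f"{{{W_NS}}}wordDocument")
    body = ET.SubElement(root, f"{{{W_NS}}}body")
    for text in paragraphs:
        p = ET.SubElement(body, f"{{{W_NS}}}p")
        r = ET.SubElement(p, f"{{{W_NS}}}r")
        t = ET.SubElement(r, f"{{{W_NS}}}t")
        t.text = text  # ElementTree escapes &, < and > on output

    # The mso-application processing instruction lets Windows associate
    # the .xml file with Word; the XML declaration states the encoding.
    header = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
        '<?mso-application progid="Word.Document"?>\n'
    )
    return (header + ET.tostring(root, encoding="unicode")).encode("utf-8")


if __name__ == "__main__":
    with open("report.xml", "wb") as fh:
        fh.write(build_wordml(["First paragraph", "Second paragraph"]))
```

Because everything is plain XML, no Office installation is involved, which is where the speed and the lack of dependencies come from. The effort goes into covering the rest of the schema.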

I'm not as familiar with PowerPoint, so this is conjecture, but you may be able to roll your own by reading XML (PowerPoint 2003??) and/or cracking the Office Open XML file (zipped XML), then using XPath to manipulate the data, and then saving everything back to disk.

This won't work on older PowerPoint files based on OLE Compound Documents, though.
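The unzip / XPath / re-zip idea can be sketched with nothing but the Python standard library. This is an untested-in-PowerPoint illustration under several assumptions: slides live in parts named `ppt/slides/slide*.xml`, visible text sits in DrawingML `a:t` elements, and each placeholder is contained in a single run. The function name `replace_in_slides` and the `{{name}}` placeholder style are made up for the example.

```python
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace; text runs in .pptx slides are stored in a:t elements.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"
# Keep the conventional prefixes when the XML is written back out.
ET.register_namespace("a", A_NS)
ET.register_namespace("p", P_NS)


def replace_in_slides(src_path, dst_path, replacements):
    """Copy an Office Open XML package, replacing text in its slides.

    Hypothetical helper: `replacements` maps placeholder strings to new
    text. Returns the number of replacements that were applied.
    """
    changed = 0
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            name = item.filename
            if name.startswith("ppt/slides/slide") and name.endswith(".xml"):
                root = ET.fromstring(data)
                # ElementTree supports a subset of XPath; ".//" finds
                # every a:t element below the slide root.
                for t in root.iterfind(f".//{{{A_NS}}}t"):
                    for old, new in replacements.items():
                        if t.text and old in t.text:
                            t.text = t.text.replace(old, new)
                            changed += 1
                data = ET.tostring(root, encoding="utf-8",
                                   xml_declaration=True)
            # All other parts are copied through unchanged.
            dst.writestr(item, data)
    return changed


if __name__ == "__main__":
    n = replace_in_slides("template.pptx", "out.pptx", {"{{name}}": "Martin"})
    print(n, "replacement(s) made")
```

Two caveats: a placeholder that PowerPoint has split across several runs will not be found, and ElementTree may rewrite namespace prefixes it does not know about, so a real tool would probably want a parser that preserves them.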

Peter Stephens
+3  A: 

It's a bit pricey, but Aspose.Slides is a very powerful library for manipulating PowerPoint files.

Conrad
An open source solution is docx4j, which can now also manipulate pptx files. It's Java, using JAXB.
plutext
+1  A: 

I've done something like that before: programmatically accessing and manipulating PowerPoint presentations. Back when I did it, it was all in C++ using COM, but similar principles apply to C#/VB .NET apps, since they do COM interop very easily.

What you're looking for is called the Office Document Model. Basically, Office applications expose their documents programmatically, as trees of objects that define their contents. These objects are accessible via an API, and you can manipulate them, add new ones, and do whatever other processing you want. It's exceedingly powerful; you can use it to manipulate pretty much all aspects of a document. But you'll need an installation of Office and Visual Studio to be able to use it.

Some links:

Hope this helps!

doihaveto
+1  A: 

Apparently new users can only include one link per posting. How lame! :)

Here's the other link I meant to include:

doihaveto
Wow, that was really helpful, thx!
Martin K.