I need to extract some images from PowerPoint and Word documents, in order to manipulate them, and after that, put the images back in the MS Office files.

Do you know any Java or C++ library that does this? It is better if it's open-source.

+1  A: 

The company I work for, SoftArtisans, has a product called OfficeWriter that lets you do that, among other things, for Word and Excel (PowerPoint support is planned for the future). It is not free or open source, though.

On the other hand, if you are working strictly with the 2007 format (XML-based), you can probably use OpenXML.
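
For the XML-based 2007 formats specifically, a .docx or .pptx file is a ZIP package, and embedded images normally sit under `word/media/` or `ppt/media/`. Here is a minimal sketch, using only `java.util.zip`, of pulling those images out and writing modified ones back. The class name `OoxmlMedia` and its methods are made up for illustration and are not part of any library. It assumes a replacement image keeps the same part name and image format as the original, so the package's relationship and content-type entries stay valid, and it does not cover the binary .doc/.ppt formats.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper (not a library class): treats a .docx/.pptx as a plain ZIP.
final class OoxmlMedia {
    private OoxmlMedia() {}

    /** True for the media folders used by .docx ("word/media/") and .pptx ("ppt/media/"). */
    static boolean isMedia(String entryName) {
        return entryName.startsWith("word/media/") || entryName.startsWith("ppt/media/");
    }

    /** Returns every embedded media part, keyed by its path inside the package. */
    static Map<String, byte[]> extractMedia(byte[] officeFile) throws IOException {
        Map<String, byte[]> media = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(officeFile))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (!entry.isDirectory() && isMedia(entry.getName())) {
                    // readAllBytes() stops at the end of the current ZIP entry.
                    media.put(entry.getName(), zin.readAllBytes());
                }
            }
        }
        return media;
    }

    /** Copies the package, swapping in new bytes for any entry named in replacements. */
    static byte[] replaceMedia(byte[] officeFile, Map<String, byte[]> replacements)
            throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(officeFile));
             ZipOutputStream zout = new ZipOutputStream(out)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                byte[] original = zin.readAllBytes();
                byte[] data = replacements.getOrDefault(entry.getName(), original);
                // A fresh ZipEntry lets the output stream recompute size and CRC.
                zout.putNextEntry(new ZipEntry(entry.getName()));
                zout.write(data);
                zout.closeEntry();
            }
        }
        return out.toByteArray();
    }
}
```

To use it, read the file with `Files.readAllBytes`, pass the bytes to `extractMedia`, edit the images you need, then write the result of `replaceMedia` to a new file. Working on a copy is safer than overwriting the original document.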

Tamar
If you are working with the XML formats (as opposed to binary), docx4j is an open source option.
plutext
+4  A: 

Apache has a project called "POI" explicitly made for interacting with MS Office formats from Java. Hopefully that does it for you!

http://poi.apache.org/

Brent Nash
I have used Apache POI before for working with DOC and XLS files. It is a good API for working with Office files, but I don't think it supports Office 2007 files (the .docx and .xlsx formats).
Kushal Paudyal
They do, as of v3.5
plutext
+1  A: 

Apache POI can handle Word documents via its HWPF module, and can extract images from them or insert images into them. It's not well documented, so check out the POI unit tests for image manipulation within Word (the unit tests seem to be the best documentation for this module).

Failing that, the COM interface is accessible via (say) JACOB. That's probably more work, but it gives you access to APIs that POI does not expose.

Brian Agnew
+1  A: 

In terms of C++, Word exposes a COM API that lets you manipulate its documents, so as long as Word is installed on the machine, you can do this from C++ quite easily. Word isn't open source, but you probably have a license anyway.

Yishai
A: 

You can use Aspose components: Aspose.Words for .NET to work with Microsoft Word formats, and Aspose.Slides to work with PowerPoint formats.

romeok