I am looking for a Python-based Microsoft Office parser, specifically one for PowerPoint.

I want to be able to parse PPT files in Python and extract things like text and images from them.

Is there a library available?

+1  A: 

You might find such a beast, but I'd bet against it; you're looking for two rare properties together.

You might instead consider using the OpenOffice SDK, which already has a vast amount of machinery for reading PowerPoint files, and abuse it for your purposes. This is all Java, not Python, but my guess is that learning Java is a much smaller hurdle than figuring out how to read PowerPoint files yourself.

Ira Baxter
+4  A: 

I don't think there is such a library.

What you can do is use the pywin32 package to drive PowerPoint through its COM interface (this requires Windows with PowerPoint installed).

Someone has written a very nice introduction to using the win32com module to automate tasks in PowerPoint: http://www.s-anand.net/blog/automating-powerpoint-with-python/
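A minimal sketch of that COM approach, assuming Windows with both PowerPoint and pywin32 installed. The function names (`extract_slide_text`, `count_pictures`, `open_and_extract`) are hypothetical helpers for illustration. The members they touch (`Presentations.Open`, `Slides`, `Shapes`, `HasTextFrame`, `TextFrame.TextRange.Text`, `Shape.Type`) belong to PowerPoint's COM object model. The sketch collects the text on each slide and counts picture shapes; actually exporting the images would take more work.

```python
import os

MSO_PICTURE = 13  # MsoShapeType.msoPicture in the Office object model


def extract_slide_text(presentation):
    """Return a list of (slide_number, [text, ...]) pairs for a Presentation COM object."""
    result = []
    for number, slide in enumerate(presentation.Slides, start=1):
        texts = []
        for shape in slide.Shapes:
            # HasTextFrame is msoTrue (-1) or msoFalse (0), so truthiness is enough
            if shape.HasTextFrame:
                text = shape.TextFrame.TextRange.Text
                if text:  # skip empty placeholders
                    texts.append(text)
        result.append((number, texts))
    return result


def count_pictures(presentation):
    """Count shapes whose type is a (non-linked) picture across all slides."""
    return sum(
        1
        for slide in presentation.Slides
        for shape in slide.Shapes
        if shape.Type == MSO_PICTURE
    )


def open_and_extract(path):
    """Open a deck in PowerPoint via COM and return its slide text (Windows only)."""
    import win32com.client  # provided by pywin32; not available on other platforms

    app = win32com.client.Dispatch("PowerPoint.Application")
    # COM resolves relative paths against PowerPoint's working directory, so pass an absolute one
    presentation = app.Presentations.Open(os.path.abspath(path), WithWindow=False)
    try:
        return extract_slide_text(presentation)
    finally:
        presentation.Close()
        app.Quit()
```

On a Windows machine you would call something like `open_and_extract(r"C:\decks\example.ppt")`, where the path is only a placeholder. Keeping the object-model walking separate from the `Dispatch` call means the extraction logic can be exercised without PowerPoint. Be aware that `app.Quit()` also closes a PowerPoint instance the user already had open, so you may want to drop it in interactive use.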

PreludeAndFugue
Thanks! I am on it now. The link was very useful for understanding how to go about the entire process.
ramaz