views:

727

answers:

8

I have been asked to automate a particular task at work today which takes up a lot of our time! The following is what needs to be done and I would appreciate any help on how I can do this (Implementation Advice) within the realms of my knowledge, if possible.

Problem

I have a PowerPoint document (.ppt). I would like to extract text from there (the text is in bullet point format). I would like to insert these bullets points into an Excel sheet, each bullet point should be a row. I would also like to put in the adjacent column the page this bullet point text was taken from.

So, basically: Extract from ppt --> Insert into Excel sheet each row being a bullet point.

Technologies available to me

Perl, PHP and Java.

I would prefer PHP to be honest as this is my primary language, but I am happy to consider anything else you guys/gals think is best. Second would be Perl and then Java. I do not want to be compiling classes and installing the JDK just for this! :)

Key Questions

  • How do you reference a bullet point?
  • Am I likely to end up with just a load of unstructured text in the Excel sheet?
  • Are there any barriers to reading from a ppt file?

Update

I would consider MS technologies (VB, etc.) if it makes life easier but I have never used it and I despise MS technology! Hope I don't get flamed by the evangelists! :)

+1  A: 

Is Visual Basic for Applications not available to you? That should be built-in to Office, and since you're going Office-to-Office it might be easier.

Chris Simmons
+7  A: 

Puh...you'll have a hard time in dealing with MS Office files using Perl, PHP or Java. When I had to do automation for MS Office files I most often used VBA (Visual Basic for Applications). Take a look at it. For a lot of stuff you can just record a macro, take a look at the produced code and learn how things are being referenced. Then take pieces of the generated code and create your own VBA modules and tools. I could imagine that such a functionality could perfectly fit as an Add-in for Power Point.

Juri
Even if you know nothing about VBA, this is by far the best tool for the job. Any other approach is likely to be a 'world of pain'.
Don
Thanks for the advice Juri and Don!
Abs
I've had to do something similar with Word documents, VBA is definitely the best way if you can use it.
Travis Beale
+1  A: 

You could use OpenOffice.org's presentation app (Impress) to import the Powerpoint file. You could then export it in native OpenOffice.org format, which is XML. You should then be able to parse the plain-text XML with the language of your choice.

As other people have pointed out, if you want to work with the Powerpoint format directly, you really need to use a Microsoft language (VB, VBA, C#, etc.).

Travis Beale
+1  A: 

I'm thinking you may want to look into programs that convert from PPT to a CSV file possibly with PDF in the middle? Once the data is in CSV format, you may be able to process it by php/perl much more easily.

Doing this from scratch will be very time consuming because the Office document formats are very complicated in general.

Plasmer
+1  A: 

If you have the Zend Framework available, it can help considerably. See here for helpful documentation. See here for writing to excel files.

Shadow
+1  A: 

This sounds a lot like what I do at work, though I work mostly in Excel and Word. Your best bet would be use VBA in PowerPoint to look at each page and find the bullets. Then write to a file in CSV format, each bullet on a different line, which will open in Excel, each bullet on it's own row.

Finding what is a bullet and what isn't might be trick. Try recording some macros selecting, adding and deleting bullets, and maybe changing the level of a few. That should give you enough info about what objects to be looking for, and how they can be worked with.

CDspace
+8  A: 

It can be done with Perl. Pretty much anything you can do with VBA can be done with Perl via Win32::OLE. I've used the Win32::OLE module to work with MS-Office documents, both extracting and creating content. It's been awhile though. Start here, http://win32.perl.org/wiki/index.php?title=Win32_Perl_Modules about the middle of the page.

The VBA documentation on each of the objects is useful for reference, finding what objects there are and the methods and properties available on them.

Rob K
Interesting, will defo have a look at this!
Abs
For examples of controlling powerpoint, see Win32::Powerpoint on CPAN: http://search.cpan.org/dist/Win32-PowerPoint/lib/Win32/PowerPoint.pm
daotoad
I would use VBA if you are going to something trivial. If you're trying to tie Excel into some program you've already written, OLE is better.
Brad Gilbert
Again, Perl would sound better for him if he already knows it. However in my opinion writing a PowerPoint Add-on using VBA is much more suitable. You would just have to open the power-point file, click on a menu button (which you could easily add) which then starts your export VBA add-on.
Juri
@Juri, it sounds like this is a repeated task, something that has to be done on a regular basis. The thing about using Perl is that it can be done "single click", or even as a scheduled task, so that there's absolutely no need for user interaction with PowerPoint or Excel at all. And doing it in Perl really isn't painful. Perl will be a lot more friendly to him than VBA since he prefers PHP.
Rob K
+7  A: 
Sinan Ünür
Thank you - Awesome!
Abs