Is there a way in which I can programmatically access the document properties of a Word 2007 document?

I am open to using any language for this, but ideally it might be via a PowerShell script.

My overall aim is to traverse the documents somewhere on a filesystem, parse some document properties from these documents, and then collate all of these properties back together into a new Word document.

I essentially want to automatically create a document that lists all documents beneath a certain folder of the filesystem. For each document, the list would contain such things as the Title, Abstract and Author document properties, the CreateDate field, etc.

+1  A: 

My guess is that your best bet is VB or C# and the Office Interop Assemblies. I'm unaware of a native way (within PowerShell) to do what you want.

That said, if you use VB or C#, you could write a PowerShell cmdlet to do the collation. But at that point, it might be simpler to just write a console app that runs as a scheduled task instead.

Nate Bross
+1  A: 

I recently learned from watching a DNRTV episode that Office 2007 documents are just zipped XML. Therefore, you can change "Document.docx" to "Document.docx.zip" and see the XML files within. You could probably get the properties via an interop assembly in .NET, but it may be more efficient to just look right into the XML (perhaps with LINQ to XML or some native way I am unaware of).
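
To illustrate the "look right into the XML" idea, here is a minimal sketch in Python (not PowerShell, and not from the DNRTV episode). It assumes the core properties sit at the conventional `docProps/core.xml` path inside the package, which is where Word normally writes them, and that they use the standard Dublin Core element names. The function name `read_core_properties` is made up for this example. Note that `dc:description` corresponds to the Comments field; whether your "Abstract" property lives there is an assumption you would need to check against your own documents.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the core properties part of an Office Open XML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def read_core_properties(docx):
    """Return selected core properties from a .docx path or file-like object.

    Assumes the part is stored at docProps/core.xml (the usual location);
    zipfile raises KeyError if it is missing.
    """
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))

    def text(tag):
        # Missing elements are reported as None rather than raising.
        node = root.find(tag, NS)
        return node.text if node is not None else None

    return {
        "title": text("dc:title"),
        "author": text("dc:creator"),
        "description": text("dc:description"),
        "created": text("dcterms:created"),
    }
```

Calling `read_core_properties("Document.docx")` returns a plain dictionary, so collecting the results for every file under a folder is just a loop over the paths. Office does not need to be installed for this to work.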

Ocelot20
+1  A: 

I wrote up how to do this back in the Monad beta days. It should still work, I think.

Keith Hill