Hi,

I am relatively new to Word 2007 programming, so pardon me if this question has already been asked. I would like to read a Word table and its cells and extract the text in C# (VSTO tools). Later, I would like to build an XML document from the extracted data.

If anyone has done something of this sort, please point me in the right direction. I would really appreciate it.

Thank you. Anjan

A: 

Unless it is used in backward-compatibility mode, Word 2007 produces documents in the "Office Open XML" format, for which Microsoft provides a .NET library.
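As a rough illustration, here is a minimal sketch of reading table cells from a .docx file and turning them into XML. It assumes the library in question is the Open XML SDK (the `DocumentFormat.OpenXml` package); the class name `WordTableExport`, the method name `TablesToXml`, and the output element names (`tables`, `table`, `row`, `cell`) are made up for this example. It only looks at top-level tables in the document body, not nested ones.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not part of any library.
static class WordTableExport
{
    // Returns one <table> element per top-level table in the document body,
    // with a <row> per table row and a <cell> per table cell.
    public static XElement TablesToXml(string docxPath)
    {
        // Second argument is isEditable; false opens the file read-only.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docxPath, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;

            // XElement enumerates its content while it is being constructed,
            // so everything is read before the document is disposed.
            XElement result = new XElement("tables",
                body.Elements<Table>().Select(table =>
                    new XElement("table",
                        table.Elements<TableRow>().Select(row =>
                            new XElement("row",
                                row.Elements<TableCell>().Select(cell =>
                                    // InnerText joins the text of all runs in the cell
                                    // with no separator between paragraphs.
                                    new XElement("cell", cell.InnerText)))))));
            return result;
        }
    }
}
```

Calling `Console.WriteLine(WordTableExport.TablesToXml("report.docx"));` (the file name is a placeholder) would print the resulting XML. If a cell contains several paragraphs and you need them kept apart, you would iterate over `cell.Elements<Paragraph>()` instead of using `InnerText` on the whole cell.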

This MSDN article provides various pointers and C# snippets on how to do this kind of thing. This walkthrough of the Word 2007 format may also be useful.

If you need to access older MS Word formats, you may be able to use, or draw inspiration from, the text-mining open-source project (Java).

mjv