ansaurus

Question

How to grab text from a Word (docx) document in C#?

Answer 1

+1 A:

Take a look at the Open XML Format SDK 2.0. There are some examples on how to process documents, like this.

Although I have not used it, there is this Open Office XML C# Library that you can take a look at as well.

Magnus Johansson 2009-07-08 17:53:05

Answer 2

+4 A:

Your problem is the XML namespaces. SelectNodes doesn't know how to translate <w:t/> to the full namespace. Therefore, you need to use the overload that takes an XmlNamespaceManager as the second argument. I modified your code a bit, and it seems to work:

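    // Requires: System, System.IO, System.IO.Packaging, System.Text, System.Xml
    public static string TextDump(Package package)
    {
        StringBuilder builder = new StringBuilder();

        // Load the main document part; dispose the stream once it has been parsed.
        XmlDocument xmlDoc = new XmlDocument();
        PackagePart part = package.GetPart(new Uri("/word/document.xml", UriKind.Relative));
        using (Stream stream = part.GetStream())
        {
            xmlDoc.Load(stream);
        }

        // Map the "w" prefix to the WordprocessingML namespace so XPath can resolve w:t.
        XmlNamespaceManager mgr = new XmlNamespaceManager(xmlDoc.NameTable);
        mgr.AddNamespace("w", "http://schemas.openxmlformats.org/wordprocessingml/2006/main");

        // Each w:t element holds one run of text.
        foreach (XmlNode node in xmlDoc.SelectNodes("/descendant::w:t", mgr))
        {
            builder.AppendLine(node.InnerText);
        }
        return builder.ToString();
    }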
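
To see the namespace issue in isolation, here is a minimal sketch that runs against an in-memory XML string instead of a real .docx package. The class name `NamespaceDemo` and both method names are made up for illustration and are not part of the original code. It assumes the usual behaviour of `XmlNode.SelectNodes`: a prefixed XPath query with no namespace manager throws an `XPathException`, while a manager that maps any prefix to the WordprocessingML namespace URI lets the query match.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

// Hypothetical helper class, for illustration only.
public static class NamespaceDemo
{
    private const string WordNs =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Counts <w:t> elements. The prefix used in the query only has to match
    // the prefix registered on the manager, not the one used in the document.
    public static int CountTextRuns(string xml, string prefix)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        XmlNamespaceManager mgr = new XmlNamespaceManager(doc.NameTable);
        mgr.AddNamespace(prefix, WordNs);

        return doc.SelectNodes("//" + prefix + ":t", mgr).Count;
    }

    // Returns true if the same kind of query fails when no manager is supplied.
    public static bool QueryWithoutManagerThrows(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        try
        {
            doc.SelectNodes("//w:t");
            return false;
        }
        catch (XPathException)
        {
            // The prefix "w" cannot be resolved without a namespace manager.
            return true;
        }
    }
}
```

Given a small document with two `<w:t>` elements, `CountTextRuns(xml, "w")` and `CountTextRuns(xml, "x")` should both find them, because XPath matches on the namespace URI rather than on the prefix text.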

driis 2009-07-08 17:59:26

Worked perfectly, thanks.

JoeS 2009-07-08 18:13:36
