Hi there,

I have a template in Word (.docx) format and want to replace some placeholders in it with my own data. Do you know where I can find the right classes for this? It would help to know the namespace. Are there any newer classes for handling Word documents in the XML-based .docx format than the COM classes? Thank you for all your answers; I hope someone can help me with my problem.

greets

Sebastian

+5  A: 

The new Office formats (docx, xlsx, etc.) are zip files that contain a collection of XML files. With that in mind, you have a couple of approaches.

  1. You can use the Open XML SDK located at http://www.microsoft.com/downloads/details.aspx?FamilyId=AD0B72FB-4A1D-4C52-BDB5-7DD7E816D046&displaylang=en

  2. You can unzip the docx file, do a search and replace for your tokens, and zip it back up.

There is a website, openxmldeveloper.org, dedicated to exactly this kind of thing. Also, bear in mind that Microsoft is already shipping a beta of version 2 of the SDK.
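The second approach above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes .NET 4.5 or later (for System.IO.Compression), that the token sits in word/document.xml, and that the token text is stored contiguously in the XML. Word often splits typed text across several runs, in which case a plain string replace will not find it. The class name DocxTokenReplacer, the method name, and the {{NAME}} token style are made up for illustration.

```csharp
using System.IO;
using System.IO.Compression;
using System.Security;
using System.Text;

public static class DocxTokenReplacer
{
    // Hypothetical helper: replaces every occurrence of token in the main
    // document part (word/document.xml) of the package at docxPath.
    public static void ReplaceToken(string docxPath, string token, string value)
    {
        using (ZipArchive archive = ZipFile.Open(docxPath, ZipArchiveMode.Update))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found in package.");

            // Read the whole part as text.
            string xml;
            using (var reader = new StreamReader(entry.Open(), Encoding.UTF8))
            {
                xml = reader.ReadToEnd();
            }

            // Escape the value so characters such as & and < keep the XML well-formed.
            xml = xml.Replace(token, SecurityElement.Escape(value));

            // Swap the old entry for one holding the modified XML.
            entry.Delete();
            ZipArchiveEntry updated = archive.CreateEntry("word/document.xml");
            using (var writer = new StreamWriter(updated.Open(), new UTF8Encoding(false)))
            {
                writer.Write(xml);
            }
        }
    }
}
```

A call would look like DocxTokenReplacer.ReplaceToken("template.docx", "{{NAME}}", "Sebastian"). Work on a copy of the template, since the file is modified in place.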

Chris Lively
SkippyFire
+2  A: 

Eric White has touched on exactly this subject in a blog article describing a program that validates source snippets embedded in DOCX files. Beyond that article, I highly recommend reading his series on Office Open XML and C#.

sixlettervariables
A: 

Great, thanks, that helped me a lot.

Xelluloid
A: 

OK, now I have another question: is there a simple, cheap way to find all the placeholders? The placeholders are sdt tags, and going through the XML I would have to search for all of them. With the Open XML Format SDK 2.0 I can get all the elements of the body of the MainDocumentPart, but the sdt tags are not a single type: there are SdtBlocks, SdtRuns, and so on. Do you know a way to solve that? I read that it should be possible to use LINQ for this, but this

doc.MainDocumentPart.Document.Body.Elements().Select(a => a.InnerText.Contains("sdt"));

won't work, because Select projects each element to a bool, and I need the OpenXmlElement itself. I know how to find them with many nested loops, but that is really tedious. I thought LINQ could help by returning only the elements whose InnerText (or InnerXml) contains sdt, so that I would only need to search further inside those elements.
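One way to avoid the loops, sketched here under the assumption that the Open XML SDK 2.0 or later is referenced: SdtBlock, SdtRun, SdtCell and SdtRow all derive from SdtElement, so filtering descendants by that base type returns every content control at any depth. Filtering by type is also more reliable than a text match, because InnerText holds the concatenated text content of an element, not its tag name. The class and method names below are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class ContentControlFinder
{
    // Returns every structured document tag (block, run, cell or row level)
    // found anywhere below the document body.
    public static List<SdtElement> FindContentControls(WordprocessingDocument doc)
    {
        Body body = doc.MainDocumentPart.Document.Body;
        return body.Descendants<SdtElement>().ToList();
    }

    // Returns the w:tag value of a content control, or null if it has none.
    public static string GetTag(SdtElement sdt)
    {
        SdtProperties props = sdt.GetFirstChild<SdtProperties>();
        if (props == null) return null;
        Tag tag = props.GetFirstChild<Tag>();
        return (tag == null || tag.Val == null) ? null : tag.Val.Value;
    }
}
```

If a LINQ filter on a condition is what you want, the operator is Where rather than Select: Where keeps the elements for which the predicate is true, while Select replaces each element with the predicate's result.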

Xelluloid
A: 

By the way, using XML I found this solution, which finds all sdt nodes:

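// Requires: using System.IO; using System.Xml;
// WordprocessingML main namespace, bound to the "w" prefix below.
const string wordmlNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

NameTable nt = new NameTable();
XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
nsManager.AddNamespace("w", wordmlNamespace);

XmlDocument xDoc = new XmlDocument();
using (Stream partStream = doc.MainDocumentPart.GetStream())
{
    xDoc.Load(partStream);
}

// Select every w:sdt element at any depth below the body.
XmlNodeList nodeList = xDoc.SelectNodes("./w:document/w:body//w:sdt", nsManager);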

It works, but doesn't the Open XML Format SDK 2.0 offer a way to do this?
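For comparison, the same query can be written with LINQ to XML instead of XPath. This is only a sketch, assuming .NET 4.0 or later (for XDocument.Load on a stream); the class name SdtQuery and the method name are made up for illustration. The stream would typically be the one returned by doc.MainDocumentPart.GetStream().

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public static class SdtQuery
{
    // WordprocessingML main namespace (the "w" prefix in document.xml).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns every w:sdt element at any depth below w:document/w:body.
    public static List<XElement> FindSdtElements(Stream documentXml)
    {
        XDocument xDoc = XDocument.Load(documentXml);
        XElement body = xDoc.Root == null ? null : xDoc.Root.Element(W + "body");
        if (body == null)
            return new List<XElement>();

        return body.Descendants(W + "sdt").ToList();
    }
}
```

Descendants walks the whole subtree, so sdt elements nested inside paragraphs, tables, or other sdt elements are included, just as with the // axis in the XPath version.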

Greets Sebastian

Xelluloid
A: 

I used this one:

IEnumerable<OpenXmlElement> test2 = from element in body.Elements() where element.InnerText.Contains("sdt") select element;

Freek Bos