ansaurus

Question

Answer 1

+2 A:

May I suggest another way: read the XML as a plain string, strip all XML tags, and check the resulting string.

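Imports System.IO
Imports System.Text.RegularExpressions

' Read the whole Word XML file as one string.
Dim readFile As String = File.ReadAllText("yourPathToFile.doc")
' Strip every tag; [^>]+ also covers tags that carry attributes.
readFile = Regex.Replace(readFile, "<[^>]+>", String.Empty)

' Whatever is left is plain text; look for bracketed tokens such as [Name].
For Each foundPart As Match In Regex.Matches(readFile, "\[[a-zA-Z0-9]+\]")
    ' do something here with the things we found
Next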

Some additional cleanup might be needed, e.g. replacing spaces.
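As a rough sketch of that extra cleanup: Word often splits one bracketed token across several runs, so after the tags are gone there can be line breaks or spaces in the middle of it. The helper below is only an illustration. The names PlaceholderSketch and FindPlaceholders are made up, and it assumes whitespace never matters inside a token.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module PlaceholderSketch
    ' Hypothetical helper: returns every [Name]-style token found in raw Word XML.
    Function FindPlaceholders(xml As String) As List(Of String)
        ' Drop every tag, including tags that carry attributes.
        Dim plain As String = Regex.Replace(xml, "<[^>]+>", String.Empty)
        ' Assumption: whitespace is not significant inside a token,
        ' so all spaces and line breaks are removed before searching.
        plain = Regex.Replace(plain, "\s+", String.Empty)

        Dim found As New List(Of String)
        For Each m As Match In Regex.Matches(plain, "\[[a-zA-Z0-9]+\]")
            found.Add(m.Value)
        Next
        Return found
    End Function

    Sub Main()
        ' A token split across two runs, with a line break between them.
        Dim sample As String = "<w:p><w:r><w:t>Dear [Na</w:t></w:r>" & vbCrLf &
                               "<w:r w:rsidR=""1""><w:t>me],</w:t></w:r></w:p>"
        For Each p As String In FindPlaceholders(sample)
            Console.WriteLine(p) ' prints [Name]
        Next
    End Sub
End Module
```

Removing all whitespace also glues ordinary words together, so this is only suitable for detecting the tokens, not for producing readable text.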

Edit: Yes, I understand that the regular expression is far from perfect for this...

Edit2: RegEx to remove XML Tags with content
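For illustration only, here is one way such a regex could look when a whole element should disappear together with its content, so that bracketed text inside it is not picked up. RemoveElementSketch and RemoveElement are hypothetical names, and o:DocumentProperties is just an example of an element you might want to skip. It does not handle nested elements of the same name or self-closing tags.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RemoveElementSketch
    ' Hypothetical helper: removes every <name ...>...</name> block, content included.
    Function RemoveElement(xml As String, name As String) As String
        Dim escaped As String = Regex.Escape(name)
        ' Opening tag (optionally with attributes), lazily matched content, closing tag.
        Dim pattern As String = "<" & escaped & "(\s[^>]*)?>.*?</" & escaped & ">"
        ' Singleline lets "." match line breaks, so multi-line elements are removed too.
        Return Regex.Replace(xml, pattern, String.Empty, RegexOptions.Singleline)
    End Function

    Sub Main()
        Dim sample As String =
            "<o:DocumentProperties><o:Title>[NotAField]</o:Title></o:DocumentProperties>" &
            "<w:t>[Name]</w:t>"
        Console.WriteLine(RemoveElement(sample, "o:DocumentProperties"))
        ' prints <w:t>[Name]</w:t>
    End Sub
End Module
```

Running this before stripping the remaining tags keeps text from the removed element out of the later search.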

Bobby 2009-11-23 10:10:33

Actually I'm trying to come up with some regex that I could use - and hope that's not a dark corner ;)

brovar 2009-11-23 10:36:19

I've also found this question, maybe it helps: http://stackoverflow.com/questions/121656/regular-expression-to-remove-xml-tags-and-their-content

Bobby 2009-11-23 10:52:57

Answer 2

A:

What about this SDK?

http://www.microsoft.com/downloads/details.aspx?FamilyId=C6E744E5-36E9-45F5-8D8C-331DF206E0D0&displaylang=en

Lex Li 2009-11-23 10:33:08

Hmm, I'll take a look, but it's an SDK for Open XML, which is used by Office 2007 and not 2003. And I'm not sure whether using something so extensive wouldn't be like shooting a sparrow with a cannon ;)

brovar 2009-11-23 10:42:07

.NET - working with MS Word XML
