ansaurus

Question

Answer 1

A:

I would do something like this:

// Word "smart" punctuation (by UTF-16 code) and the ASCII character that replaces each one.
char[] charToRemove = { (char)8217, (char)8216, (char)8220, (char)8221, (char)8211 }; // ’ ‘ “ ” –
char[] charToAdd    = { '\'',       '\'',       '"',        '"',        '-' };
string cleanedStr = "Your WordML filled Feed Text.";

for (int i = 0; i < charToRemove.Length; i++)
{
    // String.Replace(char, char) replaces every occurrence of one character with another.
    cleanedStr = cleanedStr.Replace(charToRemove[i], charToAdd[i]);
}

This looks for the characters listed in charToRemove (the Word "smart" quotes and en dash that mess everything up) and replaces each one with its ASCII equivalent from charToAdd.
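The same replacement can be wrapped in a reusable method. This is a hedged variation, not the answerer's code: the class and method names (`WordPunctuation`, `ReplaceSmartPunctuation`) are hypothetical, it covers only the five characters listed above, and it builds the result in a single pass with a `StringBuilder` instead of calling `Replace` once per character.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class WordPunctuation
{
    // Maps the five Word "smart" characters from the answer to ASCII.
    // Only these characters are covered; extend the map if you need more.
    private static readonly Dictionary<char, char> AsciiFor = new Dictionary<char, char>
    {
        { '\u2018', '\'' }, // left single quotation mark,  (char)8216
        { '\u2019', '\'' }, // right single quotation mark, (char)8217
        { '\u201C', '"' },  // left double quotation mark,  (char)8220
        { '\u201D', '"' },  // right double quotation mark, (char)8221
        { '\u2013', '-' },  // en dash,                     (char)8211
    };

    // Hypothetical helper: returns a copy of input with each mapped character replaced.
    public static string ReplaceSmartPunctuation(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            // Append the ASCII substitute if there is one, otherwise the original character.
            sb.Append(AsciiFor.TryGetValue(c, out char ascii) ? ascii : c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string text = "It\u2019s 9\u201310";
        Console.WriteLine(ReplaceSmartPunctuation(text)); // prints It's 9-10
    }
}
```

A dictionary keeps each character next to its replacement, so the two parallel arrays cannot drift out of step when the list grows.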

Jeremy Reagan 2008-10-27 22:14:54

Answer 2

A:

Jeff Atwood blogged about how to do this a while ago. His post contains some C# code that will clean the WordML.

http://www.codinghorror.com/blog/archives/000485.html

d4nt 2008-10-28 09:56:56

Jeff's article is about cleaning up the nasty HTML that Word generates, not stripping out the XML elements from a WordML file.

Chris Zwiryk 2009-11-05 17:02:42

The questioner said that content copied and pasted from Word contains lots of unwanted HTML tags. Jeff's code will remove those.

d4nt 2009-11-09 10:53:24

Answer 3

A:

I haven't yet worked with WordML, but assuming that its elements are in a different namespace from RSS, it should be quite simple to do with XSLT.

Start with a basic identity transform (a stylesheet that adds all nodes from the input document "as is" to the output tree). You need these two templates:

  <!-- Copy all elements, and recur on their child nodes. -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- Copy all non-element nodes. -->
  <xsl:template match="@*|text()|comment()|processing-instruction()">
    <xsl:copy/>
  </xsl:template>

A transformation using a stylesheet containing just the above two templates would exactly reproduce its input document on output, modulo those things that standards-compliant XML processors are permitted to change, such as entity replacement.
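To check the round-trip claim, the sketch below wraps the two templates in a minimal `xsl:stylesheet` and runs it with .NET's `XslCompiledTransform`. The class and method names are hypothetical and the sample input is made up for illustration; it contains no entity references, so the output should match the input character for character.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class IdentityDemo
{
    // The two identity templates from the answer, wrapped in a minimal stylesheet.
    public const string IdentityXslt = @"<xsl:stylesheet version=""1.0""
    xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:template match=""*"">
    <xsl:copy>
      <xsl:apply-templates select=""@*""/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match=""@*|text()|comment()|processing-instruction()"">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>";

    // Hypothetical helper: applies an XSLT stylesheet (as a string) to an XML string.
    public static string Transform(string xslt, string xml)
    {
        var transform = new XslCompiledTransform();
        using (var xslReader = XmlReader.Create(new StringReader(xslt)))
        {
            transform.Load(xslReader);
        }

        // Omit the <?xml ...?> declaration so the output can be compared with the input.
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        var output = new StringWriter();
        using (var input = XmlReader.Create(new StringReader(xml)))
        using (var writer = XmlWriter.Create(output, settings))
        {
            transform.Transform(input, writer);
        }
        return output.ToString();
    }

    public static void Main()
    {
        string xml = "<feed><item id=\"1\">Hello <!-- note --><b>world</b></item></feed>";
        Console.WriteLine(Transform(IdentityXslt, xml) == xml); // True
    }
}
```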

Now, add in a template that matches any element in the WordML namespace. Let's give it the namespace prefix 'wml' for the purposes of this example:

  <!-- Do not copy WordML elements or their attributes to the 
       output tree; just recur on child nodes. -->
  <xsl:template match="wml:*">
    <xsl:apply-templates/>
  </xsl:template>

The beginning and end of the stylesheet are left as an exercise for the coder.
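One way to fill in the beginning and end of the stylesheet is sketched below, run from C# with `XslCompiledTransform`. This is a hedged completion, not the answerer's own code: the class and helper names (`StripWordMl`, `Strip`) are hypothetical, and the namespace URI bound to `wml` is an assumption (the Word 2003 XML namespace). Use whatever URI your WordML actually declares, since `wml:*` matches by namespace URI, not by prefix.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class StripWordMl
{
    // Identity transform plus the wml:* template from the answer.
    // ASSUMPTION: WordML uses the Word 2003 namespace URI below; change it if yours differs.
    const string StripXslt = @"<xsl:stylesheet version=""1.0""
    xmlns:xsl=""http://www.w3.org/1999/XSL/Transform""
    xmlns:wml=""http://schemas.microsoft.com/office/word/2003/wordml"">
  <xsl:template match=""*"">
    <xsl:copy>
      <xsl:apply-templates select=""@*""/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match=""@*|text()|comment()|processing-instruction()"">
    <xsl:copy/>
  </xsl:template>
  <!-- WordML elements: drop the element and its attributes, keep processing its children. -->
  <xsl:template match=""wml:*"">
    <xsl:apply-templates/>
  </xsl:template>
</xsl:stylesheet>";

    // Hypothetical helper: applies StripXslt to an XML string and returns the result.
    public static string Strip(string xml)
    {
        var transform = new XslCompiledTransform();
        using (var xslReader = XmlReader.Create(new StringReader(StripXslt)))
        {
            transform.Load(xslReader);
        }

        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        var output = new StringWriter();
        using (var input = XmlReader.Create(new StringReader(xml)))
        using (var writer = XmlWriter.Create(output, settings))
        {
            transform.Transform(input, writer);
        }
        return output.ToString();
    }

    public static void Main()
    {
        string xml = "<description>Hello <w:r xmlns:w=\"http://schemas.microsoft.com/office/word/2003/wordml\">"
                   + "<w:t>world</w:t></w:r></description>";
        Console.WriteLine(Strip(xml)); // prints <description>Hello world</description>
    }
}
```

The `wml:*` template takes precedence over `match="*"` because its default priority (-0.25) is higher than that of `*` (-0.5). Note that `xsl:copy` also copies an element's in-scope namespace nodes, so if the WordML namespace is declared on a non-WordML ancestor, that ancestor may keep an unused `xmlns` declaration in the output.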

ChuckB 2008-10-28 13:57:04

Strip WordML from a string