I'm writing some code that logs XML data, and I would like to be able to replace the content of certain elements (e.g. passwords) in the document. I'd rather not serialize and parse the document, as my code will be handling a variety of schemas.

Sample input documents:

doc #1:

<user>
    <userid>jsmith</userid>
    <password>myPword</password>
</user>

doc #2:

<secinfo>
    <ns:username>jsmith</ns:username>
    <ns:password>myPword</ns:password>
</secinfo>

What I'd like my output to be:

output doc #1:

<user>
    <userid>jsmith</userid>
    <password>XXXXX</password>
</user>

output doc #2:

<secinfo>
    <ns:username>jsmith</ns:username>
    <ns:password>XXXXX</ns:password>
</secinfo>

Since the documents I'll be processing could have a variety of schemas, I was hoping to come up with a generic regular expression that finds elements with "password" in their names and masks the content accordingly.

Can I solve this using regular expressions and C# or is there a more efficient way?

+6  A: 

I'd say you're better off parsing the content with a .NET XmlDocument object, finding the password elements using XPath, and then changing their InnerText properties. It has the advantage of being more correct (since XML isn't a regular language in the first place), and it's conceptually easy to understand.
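
A minimal sketch of that approach, not a definitive implementation. The class name XPathPasswordMasker and the exact-match rule on the local name "password" are assumptions made for illustration. The local-name() test ignores the prefix, so both <password> and <ns:password> match, but the document still has to declare the ns prefix or LoadXml will reject it.

```csharp
using System.Linq;
using System.Xml;

// Hypothetical helper, named for this example only.
public static class XPathPasswordMasker
{
    // Returns the XML with the text of every element whose local name is
    // exactly "password" replaced by XXXXX.
    public static string Mask(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml); // throws XmlException if the input is not well-formed

        // Copy the matches into a list first so the tree is not modified
        // while the XPath result is still being enumerated.
        var matches = doc.SelectNodes("//*[local-name()='password']")
                         .Cast<XmlNode>()
                         .ToList();

        foreach (XmlNode node in matches)
        {
            node.InnerText = "XXXXX";
        }

        return doc.OuterXml;
    }
}
```

Calling XPathPasswordMasker.Mask(input) on doc #1 from the question returns the same document with the password text masked.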

Welbog
+14  A: 

This problem is best solved with XSLT:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="@* | node()">
     <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
     </xsl:copy>
    </xsl:template>
    <xsl:template match="*[local-name()='password']">
     <xsl:copy>
      <xsl:text>XXXXX</xsl:text>
     </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

This will work for both inputs as long as you handle the namespaces properly.

Edit: Clarification of what I mean by "handle namespaces properly":

Make sure that a source document using the ns prefix declares a namespace for it, like so:

<?xml version="1.0" encoding="utf-8"?>
<secinfo xmlns:ns="urn:foo">
    <ns:username>jsmith</ns:username>
    <ns:password>XXXXX</ns:password>
</secinfo>
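
For completeness, here is one way the stylesheet could be applied from C#, sketched with XslCompiledTransform. The class name XsltPasswordMasker is hypothetical, and the embedded stylesheet matches on local-name() so that prefixed elements such as ns:password are masked too; treat that match pattern as an assumption of this sketch.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper, named for this example only.
public static class XsltPasswordMasker
{
    // Identity transform plus one template that masks password elements.
    private const string Stylesheet = @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:template match='@* | node()'>
    <xsl:copy><xsl:apply-templates select='@* | node()'/></xsl:copy>
  </xsl:template>
  <xsl:template match=""*[local-name()='password']"">
    <xsl:copy><xsl:text>XXXXX</xsl:text></xsl:copy>
  </xsl:template>
</xsl:stylesheet>";

    public static string Mask(string xml)
    {
        var transform = new XslCompiledTransform();
        using (var stylesheetReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transform.Load(stylesheetReader);
        }

        var output = new StringWriter();
        using (var input = XmlReader.Create(new StringReader(xml)))
        using (var writer = XmlWriter.Create(output, transform.OutputSettings))
        {
            transform.Transform(input, writer);
        }

        // The result starts with an XML declaration added by the writer.
        return output.ToString();
    }
}
```

In real logging code it would probably be worth loading the stylesheet once and reusing the XslCompiledTransform instance rather than compiling it on every call.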
Andrew Hare
A: 

You can use regular expressions if you know enough about what you are trying to match. For example, if you are looking for any tag that has the word "password" in its name and no inner tags, this regular expression would work:

(<([^>]*?password[^>]*?)>)([^<]*?)(<\/\2>)

You could use the same C# replace statement as in zowat's answer, but with "$1XXXXX$4" as the replacement string.
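
Put together in C#, the pattern and replacement string above might look like the sketch below. The class name RegexPasswordMasker is hypothetical. Note the limits of the pattern: the match is case-sensitive, and an opening tag with attributes (for example <password type="x">) will not match, because the closing tag has to repeat everything captured between the angle brackets.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, named for this example only.
public static class RegexPasswordMasker
{
    // Group 1: opening tag, group 2: tag name, group 3: text content,
    // group 4: closing tag whose name must equal group 2.
    private static readonly Regex PasswordElement = new Regex(
        @"(<([^>]*?password[^>]*?)>)([^<]*?)(<\/\2>)");

    public static string Mask(string xml)
    {
        // Keep the opening and closing tags, replace only the text between them.
        return PasswordElement.Replace(xml, "$1XXXXX$4");
    }
}
```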

John Conrad
+4  A: 

From experience with systems that try to parse and/or modify XML without proper parsers, let me say: DON'T DO IT. Use an XML parser (There are other answers here that have ways to do that quickly and easily).

Using non-xml methods to parse and/or modify an XML stream will ALWAYS lead you to pain at some point in the future. I know, because I have felt that pain.

I know that it seems like it would be quicker-at-runtime/simpler-to-code/easier-to-understand/whatever if you use the regex solution. But you're just going to make someone's life miserable later.

Michael Kohne
You make a good point. I think some of the other proposed solutions here (XSLT, XPath or XDocument) will save me from some pain in the future.
Millhouse
There are very few absolute rules that aren't riddled with exceptions. "Don't ever use string manipulation tools to parse or modify XML" is one of them.
Robert Rossney
A: 

Regex is the wrong approach for this. I've seen it go badly wrong when you least expect it.

XDocument is way more fun anyway:

XDocument doc = XDocument.Parse(@"
      <user>
       <userid>jsmith</userid>
       <password>password</password>
      </user>");

doc.Element("user").Element("password").Value = "XXXX";

// Temp namespace just for the purposes of the example
XDocument doc2 = XDocument.Parse(@"
      <secinfo xmlns:ns='http://tempuru.org/users'>
       <ns:userid>jsmith</ns:userid>
       <ns:password>password</ns:password>
      </secinfo>");

doc2.Element("secinfo").Element("{http://tempuru.org/users}password").Value = "XXXXX";
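
The two snippets above hard-code the path to the password element. For documents with unknown schemas, a more generic variant could search all descendants by local name, roughly as sketched here. LinqPasswordMasker is a hypothetical name, and the exact-match rule on "password" is an assumption for illustration.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, named for this example only.
public static class LinqPasswordMasker
{
    public static string Mask(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        // Name.LocalName ignores the namespace, so both <password> and
        // <ns:password> are found. ToList() takes a snapshot before editing.
        var targets = doc.Descendants()
                         .Where(e => e.Name.LocalName == "password")
                         .ToList();

        foreach (XElement element in targets)
        {
            element.Value = "XXXXX";
        }

        // DisableFormatting keeps the output from being re-indented.
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```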
Kev
A: 

Here is what I came up with when I went with XmlDocument. It may not be as slick as XSLT, but it should be generic enough to handle a variety of documents:

            //input is a String with some valid XML
            XmlDocument doc = new XmlDocument();
            doc.LoadXml(input);
            XmlNodeList nodeList = doc.SelectNodes("//*");

            foreach (XmlNode node in nodeList)
            {
                if (node.Name.ToUpper().Contains("PASSWORD"))
                {
                    node.InnerText = "XXXXX";
                }
                else if (node.Attributes.Count > 0)
                {
                    foreach (XmlAttribute a in node.Attributes)
                    {
                        if (a.LocalName.ToUpper().Contains("PASSWORD"))
                        {
                            a.InnerText = "XXXXX";
                        }
                    }
                }    
            }
Millhouse
I think you want to use LocalName for both elements and attributes. Also, if you make this a recursive function that walks the XML tree, you don't have to start out by building a list of all the elements in the document.
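
A rough sketch of that suggestion, assuming the same "name contains PASSWORD" rule as the code above. The names RecursivePasswordMasker, MaskNode and IsSensitive are hypothetical. It compares on LocalName for both elements and attributes and walks the tree recursively instead of selecting every element up front.

```csharp
using System;
using System.Xml;

// Hypothetical helper, named for this example only.
public static class RecursivePasswordMasker
{
    public static string Mask(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        MaskNode(doc.DocumentElement);
        return doc.OuterXml;
    }

    private static void MaskNode(XmlNode node)
    {
        // Attributes is null for non-element nodes such as text and comments.
        if (node.Attributes != null)
        {
            foreach (XmlAttribute attribute in node.Attributes)
            {
                if (IsSensitive(attribute.LocalName))
                {
                    attribute.Value = "XXXXX";
                }
            }
        }

        if (node.NodeType == XmlNodeType.Element && IsSensitive(node.LocalName))
        {
            node.InnerText = "XXXXX";
            return; // the content is now just the mask, nothing left to visit
        }

        foreach (XmlNode child in node.ChildNodes)
        {
            MaskNode(child);
        }
    }

    // Case-insensitive "contains password" check on a local name.
    private static bool IsSensitive(string localName)
    {
        return localName.IndexOf("password", StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

Unlike the original loop, this version also masks password attributes that sit on an element whose own name contains "password".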
Robert Rossney
A: 

The main reason XSLT exists is to transform XML structures: an XSLT document is a type of stylesheet that can be used to alter the order of elements or change their content. This is therefore a typical situation where it's highly recommended to use XSLT instead of parsing, as Andrew Hare said in a previous post.

pelle