tags:

views:

293

answers:

7

How is it possible to take an XML document and in .Net scramble all text elements to make them un-readable to a human?

E.g. Make

<root><Node1>My Name is Ben</Node1></root>

Into

<root><Node1>klj sdlasd asdjfl</Node1></root>


Points:

  • We only need scrambling not encryption, i.e. not human readable at a glance. It doesn’t matter if the user could de-code it with 30 seconds effort.
  • We don’t know the schema of the XML, so all text elements must be located in code.
  • We need to de-scramble the XML later – so please show how one would do that too.
  • It needs to be simple, e.g. no pre-sharing keys, certificates, etc.
  • Only the text elements should be scrambled - not the nodes, attributes, etc.
+1  A: 

It would be simpler to just use the built in encryption classes in the .NET framework if you're trying to obfuscate it, than building a scrambling class. If security isn't your concern, then just hard-code the password and initialization vector.

Out of curiosity, why would you scramble it, only to have it human-readable with little effort?

nasufara
Ok sure - how does one identify all the text elements and encrypt those using .NET without encypting the Nodes, etc?
Ben Breen
You could use an XmlDocument and loop through all the elements on the document, encrypting just the contents and base64ing them, but why do you need to only encrypt the element contents? Couldn't you just encrypt it when you save it, and then decrypt it when you need to parse it? I suppose I'm a little confused with what you're trying to do.
nasufara
A: 

you can implement your own ROT13 code.

Baget
+2  A: 

The simplest two-way encryption I can think of is to just take the ASCII value of each character and add a constant to it, say 3. Then convert it back by subtracting 3 from each character.

Original:

Hello this is scrambled.

Scrambled:

Khoor#wklv#lv#vfudpeohg1

Python code because I love Python too much:

def scramble(str):
    return ''.join([chr(ord(c)+3) for c in str])

def unscramble(str):
    return ''.join([chr(ord(c)-3) for c in str])
Kai
Yep, this is what I meant by using the Caesar cipher, but better explained. +1
Mark B
You would need to be careful at the edges of the alphabet that you don't convert into a reserved XML character (like "|" or backspace for example)
Guy Starbuck
A: 

You could just use a Caesar cipher on each node's contents.

Mark B
+1  A: 

Just use Convert.ToBase64String and Convert.FromBase64String

Has the additonal advantage of ensuring that you don't have to worry about escaping characters, although it does increase the length of the strings a bit.

Eric Petroelje
A: 

To identify the text elements, load the unscrambled XML data into an XmlDocument. Then recursively walk down the ChildNodes collection, checking the XmlNode.NodeType of each node. If it's XmlNodeType.Text, then it's a piece of text.

If you also want to scramble attribute values, then look out for XmlNodeType.Attribute as well.

itowlson
+1  A: 
using System;
using System.Linq;
using System.Xml.Linq;
using System.Collections.Generic;

namespace App
{
    class Scrambler
    {
        public static void ScrambleTextNodes(XContainer xml)
        {
            foreach (XText textNode in GetDescendantTextNodes(xml))
                textNode.Value = Scramble(textNode.Value);
        }

        public static void UnScrambleTextNodes(XContainer xml)
        {

            foreach (XText textNode in GetDescendantTextNodes(xml))
                textNode.Value = UnScramble(textNode.Value);
        }

        public static IEnumerable<XNode> GetDescendantTextNodes(XContainer xml)
        {
            return xml.DescendantNodes().Where(node => node.NodeType == System.Xml.XmlNodeType.Text);
        }

        public static string Scramble(string s)
        {
            var a = s.Select(ch => (char)(ch + 3)).ToArray();
            return new string(a);
        }

        public static string UnScramble(string s)
        {
            var a = s.Select(ch => (char)(ch - 3)).ToArray();
            return new string(a);
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var doc = XDocument.Parse("<a><b>this</b><b><c>is</c><c><d>a test</d></c></b></a>");

            Scrambler.ScrambleTextNodes(doc);

            Console.WriteLine(doc.ToString());

            Scrambler.UnScrambleTextNodes(doc);

            Console.WriteLine(doc.ToString());

            Console.ReadLine();
        }

    }
}

Output:

<a>
  <b>wklv</b>
  <b>
    <c>lv</c>
    <c>
      <d>d whvw</d>
    </c>
  </b>
</a>
<a>
  <b>this</b>
  <b>
    <c>is</c>
    <c>
      <d>a test</d>
    </c>
  </b>
</a>

You can always use some other scrambling algorithm. The scrambling itself is a translation to C# of Kai's Python answer.

edit: clean-up :) edit2: removed the check to not scramble spaces. This would cause the unscrambling to be incorrect at times...

Ward Werbrouck