Hello everyone,

I am checking whether specific input strings are valid (that is, whether they could be used as the value of an XML element) in a UTF-8 encoded XML document. My goal is to tell which strings in an input string array are not valid according to the XML standard with UTF-8 encoding.

Here is my code. My current implementation is straightforward: it assembles an XML document around each individual string from the input array and tries to load it. Functionally it works, but I am not sure whether it is the most efficient way.

My working environment is .NET 3.5 + VSTS 2008 + C#.

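    using System;
    using System.IO;
    using System.Text;
    using System.Xml;

    class Program
    {
        static void Main(string[] args)
        {
            // The last entry is a malformed character reference and should fail.
            string[] inputs = { "Hello", "World", "StackOverflow", "ServerFault", "&#DFFE" };
            XmlDocument xDoc = new XmlDocument();
            string header = "<?xml version=\"1.0\" encoding=\"utf-8\"?>";
            string formatter = "<foo>{0}</foo>";
            foreach (string item in inputs)
            {
                // Wrap each string in a minimal document and let the parser validate it.
                StringBuilder builder = new StringBuilder();
                builder.Append(header);
                builder.AppendFormat(formatter, item);
                try
                {
                    xDoc.Load(new StringReader(builder.ToString()));
                }
                catch (XmlException ex)
                {
                    // Reached only for strings that are not well-formed element content.
                    Console.WriteLine(ex.ToString());
                }
            }
        }
    }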

Thanks in advance, George

+1  A: 

George, when writing tests, it's best to start with a test that demonstrates the failure case.

Will your code ever fail? I don't think so.

You should start with a test that is close to the problem that made you want to create this test. I presume you had a problem with an XML file not being properly encoded? In that case, you should create a test that proves that the bad file is bad (which you already know to be true), then generalize the test so that it can detect other bad files as being bad, and all good files as being good.

John Saunders
Hi John, I understand what kinds of inputs are invalid. My question is not about how to test, but whether my current straightforward implementation, which checks the strings one by one by constructing an XML document for each, is efficient enough, and whether there is a more efficient way to check. Let me know if there is any misunderstanding. I am asking about performance, not functionality.
George2
"Will your code ever fail? I don't think so." -- my code will fail when there is invalid input, for example a numeric character reference &#xxxx; where xxxx is not in the valid range for XML. http://en.wikipedia.org/wiki/XML
George2
Refer to the "Numeric character references" section of the Wikipedia page.
George2
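As a side note on the "valid range" mentioned above, here is a minimal sketch of checking raw characters against the `Char` production of XML 1.0 without involving a parser at all. The class and method names (`XmlCharCheck`, `IsXmlChar`, `FirstInvalidIndex`) are made up for illustration. This is a different technique from the parser-based check in the question: it only looks at the characters themselves, so it will not catch malformed markup such as `&#DFFE`, which is built entirely from legal characters.

```csharp
using System;

public static class XmlCharCheck
{
    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    public static bool IsXmlChar(int codePoint)
    {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Returns the index of the first UTF-16 unit that is not allowed in XML 1.0,
    // or -1 if every character is allowed. (Hypothetical helper.)
    public static int FirstInvalidIndex(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (char.IsHighSurrogate(c))
            {
                // A well-formed surrogate pair encodes a code point in
                // 0x10000-0x10FFFF, which is always allowed.
                if (i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
                {
                    i++;
                    continue;
                }
                return i; // unpaired high surrogate
            }
            // Unpaired low surrogates fall in 0xD800-0xDFFF and are rejected here.
            if (!IsXmlChar(c))
            {
                return i;
            }
        }
        return -1;
    }
}
```

A pre-scan like this could be used to reject strings containing forbidden control characters cheaply, but the parser-based check is still needed if the strings may contain markup or character references.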
George, if there are failure cases, then maybe you should edit your question to include one of them, instead of showing a list of success cases?
John Saunders
Diadistis
@John, I have added an invalid test case, and you can verify that it causes an exception. Now my turn :-) Any advice about how to improve validation performance?
George2
@Diadistis, any advice about how to improve validation performance? Currently, I check all strings one by one...
George2
+1  A: 

You could do something like this:

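    using System;
    using System.Xml;

    class Program
    {
        // A single element, created once and reused for every check.
        public static XmlElement xmlValidationElement =
            new XmlDocument().CreateElement("validator");

        static void Main(string[] args)
        {
            string[] inputs = { "Hello", "World", "StackOverflow", "ServerFault" };
            foreach (string item in inputs)
            {
                try
                {
                    // Setting InnerXml throws XmlException if the content is not well-formed.
                    xmlValidationElement.InnerXml = item;
                }
                catch (XmlException ex)
                {
                    Console.WriteLine(ex.ToString());
                }
            }
        }
    }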
Diadistis
Does your code perform better than mine? If so, why?
George2
Yes, because the only step involved in my code is the actual built-in validation. No StringBuilder, no StringReader, a single XmlDocument, and a single XmlElement, without parsing a whole XML document for each string. You can run some tests to see the actual difference ;)
Diadistis
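For anyone who wants to run the comparison suggested above, here is a rough timing sketch using `Stopwatch`. The class name `ValidationTiming`, its method names, and the round count are assumptions made for illustration, not part of either post. It wraps the two approaches from this thread as boolean checks and times each over the same inputs. Treat the numbers as indicative only, since JIT warm-up and exception cost will influence them.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Xml;

public static class ValidationTiming
{
    private static readonly XmlDocument Doc = new XmlDocument();
    private static readonly XmlElement Element =
        new XmlDocument().CreateElement("validator");

    // Approach from the question: load a full document around each string.
    public static bool IsValidViaDocument(string item)
    {
        try
        {
            string xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?><foo>" + item + "</foo>";
            Doc.Load(new StringReader(xml));
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    // Approach from the answer: assign the string to InnerXml of a reused element.
    public static bool IsValidViaElement(string item)
    {
        try
        {
            Element.InnerXml = item;
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    // Runs the given check over all inputs 'rounds' times and returns elapsed milliseconds.
    public static long TimeMs(Func<string, bool> check, string[] inputs, int rounds)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int r = 0; r < rounds; r++)
        {
            foreach (string s in inputs)
            {
                check(s);
            }
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    // Prints the timing of both approaches for the inputs used in the question.
    public static void PrintComparison()
    {
        string[] inputs = { "Hello", "World", "StackOverflow", "ServerFault", "&#DFFE" };
        Console.WriteLine("document: {0} ms", TimeMs(IsValidViaDocument, inputs, 10000));
        Console.WriteLine("element:  {0} ms", TimeMs(IsValidViaElement, inputs, 10000));
    }
}
```

Calling `ValidationTiming.PrintComparison()` from `Main` prints one line per approach. Because thrown exceptions are comparatively expensive in .NET, it may be worth timing valid-only and invalid-only input sets separately.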