Validation time doubles each time the "Y" in the ID moves one position to the right. Can anybody tell me why, and how to solve it?

I have many long IDs stored in a database that can't be changed, so I can't restrict their length much.

using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Schema;

namespace TestRegex
{
 class Program
 {
  static void Main(string[] args)
  {

   DateTime start = DateTime.Now;

   /******************************************
    *  ID to validate
    ******************************************/
   //string id = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"; // Ok: Fast
     string id = "xxxxxxxxxxxxxxxxxxxxxYxxxxxxx"; // Invalid: Slow
   //string id = "xxxxxxxxxxxxxxxxxxxxxxYxxxxxx"; // Invalid: Slower
   //string id = "xxxxxxxxxxxxxxxxxxxxxxxYxxxxx"; // Invalid: Very slow
   //string id = "xxxxxxxxxxxxxxxxxxxxxxxxYxxxx"; // Invalid: Very very slow

   /******************************************
    *  XML to validate
    ******************************************/  
   XmlDocument doc = new XmlDocument();
   doc.LoadXml("<root id='" + id + "'></root>");

   /******************************************
    *  XSD validator
    ******************************************/
   string xsl =
@"
<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'
           elementFormDefault='unqualified'
           attributeFormDefault='unqualified'>

 <xs:simpleType name='id'>
        <xs:restriction base='xs:string'>
            <xs:pattern value='^([a-z_]+[0-9]*)+' />
        </xs:restriction>
 </xs:simpleType>

    <xs:element name='root'>
        <xs:complexType>
            <xs:attribute name='id' use='required' type='id' />
  </xs:complexType>
 </xs:element>
</xs:schema>
";

   /******************************************
    *  Adds XSD to XML and validates it
    ******************************************/
   XmlTextReader reader = new XmlTextReader(
    new MemoryStream(ASCIIEncoding.Default.GetBytes(xsl)));

   XmlSchema schema = XmlSchema.Read(reader, new ValidationEventHandler(Validate));
   doc.Schemas.Add(schema);
   doc.Validate(new ValidationEventHandler(Validate));


   /******************************************
    *  Performance results
    ******************************************/
   Console.WriteLine(id.Length + " = " + (DateTime.Now - start).TotalSeconds);
   Console.Read();
  }

  private static void Validate(object o, ValidationEventArgs args)
  {
   if (args.Exception != null)
   {
    Console.WriteLine(args.Exception);
   }
  }
 }
}
+2  A: 

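This looks like a case of catastrophic backtracking: the nested quantifiers in ([a-z_]+[0-9]*)+ let the engine split a run of letters into groups in exponentially many ways, and it tries all of them before rejecting an invalid value.
Your regex seems overly complex. If I'm reading it correctly, it accepts lowercase letters, underscores, and digits, as long as the first character isn't a digit. You can rewrite it as: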

^[a-z_]\w*
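
To see the backtracking in isolation, here is a minimal sketch that runs the original pattern directly with System.Text.RegularExpressions instead of going through the XSD validator. The class name BacktrackingDemo and the helper IsValid are made up for illustration, and the trailing $ is an assumption: it stands in for the XSD rule that a pattern facet must match the whole value. On a backtracking engine the time should roughly double with each extra x before the Y, though exact timings depend on the runtime.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class BacktrackingDemo
{
    // Assumption: anchored at both ends to mimic whole-value matching in XSD.
    const string Pattern = @"^([a-z_]+[0-9]*)+$";

    // Hypothetical helper: runs the pattern with a match timeout so a
    // pathological input cannot hang the process.
    public static bool IsValid(string id, TimeSpan timeout)
    {
        return new Regex(Pattern, RegexOptions.None, timeout).IsMatch(id);
    }

    static void Main()
    {
        for (int prefix = 16; prefix <= 22; prefix++)
        {
            // Same shape as the slow IDs in the question: x's, one Y, more x's.
            string id = new string('x', prefix) + "Y" + "xxx";
            Stopwatch watch = Stopwatch.StartNew();
            try
            {
                bool ok = IsValid(id, TimeSpan.FromSeconds(5));
                Console.WriteLine("{0} x's before Y: valid={1}, {2} ms",
                    prefix, ok, watch.ElapsedMilliseconds);
            }
            catch (RegexMatchTimeoutException)
            {
                Console.WriteLine("{0} x's before Y: gave up after 5 s", prefix);
            }
        }
    }
}
```

The timeout overload of the Regex constructor is only a safety net here; it does not make the pattern any faster.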
Kobi
Thanks, but your regex doesn't have the same behavior as mine.
Eduardo
Right! Sorry, I forgot you were looking for lowercase letters only...
Kobi
+2  A: 

Solved!

The regex ^([a-z_][a-z_0-9]*) has the same behavior and is much faster, because each character can now be consumed in only one way, so there is nothing to backtrack over.
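
A quick way to gain some confidence that the two patterns agree is to run both over a handful of short sample IDs. This is only a sketch: the class name IdPatternCheck, the method Agree, and the sample values are invented for illustration, and both patterns are anchored with $ on the assumption that the XSD validator matches the whole attribute value.

```csharp
using System;
using System.Text.RegularExpressions;

static class IdPatternCheck
{
    // Assumption: $ added to both patterns to mimic whole-value matching in XSD.
    static readonly Regex Original =
        new Regex(@"^([a-z_]+[0-9]*)+$", RegexOptions.None, TimeSpan.FromSeconds(2));
    static readonly Regex Rewritten =
        new Regex(@"^([a-z_][a-z_0-9]*)$", RegexOptions.None, TimeSpan.FromSeconds(2));

    // True when both patterns give the same verdict for the given ID.
    public static bool Agree(string id)
    {
        return Original.IsMatch(id) == Rewritten.IsMatch(id);
    }

    static void Main()
    {
        // Short samples only, so the original pattern cannot blow up.
        string[] samples = { "abc", "_a1", "a1b2", "1abc", "aBc", "", "a-b" };
        foreach (string id in samples)
        {
            Console.WriteLine("'{0}': rewritten={1}, agree={2}",
                id, Rewritten.IsMatch(id), Agree(id));
        }
    }
}
```

A few samples are not a proof of equivalence, but a mismatch here would show up immediately.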

Eduardo