I'm generating regular expressions dynamically by walking an XML structure and building up the pattern from its node types. Each regular expression becomes part of a Layout type that I defined. I then parse a text file in which every line begins with an id. That id points me to a specific layout, and I try to match the data in that row against the layout's regex.
Sounds fine and dandy, right? The only problem is that the matching is extremely slow. I have the expressions set as compiled to try to speed things up a bit, but to no avail. What is baffling is that these expressions aren't that complex. I am by no means a regex guru, but I know enough about them to get things working.
Here is the code that generates the expressions...
    StringBuilder sb = new StringBuilder();
    //get layout id and memberkey in there...
    sb.Append(@"^([0-9]+)[ \t]{1,2}([0-9]+)");
    foreach (ColumnDef c in columns)
    {
        sb.Append(@"[ \t]{1,2}");
        switch (c.Variable.PrimType)
        {
            case PrimitiveType.BIT:
                sb.Append("(0|1)");
                break;
            case PrimitiveType.DATE:
                sb.Append(@"([0-9]{2}/[0-9]{2}/[0-9]{4})");
                break;
            case PrimitiveType.FLOAT:
                sb.Append(@"([-+]?[0-9]*\.?[0-9]+)");
                break;
            case PrimitiveType.INTEGER:
                sb.Append(@"([0-9]+)");
                break;
            case PrimitiveType.STRING:
                sb.Append(@"([a-zA-Z0-9]*)");
                break;
        }
    }
    sb.Append("$");
    _pattern = new Regex(sb.ToString(), RegexOptions.Compiled);
The actual slow part...
    public System.Text.RegularExpressions.Match Match(string input)
    {
        if (input == null)
            throw new ArgumentNullException("input");
        return _pattern.Match(input);
    }
A typical "_pattern" may have about 40-50 columns, so I'll spare you the entire pattern. I wrap each column in a capture group so that I can enumerate the captured values in the Match object later on.
Any tips or modifications that could drastically help? Or is this slowness to be expected?
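To give an idea of the shape without pasting a full pattern, here is a minimal sketch for a made-up three-column layout (BIT, DATE, FLOAT). The class name `GroupDemo`, the method `Capture`, and the layout itself are hypothetical and only illustrate how the generated pieces fit together and how the groups are read back.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Hypothetical layout: id, member key, then BIT, DATE and FLOAT columns.
    // Each piece is the same fragment the generator appends for that type.
    public const string Pattern =
        @"^([0-9]+)[ \t]{1,2}([0-9]+)" +
        @"[ \t]{1,2}(0|1)" +
        @"[ \t]{1,2}([0-9]{2}/[0-9]{2}/[0-9]{4})" +
        @"[ \t]{1,2}([-+]?[0-9]*\.?[0-9]+)$";

    private static readonly Regex Rx = new Regex(Pattern, RegexOptions.Compiled);

    // Returns the captured column values, or null when the line does not match.
    public static List<string> Capture(string line)
    {
        Match m = Rx.Match(line);
        if (!m.Success)
            return null;

        var values = new List<string>();
        // Group 0 is the whole match, so the column captures start at index 1.
        for (int i = 1; i < m.Groups.Count; i++)
            values.Add(m.Groups[i].Value);
        return values;
    }
}
```

A real layout would have 40-50 of these column fragments instead of three.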
EDIT FOR CLARITY: Sorry, I don't think I was clear enough the first time around.
I use an XML file to generate a regex for each specific layout. I then run through a file for a data import. I need to make sure that each line in the file matches the pattern it says it's supposed to match. So each pattern could be checked against many times, possibly thousands.
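The validation pass described above can be sketched roughly as follows. This is a simplified illustration rather than the actual import code: `ImportCheck`, `FindBadLines`, and the `layouts` dictionary (layout id to prebuilt `Regex`) are assumed names, and it uses `IsMatch` because it only reports which lines fail, whereas the real code calls `Match` to get at the groups.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ImportCheck
{
    // Returns the 1-based numbers of the lines that fail their layout's pattern.
    // 'layouts' maps a layout id to its regex, built once and reused for every line.
    public static List<int> FindBadLines(IEnumerable<string> lines,
                                         IDictionary<string, Regex> layouts)
    {
        var bad = new List<int>();
        int lineNo = 0;
        foreach (string line in lines)
        {
            lineNo++;

            // The layout id is the run of ASCII digits at the start of the line.
            int end = 0;
            while (end < line.Length && line[end] >= '0' && line[end] <= '9')
                end++;
            string id = line.Substring(0, end);

            // A line is bad if its layout is unknown or its pattern does not match.
            Regex pattern;
            if (!layouts.TryGetValue(id, out pattern) || !pattern.IsMatch(line))
                bad.Add(lineNo);
        }
        return bad;
    }
}
```

The point of the dictionary is that each layout's regex is constructed once, so the cost per line is the match itself and not pattern construction.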