I have a simple task: given an XPath expression, return the prefix that matches the parent of the node that would (possibly) be selected.
Example:
/aaa/bbb => /aaa
/aaa/bbb/ccc => /aaa/bbb
/aaa/bbb/ccc[@x='1' and @y="/aaa[name='z']"] => /aaa/bbb
Because the predicates inside the square brackets might themselves contain brackets within quotes, I decided to try regular expressions. Here's a code snippet:
string input =
    "/aaa/bbb/ccc[@x='1' and @y=\"/aaa[name='z'] \"]";
//                                             ^-- remove this space (before the closing \") and there is no loop
string pattern = @"/[a-zA-Z0-9]+(\[([^]]*(]"")?)+])?$";
System.Text.RegularExpressions.Regex re =
    new System.Text.RegularExpressions.Regex(pattern);
bool ismatch = re.IsMatch(input); // <== Infinite loop in here
// some code based on the match
Because the paths are rather regular, I look for a '/' followed by an identifier, followed by an optional bracketed group, anchored at the end of the string: (....)?$
The code seemed to work, but while playing with different values for the input string I found that simply inserting a space (at the location shown in the comment) makes the .NET IsMatch function go into what looks like an infinite loop, taking all the CPU it can get.
Now, regardless of whether this regular expression pattern is the best one (I had a more complex one but simplified it to show the problem), this seems to show that using RegEx for anything non-trivial may be very risky.
Am I missing something? Is there a way to guard against infinite loops in regular expression matches?
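One possible guard, shown here only as a minimal sketch: it assumes .NET Framework 4.5 or later, where the Regex constructor accepts a match timeout and IsMatch throws RegexMatchTimeoutException when that limit is exceeded. RegexGuard and TryIsMatch are hypothetical names used for illustration. This only caps how long a match may run; it does not change the pattern or explain why it runs away.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of .NET.
static class RegexGuard
{
    // Returns true/false when the match completes, or null when the
    // engine gives up after `timeout`.
    public static bool? TryIsMatch(string input, string pattern, TimeSpan timeout)
    {
        try
        {
            // Assumes the (string, RegexOptions, TimeSpan) constructor overload is available.
            var re = new Regex(pattern, RegexOptions.None, timeout);
            return re.IsMatch(input);
        }
        catch (RegexMatchTimeoutException)
        {
            return null;
        }
    }

    static void Main()
    {
        string pattern = @"/[a-zA-Z0-9]+(\[([^]]*(]"")?)+])?$";
        string withoutSpace = "/aaa/bbb/ccc[@x='1' and @y=\"/aaa[name='z']\"]";
        string withSpace = "/aaa/bbb/ccc[@x='1' and @y=\"/aaa[name='z'] \"]";
        TimeSpan limit = TimeSpan.FromSeconds(1);

        foreach (string input in new[] { withoutSpace, withSpace })
        {
            bool? result = TryIsMatch(input, pattern, limit);
            Console.WriteLine(result.HasValue ? result.Value.ToString() : "timed out");
        }
    }
}
```

A null result means the match was abandoned, so the caller has to decide whether to treat that as "no match" or as an error.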