I am wondering if it is possible to extract the index position in a given string where a Regex failed when trying to match it?

For example, if my regex was "abc" and I tried to match that with "abd" the match would fail at index 2.

Thanks

Edit for clarification: the reason I need this is to simplify the parsing component of my application. The application is an assembly language teaching tool which allows students to write, compile, and execute assembly-like programs.

Currently I have a tokenizer class which converts input strings into Tokens using regexes. This works very well. For example:

Given the input "INP :x:", the tokenizer would produce the following tokens:

Token.OPCODE, Token.WHITESPACE, Token.LABEL, Token.EOL

These tokens are then analysed to ensure they conform to the syntax for a given statement. Currently this is done using IF statements, which is proving cumbersome. The upside of this approach is that I can provide detailed error messages, e.g.:

if(token[2] != Token.LABEL) { throw new SyntaxError("Expected label"); }

I want to use a regular expression to define the syntax instead of the annoying IF statements, but in doing so I lose the ability to return detailed error reports. I would therefore at least like to inform the user of WHERE the error occurred.
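For illustration only, one way to drop the IF chain while keeping the error position is to describe each statement as an expected token sequence and report the first index where the actual tokens differ. This is a table-driven comparison, not a regex, and it is not the asker's code: the `Token` enum below only mirrors the names in the question, and `SyntaxCheck`, `FirstMismatch` and `Describe` are hypothetical helpers.

```csharp
using System;

// Hypothetical token kinds, mirroring the names used in the question.
public enum Token { OPCODE, WHITESPACE, LABEL, EOL }

public static class SyntaxCheck
{
    // Returns the index of the first token that differs from the expected
    // sequence, or -1 when the two sequences are identical.
    public static int FirstMismatch(Token[] expected, Token[] actual)
    {
        int shared = Math.Min(expected.Length, actual.Length);
        for (int i = 0; i < shared; i++)
        {
            if (expected[i] != actual[i]) return i;
        }
        // One sequence is a strict prefix of the other: fail where it stops.
        return expected.Length == actual.Length ? -1 : shared;
    }

    // Builds a message in the spirit of the question's "Expected label".
    public static string Describe(Token[] expected, int index)
    {
        return index < expected.Length
            ? "Expected " + expected[index] + " at token " + index
            : "Unexpected token at position " + index;
    }
}
```

With the expected sequence OPCODE, WHITESPACE, LABEL, EOL and an input that tokenizes to OPCODE, WHITESPACE, EOL, `FirstMismatch` gives 2 and `Describe` gives "Expected LABEL at token 2". A fixed sequence like this cannot express optional or repeated tokens, which is where a regex-style definition would still be needed.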

+1  A: 

I guess such an index would only have meaning in some simple case, like in your example.

If you take a regex like "ab*c*z" (where by * I mean any character) and a string "abbbcbbcdd", what should the index be? It will depend on the algorithm used for matching. It could fail on "abbbc..." or on "abbbcbbc..."

Yacoder
I'd want the first index going from left to right. In your example I believe "abbbcbbcdd" would match fine up until the point where the regex requires a 'z' character.
Richie_W
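A rough sketch of the "first index from left to right" idea with the stock Regex class: split the pattern by hand into consecutive sub-patterns, nest them as optional groups, and take the length of the match anchored at the start of the input. The Regex class does not expose a failure index itself, so this is a workaround. `PrefixProbe`, `BuildPrefixPattern` and `FirstFailure` are hypothetical names, and the code assumes the parts contain no capturing groups or backreferences.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PrefixProbe
{
    // Nests the parts as optional groups, e.g. a, b, c becomes
    // \A(?:(?:a)(?:(?:b)(?:(?:c))?)?)?
    public static string BuildPrefixPattern(string[] parts)
    {
        string nested = "";
        for (int i = parts.Length - 1; i >= 0; i--)
        {
            nested = "(?:(?:" + parts[i] + ")" + nested + ")?";
        }
        return @"\A" + nested;
    }

    // Returns -1 when the whole input matches all parts in order, otherwise
    // the length of the prefix consumed before the greedy match stopped.
    public static int FirstFailure(string[] parts, string input)
    {
        string full = @"\A(?:" + string.Join(")(?:", parts) + @")\z";
        if (Regex.IsMatch(input, full)) return -1;
        return Regex.Match(input, BuildPrefixPattern(parts)).Length;
    }
}
```

For the question's example, `PrefixProbe.FirstFailure(new[] { "a", "b", "c" }, "abd")` returns 2. When a part is greedy (such as `.*`), the reported index is only the point where this particular greedy attempt stopped, which is the ambiguity raised in the answer above.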
A: 

I don't believe it's possible, but I am intrigued why you would want it.

ColinYounger
I added a brief summary in my question about why. Cheers for your answer
Richie_W
+1  A: 

I agree with Colin Younger, I don't think it is possible with the existing Regex class. However, I think it is doable if you are willing to sweat a little:

  1. Get the Regex class source code (e.g. http://www.codeplex.com/NetMassDownloader to download the .Net source).
  2. Change the code to have a readonly property with the failure index.
  3. Make sure your code uses that Regex rather than Microsoft's.
torial
A: 

In order to do that you would need either callbacks embedded in the regex (which AFAIK C# doesn't support) or preferably hooks into the regex engine. Even then, it's not clear what result you would want if backtracking was involved.

Michael Carman