tags:

views:

156

answers:

1

Hi guys,

I'm building a lexical analysis engine in C#. For the most part it is done and works quite well. One of the features of my lexer is that it allows any user to input their own regular expressions. This allows the engine to lex all sorts of fun and interesting things and output a tokenised file.

One of the issues I'm having is that I want the user to have everything contained in this tokenised file, i.e. both the parts they are looking for and the parts they are not (partial highlighting would be a good example of this).

Based on the way my lexer highlights, I found that the best way to do this would be to negate the regular expressions given by the user.

So if the user wanted to lex a string for every occurrence of "T" the negated version would find everything except "T".

The above is easy to do for one simple pattern. But what if a user supplies 8 different expressions of a complex nature? Is there a way to put all these expressions into one and negate the lot?

+1  A: 

You could combine several regexes into one by using (pattern1)|(pattern2)|... To negate it, you just check for !IsMatch.
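As a rough illustration of the combining step, here is a minimal sketch. The PatternCombiner class and its Combine method are hypothetical names, not part of .NET. It assumes each user pattern is valid on its own and does not rely on numbered backreferences such as \1, since group numbers shift once the patterns are joined.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not a .NET type): joins user-supplied patterns into one regex.
public static class PatternCombiner
{
    // Each pattern is wrapped in a non-capturing group so that an alternation
    // or inline option inside one pattern cannot leak into its neighbours.
    public static Regex Combine(IEnumerable<string> patterns)
    {
        string combined = string.Join("|", patterns.Select(p => "(?:" + p + ")"));
        return new Regex(combined);
    }

    // Small demonstration of the positive and the negated check.
    public static void Demo()
    {
        Regex combined = Combine(new[] { "a{2}", "d{2}" });
        Console.WriteLine(combined.IsMatch("aa bb"));  // True: "aa" matches a{2}
        Console.WriteLine(!combined.IsMatch("bb cc")); // True: no pattern matches
    }
}
```

Note that !IsMatch only says whether the whole input is free of matches; it does not return the unmatched text itself.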

var matches = Regex.Matches("aa bb cc dd", @"(?<token>a{2})|(?<token>d{2})");

would in fact return two tokens (note that I've used the same group name twice; that's OK). Also explore Regex.Split. For instance:

var split = Regex.Split("aa bb cc dd", @"(?<token>aa bb)|(?:\s+)");

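returns the words as tokens, except for "aa bb", which is returned as one token because I defined it so with (?<token>...). Note that the resulting array can also contain empty strings where two matches sit side by side.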
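The same Regex.Split behaviour can be used to get both the matched and the unmatched text in one call, which is close to what the question asks for. The sketch below uses a hypothetical SplitKeepingMatches helper and assumes the supplied pattern contains no capturing groups of its own; if it does, those captures would also show up in the array, so this is probably not a fit for patterns that rely on named groups.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not a .NET type).
public static class SplitKeepingMatches
{
    // Wrapping the whole pattern in one capturing group makes Regex.Split
    // include the matched text in its result, interleaved with the text
    // between matches. Empty pieces are dropped.
    public static string[] Pieces(string input, string pattern)
    {
        return Regex.Split(input, "(" + pattern + ")")
                    .Where(piece => piece.Length > 0)
                    .ToArray();
    }
}
```

For example, SplitKeepingMatches.Pieces("aa bb cc dd", "a{2}|d{2}") gives the three pieces "aa", " bb cc " and "dd", in input order.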

You can also use the Index and Length properties to calculate the middle parts that have not been recognized by the Regex:

var matches = Regex.Matches("aa bb cc dd", @"(?<token>a{2})|(?<token>d{2})");
for (int i = 0; i < matches.Count; i++)
{
   var group = matches[i].Groups["token"];
   Console.WriteLine("Token={0}, Index={1}, Length={2}", group.Value, group.Index, group.Length);
}
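Building on the Index and Length idea, the following sketch walks the matches in order and emits the text between them as "not matched" tokens, so the output covers the whole input. GapLexer and Tokenize are hypothetical names, and the value-tuple syntax assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not a .NET type).
public static class GapLexer
{
    // Yields every part of the input in order, flagged as matched or not matched.
    public static IEnumerable<(string Text, bool IsMatch)> Tokenize(string input, Regex regex)
    {
        int position = 0; // index just past the previous match
        foreach (Match match in regex.Matches(input))
        {
            if (match.Length == 0) continue; // empty matches carry no text

            // Text between the previous match and this one was not recognised.
            if (match.Index > position)
                yield return (input.Substring(position, match.Index - position), false);

            yield return (match.Value, true);
            position = match.Index + match.Length;
        }

        // Trailing text after the last match.
        if (position < input.Length)
            yield return (input.Substring(position), false);
    }
}
```

With the input "aa bb cc dd" and the pattern from the example above, this yields "aa" (matched), " bb cc " (not matched) and "dd" (matched). Because it only relies on Match.Index and Match.Length, named groups inside the user's patterns remain available on each Match.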
Nestor
That's pretty good. +1 to you.
David Stratton
Would this work given that I am using named references to pull out details of complex expressions?
DeanMc
DeanMc: yes, it should work. I've added some examples in my response.
Nestor
I still don't see how you would negate. For instance, you can add match results to a match object, but where do you find the others? To add the non-matches to a match object, would you use !Regex.IsMatch()?
DeanMc
I've added another example using Index and Length. Using these two properties you can know if there is a not-matched part of text since the last match. Those will be your "not matched" tokens.
Nestor
Aye, that is what I am currently using to pull the text that is missing. I thought that if I could negate the expression I could make the code leaner, but this seems to be the best way I can get. Thanks for your time.
DeanMc
DeanMc, then mark this question as answered. Thanks.
Nestor
... I will, grudgingly, as it's technically not resolved.
DeanMc
No, if you're not content with the answer(s) provided, there is no need to accept it.
Bart Kiers