
I'm looking for a regular expression library in .Net that supports lazy evaluation.

Note: I'm specifically looking for lazy evaluation (i.e., the library, instead of immediately returning all matches in a document, only consumes as much of the document as necessary to determine the next match per request), NOT support for lazy quantifiers - though if it also supports lazy quantifiers, I wouldn't object!

Specific details: I want to be able to run regexes against very large documents with potentially hundreds of thousands of regex matches, and iterate across the results using IEnumerable<> semantics, without having to pay the up-front cost of finding all matches.

Ideally FOSS in C#, but the only requirement is usability from a .Net 3.5 app.

A:

Are you sure the built-in Regex class doesn't do this? For example, the Match.NextMatch() method would suggest that it's continuing from where it got to...

I believe that if you call Regex.Match it will stop at the first match it comes to, and then continue from there when you call NextMatch.
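A minimal sketch of this behavior, assuming a made-up pattern and input. The class and method names (NextMatchDemo, FirstTwo) are hypothetical. It asks for only the first two matches: Regex.Match returns as soon as it finds the first one, and NextMatch resumes from where that match ended.

```csharp
using System;
using System.Text.RegularExpressions;

public class NextMatchDemo
{
    // Hypothetical helper: returns the values of the first two matches of
    // 'pattern' in 'input'. Regex.Match stops at the first match and
    // NextMatch resumes from where that match ended, so no search is ever
    // made for a third match.
    public static string[] FirstTwo(string pattern, string input)
    {
        Match first = Regex.Match(input, pattern);
        // If 'first' failed, NextMatch just returns another failed Match
        // whose Value is the empty string.
        Match second = first.NextMatch();
        return new string[] { first.Value, second.Value };
    }

    public static void Main()
    {
        string[] two = FirstTwo(@"\d+", "a1b22c333");
        Console.WriteLine(two[0] + " " + two[1]); // prints "1 22"
    }
}
```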

Jon Skeet
A:

The Match class's NextMatch method should meet your needs:

Returns a new Match with the results for the next match, starting at the position at which the last match ended (at the character after the last matched character).

A quick look at it in Reflector confirms this behavior:

public Match NextMatch()
{
    if (this._regex == null)
    {
        return this;
    }
    return this._regex.Run(false, base._length, base._text, this._textbeg,
        this._textend - this._textbeg, this._textpos);
}

Check out the linked MSDN reference for an example of its usage. Briefly, the flow would resemble:

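// rx is a Regex; input is the string being searched
Match m = rx.Match(input);      // finds only the first match
while (m.Success)
{
    // do work with m.Value, m.Index, m.Groups, ...
    m = m.NextMatch();          // resumes scanning after the previous match
}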
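The question asks for IEnumerable<> semantics specifically. One way to get them is to wrap the loop above in a C# iterator method, which .NET 3.5 supports. Below is a minimal sketch of that approach. LazyRegex and LazyMatches are hypothetical names, not framework APIs. Because the iterator uses yield return, each NextMatch call runs only when the consumer asks for the next element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LazyRegex
{
    // Hypothetical extension method (not part of the .NET framework) that
    // exposes the Match/NextMatch chain as IEnumerable<Match>.
    public static IEnumerable<Match> LazyMatches(this Regex rx, string input)
    {
        // Validate eagerly; an iterator body only starts on the first MoveNext().
        if (rx == null) throw new ArgumentNullException("rx");
        if (input == null) throw new ArgumentNullException("input");
        return LazyMatchesIterator(rx, input);
    }

    private static IEnumerable<Match> LazyMatchesIterator(Regex rx, string input)
    {
        Match m = rx.Match(input);
        while (m.Success)
        {
            yield return m;
            // Runs only when the caller requests the next element.
            m = m.NextMatch();
        }
    }
}

public class Program
{
    public static void Main()
    {
        Regex rx = new Regex(@"\w+");
        // Take(3) stops pulling after the third match, so the fourth and
        // later matches are never searched for.
        foreach (Match m in rx.LazyMatches("alpha beta gamma delta epsilon").Take(3))
        {
            Console.WriteLine(m.Value); // alpha, then beta, then gamma
        }
    }
}
```

The argument checks are split out of the iterator so that null arguments fail at the call site rather than on the first MoveNext(). The laziness applies to the matching work only. Regex operates on a string, so the whole document still has to be loaded into memory.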
Ahmad Mageed
Thanks! All the code samples I've ever seen for the framework classes used Regex.Matches, not Regex.Match, so I didn't even realize this existed. I feel dumb, but at least it's an easy solution. (c:
Dathan