I need a C# string search algorithm that can match multiple occurrences of a pattern. For example, if the pattern is 'AA' and the string is 'BAAABBB', Regex produces a single match at Index = 1, but I need the result Index = 1,2. Can I force Regex to give such a result?

A: 

Any regular expression can return all of its matches as a MatchCollection (see Regex.Matches).

Dror
It would be nice if you could paste some demo code for this.
BeowulfOF
This is why I added the link to MSDN...
Dror
+12  A: 

Use a lookahead pattern:-

"A(?=A)"

This finds any A that is followed by another A without consuming the following A. Hence AAA will match this pattern twice.
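As a rough illustration of the difference, the sketch below collects the start index of every match for a given pattern. The class and method names (LookaheadDemo, MatchIndexes) are made up for this example; only Regex.Matches and Match.Index come from the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
static class LookaheadDemo
{
    // Returns the start index of every match of `pattern` in `input`.
    public static List<int> MatchIndexes(string input, string pattern)
    {
        var indexes = new List<int>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            indexes.Add(m.Index);
        }
        return indexes;
    }
}
```

With the string from the question, LookaheadDemo.MatchIndexes("BAAABBB", "A(?=A)") should yield 1 and 2, whereas the plain pattern "AA" yields only 1, because the first match consumes both characters.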

AnthonyWJones
+3  A: 

To summarize all previous comments:

Dim rx As Regex = New Regex("(?=AA)")
Dim mc As MatchCollection = rx.Matches("BAAABBB")

This will produce the result you are requesting.

EDIT:
Here is the C# version (I am working with VB.NET today, so I accidentally continued in VB.NET).

Regex rx = new Regex("(?=AA)");
MatchCollection mc = rx.Matches("BAAABBB");
Sani Huttunen
A: 

Try this:

System.Text.RegularExpressions.MatchCollection matchCol;
System.Text.RegularExpressions.Regex regX = new System.Text.RegularExpressions.Regex("(?=AA)");

string index = "", str = "BAAABBB";
matchCol = regX.Matches(str);
foreach (System.Text.RegularExpressions.Match mat in matchCol)
{
    index = index + mat.Index + ",";
}

After the loop, index contains "1,2,"; remove the trailing comma to get the result you are looking for.

Lonzo
A: 

The pattern '(?=A)' gives good results but takes an enormous amount of time to compute. I have a string with 20M characters, and calculation speed is very important. Does anyone have another solution? Thanks.

A: 

Are you really looking for substrings that are only two characters long? If so, searching a 20-million character string is going to be slow no matter what regex you use (or any non-regex technique, for that matter). If the search string is longer, the regex engine can employ a search algorithm like Boyer-Moore or Knuth-Morris-Pratt to speed up the search--the longer the better, in fact.

By the way, the kind of search you're talking about is called overlapping matches; I'll add that to the tags.
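For comparison, a non-regex scan for overlapping matches can be sketched with String.IndexOf, restarting the search one character past each hit. This is only a minimal sketch: the names OverlapSearch and FindAll are invented for the example, ordinal (culture-insensitive) comparison is an assumption, and whether it actually beats the lookahead regex on a 20M-character string would have to be measured.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only.
static class OverlapSearch
{
    // Returns the start index of every occurrence of `pattern` in `text`,
    // including occurrences that overlap each other.
    public static List<int> FindAll(string text, string pattern)
    {
        var result = new List<int>();
        if (string.IsNullOrEmpty(pattern))
        {
            return result; // an empty pattern has no meaningful matches here
        }

        int pos = text.IndexOf(pattern, 0, StringComparison.Ordinal);
        while (pos >= 0)
        {
            result.Add(pos);
            // Advance by one character (not by pattern.Length) so that
            // overlapping occurrences are found as well.
            pos = text.IndexOf(pattern, pos + 1, StringComparison.Ordinal);
        }
        return result;
    }
}
```

For the example in the question, OverlapSearch.FindAll("BAAABBB", "AA") should return the indexes 1 and 2.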

Alan Moore