I need a C# string search algorithm that can match multiple occurrences of a pattern. For example, if the pattern is 'AA' and the string is 'BAAABBB', Regex produces a single match at Index = 1, but I need Index = 1, 2. Can I force Regex to give such a result?
See Regex.Matches(): http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.matchcollection.aspx
Use a lookahead pattern:
"A(?=A)"
This finds any A that is followed by another A without consuming the following A. Hence AAA will match this pattern twice.
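As a minimal sketch of that behaviour in C#, the helper below collects the index of every match of the pattern above. The class name LookaheadDemo and the method FindIndexes are made up for illustration and are not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Hypothetical helper: returns the start index of every 'A'
    // that is immediately followed by another 'A'.
    public static List<int> FindIndexes(string input)
    {
        var indexes = new List<int>();
        // The lookahead (?=A) checks the next character without consuming it,
        // so the following 'A' is still available as the start of the next match.
        foreach (Match m in Regex.Matches(input, "A(?=A)"))
        {
            indexes.Add(m.Index);
        }
        return indexes;
    }
}
```

Calling LookaheadDemo.FindIndexes("BAAABBB") returns the indexes 1 and 2.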
To summarize all previous comments:
Dim rx As Regex = New Regex("(?=AA)")
Dim mc As MatchCollection = rx.Matches("BAAABBB")
This produces the result you are requesting: two zero-length matches, at Index 1 and Index 2.
EDIT:
Here is the C# version (I was working with VB.NET today, so I accidentally answered in VB.NET).
Regex rx = new Regex("(?=AA)");
MatchCollection mc = rx.Matches("BAAABBB");
Try this:
using System.Text.RegularExpressions;

string str = "BAAABBB";
string index = "";
Regex regX = new Regex("(?=AA)");
MatchCollection matchCol = regX.Matches(str);
foreach (Match mat in matchCol)
{
    // Match.Index is the zero-based position where the lookahead succeeded.
    index = index + mat.Index + ",";
}
// Drop the trailing comma left by the loop.
index = index.TrimEnd(',');
After the loop, index holds the comma-separated positions you are looking for ("1,2" for this input).
Are you really looking for substrings that are only two characters long? If so, searching a 20-million-character string is going to be slow no matter what regex you use (or any non-regex technique, for that matter). If the search string is longer, the regex engine can employ a search algorithm like Boyer-Moore or Knuth-Morris-Pratt to speed up the search; in fact, the longer the search string, the better.
By the way, the kind of search you're talking about is called overlapping matches; I'll add that to the tags.
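For comparison with the regex answers, the non-regex route mentioned above can be sketched with String.IndexOf, restarting the search one character after each hit so that overlapping occurrences are not skipped. The names OverlapSearch and FindAll are hypothetical, and ordinal (culture-insensitive) comparison is an assumption about what the question needs.

```csharp
using System;
using System.Collections.Generic;

public static class OverlapSearch
{
    // Hypothetical helper: returns the start index of every occurrence
    // of pattern in text, including occurrences that overlap.
    public static List<int> FindAll(string text, string pattern)
    {
        var indexes = new List<int>();
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(pattern))
        {
            return indexes; // nothing to search for
        }

        int i = text.IndexOf(pattern, StringComparison.Ordinal);
        while (i >= 0)
        {
            indexes.Add(i);
            // Resume at i + 1 rather than i + pattern.Length,
            // otherwise overlapping matches would be skipped.
            i = text.IndexOf(pattern, i + 1, StringComparison.Ordinal);
        }
        return indexes;
    }
}
```

Calling OverlapSearch.FindAll("BAAABBB", "AA") returns the indexes 1 and 2. This sketch makes no claim about being faster than the regex version; it only avoids the regex engine.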