Hi,

Suppose I had the string "1 AND 2 AND 3 OR 4", and want to create an array of strings that contains all substrings "AND" or "OR", in order, found within the string.

So the above string would return a string array of {"AND", "AND", "OR"}.

What would be a smart way of writing that?

EDIT: Using C# 2.0+,

string rule = "1 AND 2 AND 3 OR 4";
string pattern = "(AND|OR)";
string[] conditions = Regex.Split(rule, pattern);

gives me {"1", "AND", "2", "AND", "3", "OR", "4"}, which isn't quite what I'm after. How can I reduce that to the ANDs and ORs only?

+1  A: 

You're probably looking for a tokenizer or lexer; have a look at the following article:

C# Regular Expression Recipes—A Better Tokenizer

Student for Life
+1  A: 

This regex (.NET) seems to do what you want. You're looking for the matches (multiple) in the group at index=1:

.*?((AND)|(OR))*.*?

EDIT: I've tested the following and it seems to do what you want. It's more lines than I would like, but it approaches the task in a purely regex fashion (which IMHO is what you should be doing):

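        // Requires: using System.Collections; using System.Text.RegularExpressions;
        string text = "1 AND 2 AND 3 OR 4";
        string pattern = @"AND|OR";

        // IgnoreCase also matches "and" / "or"; drop it if only upper case should count.
        Regex r = new Regex(pattern, RegexOptions.IgnoreCase);

        // Walk the matches one at a time, collecting the matched text.
        Match m = r.Match(text);
        ArrayList results = new ArrayList();
        while (m.Success)
        {
            // Group 0 is the whole match.
            results.Add(m.Groups[0].Value);

            m = m.NextMatch();
        }

        string[] matchesStringArray = (string[])results.ToArray(typeof(string));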
cottsak
Why not just "(AND|OR)" ?
cdmckay
*shrugs* Maybe I've overcomplicated it.
cottsak
In C# 2.0+, using "AND|OR" as the pattern gives me more than just the ANDs and ORs. How can I limit the pattern to give me only the ANDs and ORs? I've edited the question above.
David Hodgson
It seems the only way to get the regex engine to move on to the next match (of "AND|OR") is to call the .NextMatch() method. That's a pity, because now you have to iterate, but it seems you were never going to escape using a loop of some kind. Hope this is OK.
cottsak
You can use Regex.Matches to get all the results in one call, but as you said, you'll have to iterate over the result collection, or use LINQ to get what you want!
Cédric Rup
It's cool you said that, because I was thinking of using LINQ to filter out the parts of the dirty collection too. I just think that in this case you should make the best of one technology (if you will) rather than using half of two. In this case, if the regex can do it, then I think it should. That being said, if you can combine regex, LINQ, and string functions to get the same result in fewer (cleaner) lines of code, then +10, do it that way. ;)
cottsak
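For reference, the Regex.Matches plus LINQ combination mentioned in these comments could look roughly like the sketch below. It assumes .NET 3.5 or later (LINQ is not available in plain C# 2.0), and the class and method names are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ConditionExtractor
{
    // Returns every "AND" or "OR" found in the rule, in order of appearance.
    public static string[] ExtractConditions(string rule)
    {
        return Regex.Matches(rule, "AND|OR")
                    .Cast<Match>()        // MatchCollection is non-generic, so cast each item
                    .Select(m => m.Value) // keep only the matched text
                    .ToArray();
    }
}
```

Calling ConditionExtractor.ExtractConditions("1 AND 2 AND 3 OR 4") should give {"AND", "AND", "OR"}. The loop is still there; it is just hidden inside the LINQ operators.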
+1  A: 

Since you know the exact substrings you're looking for, why not just use IndexOf(substr, iOffset) to count the occurrences (loop until it returns -1)?

Depending on the complexity of your task, it could be simpler/faster than using regular expressions (since you're not matching patterns).
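
A minimal sketch of that idea, assuming only the two literal operators "AND" and "OR" matter and that case-sensitive matching is wanted. The class and method names are hypothetical. Because the result must preserve order, each pass looks up both operators from the current offset and takes whichever comes first.

```csharp
using System;
using System.Collections.Generic;

public static class IndexOfScanner
{
    // Collects "AND" / "OR" in the order they occur, using only IndexOf.
    public static string[] FindOperators(string rule)
    {
        List<string> found = new List<string>();
        int offset = 0;
        while (offset < rule.Length)
        {
            int andAt = rule.IndexOf("AND", offset, StringComparison.Ordinal);
            int orAt = rule.IndexOf("OR", offset, StringComparison.Ordinal);
            if (andAt < 0 && orAt < 0)
            {
                break; // neither operator occurs again
            }

            // Pick whichever operator appears first from the current offset.
            bool takeAnd = orAt < 0 || (andAt >= 0 && andAt < orAt);
            string hit = takeAnd ? "AND" : "OR";
            int at = takeAnd ? andAt : orAt;

            found.Add(hit);
            offset = at + hit.Length; // continue scanning after this hit
        }
        return found.ToArray();
    }
}
```

Like the unanchored regex, this matches raw substrings, so an operand such as "ORANGE" would also produce an "OR".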

Gishu
A: 

Here's a goofy way that I came up with:

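// Requires: using System.Collections.Generic;
string rule = "1 AND 2 AND 3 OR 4";
List<string> andsOrs = new List<string>();
// Split() with no arguments splits on whitespace.
string[] split = rule.Split();
for (int i = 0; i < split.Length; i++)
{
    // Keep only the tokens that are operators.
    if (split[i] == "AND" || split[i] == "OR")
    {
        andsOrs.Add(split[i]);
    }
}
string[] conditions = andsOrs.ToArray();
return conditions;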
David Hodgson
+1  A: 

string rule = "1 AND 2 AND 3 OR 4";
string pattern = "(AND|OR)";
MatchCollection conditions = Regex.Matches(rule, pattern);

Use Match.Value on each Match in the collection to get the string.
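
To end up with the string[] the question asks for, the collection can be copied out with a plain loop. This is a sketch that should work on C# 2.0; the class and method names are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchCollectionExample
{
    // Copies the text of every match into a string array, in match order.
    public static string[] ToStrings(string rule, string pattern)
    {
        MatchCollection matches = Regex.Matches(rule, pattern);
        string[] values = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            values[i] = matches[i].Value;
        }
        return values;
    }
}
```

Unlike Regex.Split, Regex.Matches returns only the matched text, so the capturing parentheses in "(AND|OR)" are harmless here and the operands are never included.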

Nick Whaley