Hi,

I am trying to filter some text using regular expressions. For example, with a pattern like phone* I want to select "phone booth", "phone cube", etc.

But when I give booth*, it also selects "phone booth". It should not select it, right? Here is the code:

    string[] names = { "phone booth", "hall way", "parking lot", "front door", "hotel lobby" };

    // Comma-separated list of regex patterns
    string input = "booth.*, door.*";
    string[] patterns = input.Split(new char[] { ',' });
    List<string> filtered = new List<string>();

    foreach (string pattern in patterns)
    {
        Regex ex = null;
        try
        {
            ex = new Regex(pattern.Trim());
        }
        catch { }                   // invalid patterns are skipped
        if (ex == null) continue;

        foreach (string name in names)
        {
            // IsMatch succeeds if the pattern matches anywhere in the name
            if (ex.IsMatch(name) && !filtered.Contains(name)) filtered.Add(name);
        }
    }

    foreach (string filteredName in filtered)
    {
        MessageBox.Show(filteredName);
    }

It displays "phone booth" and "front door". But as per my criteria, it should not show anything, because no string starts with booth or door.

Is there a problem with my regex?

+5  A: 

If you want to match at the beginning of a string, start the pattern with ^.

So, for example, if you wanted a match to start with phone and then contain any characters after that, you could do the following:

^phone.*

The ^ anchors the match to the start of the string.
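
As a rough illustration of the difference, here is a minimal sketch using Regex.IsMatch. The class and method names (AnchorDemo, Matches) are made up for this example and are not from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Hypothetical helper: true if the pattern matches anywhere in the input,
    // which is how Regex.IsMatch behaves unless the pattern is anchored.
    public static bool Matches(string pattern, string input)
    {
        return Regex.IsMatch(input, pattern);
    }

    public static void Main()
    {
        // Unanchored: "booth" is found in the middle of the string.
        Console.WriteLine(Matches("booth.*", "phone booth"));   // True

        // Anchored: the string does not start with "booth".
        Console.WriteLine(Matches("^booth.*", "phone booth"));  // False

        // Anchored: the string does start with "phone".
        Console.WriteLine(Matches("^phone.*", "phone booth"));  // True
    }
}
```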

Mitchel Sellers
+3  A: 

The problem is that you are not specifying that the string must start with booth or door, only that it must contain booth or door followed by zero or more characters.

If, however, you change your regex patterns to ^booth.* and ^door.*, everything should work.

Caret ( ^ ), it should be noted, means "the beginning of the string", or "the beginning of each line" if your regular expression is in multiline mode. (Multiline mode is separate from single-line mode, which is the option that controls whether . matches newline characters.)
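
To make the mode distinction concrete, here is a small sketch, assuming .NET's RegexOptions.Multiline and RegexOptions.Singleline. The class name, helper name and sample text are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultilineDemo
{
    // Hypothetical helper: does "^booth" match somewhere in the text
    // under the given options?
    public static bool StartsWithBooth(string text, RegexOptions options)
    {
        return Regex.IsMatch(text, "^booth", options);
    }

    public static void Main()
    {
        string text = "front door\nbooth one";

        // Default: ^ matches only at the very start of the whole string.
        Console.WriteLine(StartsWithBooth(text, RegexOptions.None));       // False

        // Multiline: ^ also matches right after each newline.
        Console.WriteLine(StartsWithBooth(text, RegexOptions.Multiline));  // True

        // Whether . matches a newline is controlled by Singleline, not Multiline.
        Console.WriteLine(Regex.IsMatch("a\nb", "a.b"));                           // False
        Console.WriteLine(Regex.IsMatch("a\nb", "a.b", RegexOptions.Singleline));  // True
    }
}
```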

Sean Vieira
@Sean: Pedantic note: `^` is generally called a caret when talking about the ASCII character. Circumflex is used when speaking of the diacritic mark (i.e. when you stick a ˆ on top of a letter). Note that ˆ and ^ are different symbols, too.
Brian
@Brian -- pedantic note noted and applied :-D
Sean Vieira
+1  A: 

Yes, you should prefix your patterns with "^", like so:

string input = "^booth.*, ^door.*";

This tells the regex engine that you want only strings that start with "booth" or "door". More info here: http://oreilly.com/windows/archive/csharp-regular-expressions.html
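
For reference, here is a sketch of the question's filtering loop wrapped in a method, so the anchored and unanchored inputs can be compared side by side. The NameFilter class and Filter method are hypothetical names, and the results go to the console rather than MessageBox so the example runs without Windows Forms.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NameFilter
{
    // Hypothetical wrapper around the loop from the question: returns every
    // name matched by at least one of the comma-separated patterns.
    public static List<string> Filter(string[] names, string input)
    {
        var filtered = new List<string>();
        foreach (string raw in input.Split(','))
        {
            Regex ex;
            try { ex = new Regex(raw.Trim()); }
            catch (ArgumentException) { continue; }  // skip invalid patterns

            foreach (string name in names)
            {
                if (ex.IsMatch(name) && !filtered.Contains(name)) filtered.Add(name);
            }
        }
        return filtered;
    }

    public static void Main()
    {
        string[] names = { "phone booth", "hall way", "parking lot", "front door", "hotel lobby" };

        // Unanchored patterns match anywhere in the name.
        Console.WriteLine(string.Join(", ", Filter(names, "booth.*, door.*")));   // phone booth, front door

        // Anchored patterns: no name starts with "booth" or "door".
        Console.WriteLine(Filter(names, "^booth.*, ^door.*").Count);              // 0
    }
}
```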

Ioannis Karadimas
+1  A: 

You need to specify the start of the string in your regex if you don't want "phone booth" to match.

Example:

^booth.*

will match "booth" but not "phone booth".

booth.*

will match any string that has "booth" in it, including "phone booth".

Abe Miessler
Typo in your second example?
Steve Townsend
Doh! Thanks for the heads up.
Abe Miessler
A: 

Your regex does not constrain where in the input string the match may occur, so it can match anywhere. If you want to ensure that you only match initial substrings, you have to specify '^' as the first part of the pattern.

See http://msdn.microsoft.com/en-us/library/az24scfc.aspx for more details.
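
If the patterns come from user input and should always be treated as prefix matches, one possible approach is to add the anchor in code. The sketch below is only an illustration: AnchorAtStart is a made-up helper, and it wraps the pattern in a non-capturing group on the assumption that a pattern may contain alternation, since ^booth|door would anchor only the first alternative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PrefixPattern
{
    // Hypothetical helper: anchors an arbitrary pattern at the start of the
    // string. The (?: ... ) group keeps ^ applied to the whole pattern.
    public static Regex AnchorAtStart(string pattern)
    {
        return new Regex("^(?:" + pattern.Trim() + ")");
    }

    public static void Main()
    {
        Regex anchored = AnchorAtStart("booth|door");

        Console.WriteLine(anchored.IsMatch("front door"));  // False: does not start with either word
        Console.WriteLine(anchored.IsMatch("door mat"));    // True: starts with "door"

        // Without the group, only "booth" is anchored and "door" matches anywhere.
        Console.WriteLine(Regex.IsMatch("front door", "^booth|door"));  // True
    }
}
```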

Steve Townsend