tags:

views:

136

answers:

4

What's the regular expression to check if a string starts with "mailto" or "ftp" or "joe" or...

Now I am using C# and code like this in a big if with many ors:

String.StartsWith("mailto:")
String.StartsWith("ftp")

It looks like a regex would be better for this. Or is there a C# way I am missing here?

+6  A: 

You could use:

^(mailto|ftp|joe)

But to be honest, StartsWith is perfectly fine to here. You could rewrite it as follows:

string[] prefixes = { "http", "mailto", "joe" };
string s = "joe:bloggs";
bool result = prefixes.Any(prefix => s.StartsWith(prefix));

You could also look at the System.Uri class if you are parsing URIs.

Mark Byers
+2  A: 

The following will match on any string that starts with mailto, ftp or http:

 RegEx reg = new RegEx("^(mailto|ftp|http)");

To break it down:

  • ^ matches start of line
  • (mailto|ftp|http) matches any of the items separated by a |

I would find StartsWith to be more readable in this case.

Oded
I think you missed a `)` in your regular expression.
Mark Byers
@Mark Byers - thanks.
Oded
+1 for the update :)
Mark Byers
A: 

The StartsWith method will be faster, as there is no overhead of interpreting a regular expression, but here is how you do it:

if (Regex.IsMatch(theString, "^(mailto|ftp|joe):")) ...

The ^ mathes the start of the string. You can put any protocols between the parentheses separated by | characters.

edit:

Another approach that is much faster, is to get the start of the string and use in a switch. The switch sets up a hash table with the strings, so it's faster than comparing all the strings:

int index = theString.IndexOf(':');
if (index != -1) {
  switch (theString.Substring(0, index)) {
    case "mailto":
    case "ftp":
    case "joe":
      // do something
      break;
  }
}
Guffa
-1, The more strings you add, the more likely the regex is going to be faster I think, since the regex engine will be able to find a match using a single pass of the source string (or less than a pass, if there is no match). Anyways, I don't think speed is going to be an issue with either method.
tster
@tster: You are forgetting the short circuit of a condition. If there is a match the `StartsWith` solution by average only have to check half of the strings before it finds the match, or less if they are arranged according to rate of occurance.
Guffa
That's true, but the speed argument is still baloney. You could always Compile the regular expression the first time you use it, and then that overhead goes away.
tster
A: 

I really recommend using the String.StartsWith method over the Regex.IsMatch if you only plan to check the beginning of a string.

  • Firstly, the regular expression in C# is a language in a language with does not help understanding and code maintenance. Regular expression is a kind of DSL.
  • Secondly, many developers does not understand regular expressions: it is something which is not understandable for many humans.
  • Thirdly, the StartsWith method brings you features to enable culture dependant comparison which regular expressions are not aware of.

In your case you should use regular expressions only if you plan implementing more complex string comparison in the future.

Ucodia