What is the easiest way to find all links and e-mail addresses in a string and collect them into a list or array? I was using preg_match in PHP, but in C# it looks like it will be quite different.

A:

Assuming that you already have a working regular expression, you can use the Regex class, like this:

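using System.Text.RegularExpressions;

// Declared once as a class member and reused; IgnoreCase lets [a-z0-9.] match upper-case hosts too.
static readonly Regex linkFinder = new Regex(@"https?://[a-z0-9.]+/\S+|\S+@\S+\.\S+", RegexOptions.IgnoreCase);

foreach (Match match in linkFinder.Matches(someString)) {
    // Do things...
    string url = match.Value;     // the matched link or e-mail address
    int position = match.Index;   // zero-based offset of the match in someString
}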
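Since the question asks for the matches as a list, here is a minimal sketch that collects them into a `List<string>` instead of looping. It assumes LINQ's `Cast<Match>()` to turn the `MatchCollection` into a typed sequence; the class and method names (`LinkExtractor`, `FindAll`) are hypothetical, and the e-mail branch uses `\S+@` (non-whitespace before the `@`), because `\s+@` would match whitespace rather than the local part of the address.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class LinkExtractor
{
    // Same idea as the answer's linkFinder: an http(s) link with a path,
    // or something shaped like user@domain.tld. Compiled once and reused.
    static readonly Regex LinkFinder = new Regex(
        @"https?://[a-z0-9.]+/\S+|\S+@\S+\.\S+", RegexOptions.IgnoreCase);

    // Returns the text of every match, in the order they appear in the input.
    public static List<string> FindAll(string text) =>
        LinkFinder.Matches(text)
                  .Cast<Match>()          // MatchCollection -> IEnumerable<Match>
                  .Select(m => m.Value)   // keep just the matched substring
                  .ToList();
}
```

Calling `.ToArray()` instead of `.ToList()` gives a `string[]` if an array is preferred.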
SLaks
Forgot the ":" after https?
serhio
@serhio: `\S+` should match all that. I'm primarily trying to demonstrate how to use the regex.
SLaks
A: 

This should work for links:

https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?

Source
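A sketch of using this link pattern from C#, assuming the same `Regex.Matches` approach as the other answer. Because the capture groups are numbered, the host and the optional port can be read from `Groups[1]` and `Groups[2]`; the `UrlFinder` class and its tuple shape are hypothetical names chosen for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class UrlFinder
{
    // The link pattern from the answer above, copied unchanged.
    // Group 1 = host, group 2 = optional ":port", groups 3-5 = optional path/query.
    static readonly Regex Url = new Regex(
        @"https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?");

    // Returns each matched URL with its host and port (empty string if no port).
    public static List<(string Url, string Host, string Port)> Find(string text) =>
        Url.Matches(text)
           .Cast<Match>()
           .Select(m => (Url: m.Value,
                         Host: m.Groups[1].Value,
                         Port: m.Groups[2].Value.TrimStart(':')))
           .ToList();
}
```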

This should work for email addresses:

[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}

Source
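The character classes in this e-mail pattern only list upper-case letters, so in C# it needs `RegexOptions.IgnoreCase` to match ordinary lower-case addresses. A minimal sketch comparing both settings (the `EmailFinder` class and its `ignoreCase` parameter are hypothetical names):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class EmailFinder
{
    // The e-mail pattern from the answer above. Its classes only list A-Z,
    // so without IgnoreCase it cannot match lower-case letters at all.
    const string Pattern = @"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}";

    static readonly Regex CaseSensitive = new Regex(Pattern);
    static readonly Regex CaseInsensitive = new Regex(Pattern, RegexOptions.IgnoreCase);

    // Returns every matched substring, using the chosen case sensitivity.
    public static List<string> Find(string text, bool ignoreCase) =>
        (ignoreCase ? CaseInsensitive : CaseSensitive)
            .Matches(text)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
}
```

Because the pattern is not anchored, an address on a longer TLD such as `curator@art.museum` is not skipped but truncated to `curator@art.muse`, since `[A-Z]{2,4}` stops after four letters.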

npinti
-1: There are top-level domains that this "email regex" will fail to match (e.g. the .museum TLD). And domains are usually written in lower case, while the pattern only lists A-Z, so without a case-insensitive option it won't match any. Regex is the WRONG TOOL to find email addresses.
Richard
@Richard: Regexes are not the "wrong tool" to find emails. They are **exactly the right tool**. They are the **wrong** tool to **parse** and **validate**, but finding strings is THE purpose of a regex.
John Gietzen
@John: for any short regex there will be valid email addresses it fails to find. (E.g. with the one in this answer, many O'Reillys will be disappointed, since ' is not in its character class.)
Richard