What would be the easiest way to collect all links and e-mail addresses in a string into a list or array? I was using preg_match
in PHP, but in C# it looks like it will be quite different.
views: 39
answers: 2
+1
A:
Assuming that you already have a working regular expression, you can use the Regex
class, like this:
static readonly Regex linkFinder = new Regex(@"https?://[a-z0-9.]+/\S+|\S+@\S+\.\S+", RegexOptions.IgnoreCase);
foreach(Match match in linkFinder.Matches(someString)) {
//Do things...
string url = match.Value;
int position = match.Index;
}
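Since the question asks for the matches as a list, the loop can also be collapsed with LINQ. The following is only a minimal sketch: the class name `LinkExtractor` and the method name `FindAll` are hypothetical, and the pattern is a deliberately loose illustration (it uses `\S+` before the `@` so the local part of an address is included, and it only finds URLs that contain a path).

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
static class LinkExtractor
{
    // Loose demonstration pattern: a URL with a path, or something@something.something.
    static readonly Regex LinkFinder = new Regex(
        @"https?://[a-z0-9.]+/\S+|\S+@\S+\.\S+",
        RegexOptions.IgnoreCase);

    // Returns the text of every match, in the order it appears in the input.
    public static List<string> FindAll(string input)
    {
        return LinkFinder.Matches(input)
            .Cast<Match>()          // MatchCollection is non-generic on older frameworks
            .Select(m => m.Value)
            .ToList();
    }
}
```

Calling `LinkExtractor.FindAll(someString)` gives a `List<string>`; append `.ToArray()` instead of `.ToList()` if an array is preferred.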
SLaks
2010-06-09 14:04:43
You forgot the ":" after https?
serhio
2010-06-09 14:07:44
serhio
2010-06-09 14:14:29
@serhio: `\S+` should match all that. I'm primarily trying to demonstrate how to use the regex.
SLaks
2010-06-09 14:16:58
A:
This should work for links:
https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?
This should work for email addresses:
[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}
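Both patterns can be dropped into `Regex` unchanged, but note that the e-mail pattern only lists upper-case letters, so it needs `RegexOptions.IgnoreCase` to find typical lower-case addresses. A small sketch under that assumption; the class and field names (`PatternDemo`, `LinkRegex`, and so on) are hypothetical:

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the two patterns above, for illustration only.
static class PatternDemo
{
    public const string LinkPattern = @"https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?";
    public const string EmailPattern = @"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}";

    // IgnoreCase lets the upper-case character classes match lower-case input too.
    public static readonly Regex LinkRegex = new Regex(LinkPattern, RegexOptions.IgnoreCase);
    public static readonly Regex EmailRegex = new Regex(EmailPattern, RegexOptions.IgnoreCase);

    // Same e-mail pattern without IgnoreCase, kept only to show the difference.
    public static readonly Regex EmailCaseSensitive = new Regex(EmailPattern);
}
```

With these, `PatternDemo.EmailRegex.Match("Mail Bob@Example.COM").Value` yields the address, while `PatternDemo.EmailCaseSensitive.IsMatch("bob@example.com")` is false. The `{2,4}` limit on the top-level domain still applies either way.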
npinti
2010-06-09 14:08:58
-1: There are top-level domains that the "email regex" will fail to match (e.g. the .museum TLD). And the domain should be lower case, so in fact it won't match any. Regex is the WRONG TOOL to find email addresses.
Richard
2010-06-09 14:11:55
@Richard: Regexes are not the "wrong tool" to find emails. They are **exactly the right tool**. They are the **wrong** tool to **parse** and **validate**, but finding strings is THE purpose of a regex.
John Gietzen
2010-06-09 14:16:09
@John: for any short regex there will be valid email addresses it fails to find. (E.g. with the one in the Q, many O'Reillys will be disappointed.)
Richard
2010-06-10 10:53:02