I'm not aware of anything built in, and your question is a little ambiguous about what exactly you're looking for. Do you want the entire anchor tag, or just the URL from the href attribute?
If you have well-formed XHTML, you might be able to get away with using an XmlReader and an XPath query to find all the anchor tags (<a>) and then hit the href attribute for the address. Since that's unlikely, you're probably better off using RegEx to pull down what you want.
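For the well-formed case, the XPath route might look roughly like the sketch below. It swaps in XmlDocument rather than a raw XmlReader because XmlDocument exposes SelectNodes directly. The class and method names (XhtmlLinks, FindHrefs) are made up for illustration, and it assumes the markup has no DOCTYPE and no undeclared entities such as &nbsp;.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class XhtmlLinks
{
    // Hypothetical helper: returns the raw href value of every <a> element.
    public static List<string> FindHrefs(string xhtml)
    {
        var doc = new XmlDocument();
        // Throws XmlException if the markup is not well-formed XML.
        doc.LoadXml(xhtml);

        var hrefs = new List<string>();
        // local-name() matches <a> whether or not the XHTML namespace is declared.
        foreach (XmlNode node in doc.SelectNodes("//*[local-name()='a'][@href]"))
        {
            hrefs.Add(node.Attributes["href"].Value);
        }
        return hrefs;
    }
}
```

Anything that isn't strictly well-formed makes LoadXml throw, which is why the RegEx approach is usually the more practical one for real-world HTML.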
Using RegEx, you could do something like:
    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    List<Uri> findUris(string message)
    {
        // The href value is captured in the named group "href".
        string anchorPattern = "<a[\\s]+[^>]*?href[\\s]?=[\\s\\\"\']+(?<href>.*?)[\\\"\\']+.*?>(?<fileName>[^<]+|.*?)?<\\/a>";
        MatchCollection matches = Regex.Matches(message, anchorPattern,
            RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Compiled);

        List<Uri> uris = new List<Uri>();
        foreach (Match m in matches)
        {
            // Read the same group name the pattern defines ("href", not "url").
            string url = m.Groups["href"].Value;
            Uri testUri = null;
            if (Uri.TryCreate(url, UriKind.RelativeOrAbsolute, out testUri))
            {
                uris.Add(testUri);
            }
        }
        // Empty (rather than null) when the message contains no anchors.
        return uris;
    }
Note that I run the href through Uri.TryCreate to make sure the address actually makes sense as a valid Uri. You can drop that check if you aren't going to follow the links anywhere.
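If you do plan to follow the links, a stricter check may be worth considering, since UriKind.RelativeOrAbsolute accepts a very wide range of strings. The sketch below is one possible approach, not part of the code above: it only accepts absolute http or https addresses. The names LinkCheck and IsWebLink are hypothetical.

```csharp
using System;

static class LinkCheck
{
    // Hypothetical helper: true only for absolute http:// or https:// addresses.
    public static bool IsWebLink(string href)
    {
        Uri uri;
        return Uri.TryCreate(href, UriKind.Absolute, out uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }
}
```

Relative links such as page.html are rejected by this check, so if you need them, resolve them against the page's base address first (for example with the Uri(Uri, string) constructor) and then test the result.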