Try the following:
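using System;
using System.Text.RegularExpressions;

string[] inputs = { "href=\"http://yahoo.com/media/news.html\"", ">http://yahoo.com/media/news.html<" };
// The URL is captured in the named group "Url"; the leading href=" or > is matched but not captured.
string pattern = @"(?:href=""|>)(?<Url>http://.+?)[<""]";
foreach (string input in inputs)
{
    Match m = Regex.Match(input, pattern);
    if (m.Success)
    {
        Console.WriteLine(m.Groups["Url"].Value);
    }
}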
EDIT: another approach is to use look-arounds, so that the surrounding delimiters are checked but not included in the match. This allows you to use Match.Value directly instead of using groups. Try this alternate approach below.
string pattern = @"(?<=href=""|>)http://.+?(?=<|"")";
foreach (string input in inputs)
{
    Match m = Regex.Match(input, pattern);
    if (m.Success)
    {
        Console.WriteLine(m.Value);
    }
}
EDIT #2: per the request in the comments, here is a pattern that will not match URLs that contain "..." in the text.
string pattern = @"(?<=href=""|>)http://(?!.*\.{3}).+?(?=<|"")";
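The only change is the addition of (?!.*\.{3}), which is a negative look-ahead: it allows the pattern to match only if the specified text is absent from what follows. In this case it checks that "..." does not appear anywhere in the rest of the input after http://. A run of four or more dots also contains three consecutive dots, so longer runs are already rejected; writing {3,} instead of {3} behaves the same here.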
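To see the EDIT #2 pattern in action, here is a minimal sketch. The ExtractUrl helper and the sample inputs are made up for illustration and are not part of the original question. Note the third input: because the look-ahead scans the rest of the input, a "..." that appears after the URL has ended also suppresses the match, which may or may not be what you want.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the matched URL, or an empty string when nothing matches.
// It uses the same pattern as in EDIT #2.
string ExtractUrl(string input)
{
    string pattern = @"(?<=href=""|>)http://(?!.*\.{3}).+?(?=<|"")";
    Match m = Regex.Match(input, pattern);
    return m.Success ? m.Value : string.Empty;
}

// Illustrative inputs (assumptions, not taken from the question).
string[] samples =
{
    "href=\"http://yahoo.com/media/news.html\"", // no "...": matches
    ">http://yahoo.com/media/news...<",          // "..." inside the URL: rejected
    ">http://yahoo.com/a<b>...</b>",             // "..." after the URL: also rejected
};

foreach (string sample in samples)
{
    string url = ExtractUrl(sample);
    Console.WriteLine(url.Length > 0 ? url : "(no match)");
}
// Prints:
// http://yahoo.com/media/news.html
// (no match)
// (no match)
```

If only dots inside the URL itself should count, one option is to replace .* in the look-ahead with a character class that stops at the closing delimiter, for example (?![^<""]*\.{3}).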