I'm mirroring some internal websites for backup purposes. Right now I use this C# code:
System.Net.WebClient client = new System.Net.WebClient();
byte[] dl = client.DownloadData(url);
This downloads the HTML into a byte array, which is what I want. The problem, however, is that most of the links within the HTML are relative, not absolute.
I want to prepend whatever the full http://domain is to each relative link, converting it to an absolute link that points to the original content. I'm only concerned with href= and src=. Is there a regular expression that will cover some of the basic cases?
Edit [My Attempt]:
public static string RelativeToAbsoluteURLS(string text, string absoluteUrl)
{
    if (String.IsNullOrEmpty(text))
    {
        return text;
    }
    String value = Regex.Replace(text, "<(.*?)(src|href)=\"(?!http)(.*?)\"(.*?)>", "<$1$2=\"" + absoluteUrl + "$3\"$4>", RegexOptions.IgnoreCase | RegexOptions.Multiline);
    return value.Replace(absoluteUrl + "/", absoluteUrl);
}
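For comparison, here is a rough sketch of a variant that lets System.Uri resolve each link against the page URL instead of concatenating strings, which should also cope with links like ../x or img/b.png. The names LinkRewriter and ResolveLinks and the intranet.example URLs are made up for illustration, and it only looks at double-quoted src/href values, so it is not a general HTML parser.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinkRewriter
{
    // Matches src="..." or href="..." (double-quoted values only).
    private static readonly Regex AttributePattern = new Regex(
        "(?<attr>\\b(?:src|href))\\s*=\\s*\"(?<url>[^\"]*)\"",
        RegexOptions.IgnoreCase);

    // Matches values that already start with a scheme, e.g. http:, https:, mailto:.
    private static readonly Regex SchemePattern = new Regex(
        "^[a-z][a-z0-9+.-]*:",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: pageUrl is the absolute URL the HTML was downloaded from.
    public static string ResolveLinks(string html, string pageUrl)
    {
        if (string.IsNullOrEmpty(html))
        {
            return html;
        }

        Uri baseUri = new Uri(pageUrl, UriKind.Absolute);

        return AttributePattern.Replace(html, match =>
        {
            string url = match.Groups["url"].Value;

            // Leave empty values, in-page anchors, and already-absolute URLs untouched.
            if (url.Length == 0 || url.StartsWith("#") || SchemePattern.IsMatch(url))
            {
                return match.Value;
            }

            // Uri.TryCreate(Uri, string, out Uri) combines the base and the relative reference.
            Uri resolved;
            if (!Uri.TryCreate(baseUri, url, out resolved))
            {
                return match.Value;
            }

            return match.Groups["attr"].Value + "=\"" + resolved.AbsoluteUri + "\"";
        });
    }
}
```

Usage would look something like LinkRewriter.ResolveLinks(html, "http://intranet.example/site/index.html"), where html is the downloaded page decoded to a string. Passing the full page URL rather than just the domain is a deliberate assumption here, since links without a leading slash are relative to the page's directory, not the site root.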