Matching arbitrary URLs with regular expressions is extremely difficult, if not impossible, unless you have some extra constraints on what the URLs in your documents can contain. In that case you can trade flexibility in your regex for practicality.
Since I already have this handy, this should grab the URL itself:
(?<=src=")[^"]+(?=")
Verified in Regex Hero. This regular expression uses a positive lookbehind and a positive lookahead to grab the URL inside src="".
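To see the lookaround pattern in action, here is a minimal sketch that lists every src URL in a string. The sample HTML and the variable names are made up for illustration, and it assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input, not taken from a real page.
string html = "<img src=\"http://www.example.com/a/b/picture.jpg\" />\r\n" +
              "<img src=\"http://www.example.com/c/picture.gif\" />";

// The lookbehind and lookahead assert the surrounding src="..." without
// consuming it, so each match value is the bare URL.
var urlRegex = new Regex("(?<=src=\")[^\"]+(?=\")");

foreach (Match m in urlRegex.Matches(html))
{
    Console.WriteLine(m.Value);
}
```

Because the quotes are only asserted and never consumed, the match can be replaced directly without having to re-emit src="" in the replacement string.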
I'll see if I can come up with something more specific to your task...
OK, this should work:
(?<=src=")[^"]+(/[^/]+(\.jpg|\.gif))(?=")
And then you can use a replacement value of:
/LocalDirectory/images$1
Or here's the complete C# code:
string strRegex = "(?<=src=\")[^\"]+(/[^/]+(\\.jpg|\\.gif))(?=\")";
RegexOptions myRegexOptions = RegexOptions.None;
Regex myRegex = new Regex(strRegex, myRegexOptions);
string strTargetString = "<img src=\"http://www.example.com/any/number/of/directories/picture.jpg\" />" + "\r\n" + "<img src=\"http://www.example.com/any/number/of/directories/picture.gif\" />";
string strReplace = "/LocalDirectory/images$1";
return myRegex.Replace(strTargetString, strReplace);
string strTargetString = "img tags to check";
// [^\"] instead of . keeps each match inside a single src attribute
string strRegex = "src=\"([^\"]*)/([^\"/]*)\\.(jpg|png|gif)\"";
RegexOptions myRegexOptions = RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace;
Regex myRegex = new Regex(strRegex, myRegexOptions);
string strReplace = "src=\"/LocalDirectory/images/$2.$3\"";
return myRegex.Replace(strTargetString, strReplace);
I misread the question. This now replaces the leading part of the path for jpg, png and gif files and keeps the filename. Anything else is ignored.
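As a quick check of that behavior, here is a small sketch with one image tag and one non-image tag. The sample tags and variable names are hypothetical, and the pattern is a variant that uses [^"] rather than . so a match cannot run across several tags. It assumes C# 9 or later.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input: one image that should be rewritten, one file that should not.
string tags =
    "<img src=\"http://www.example.com/a/b/picture.png\" />\n" +
    "<img src=\"http://www.example.com/a/b/report.doc\" />";

// Group 2 is the file name, group 3 the extension.
// [^\"] keeps each match inside one attribute value.
var pathRegex = new Regex("src=\"([^\"]*)/([^\"/]*)\\.(jpg|png|gif)\"");

// $2 and $3 re-insert the captured file name and extension after the new directory.
string rewritten = pathRegex.Replace(tags, "src=\"/LocalDirectory/images/$2.$3\"");
Console.WriteLine(rewritten);
```

The .png tag comes out as src="/LocalDirectory/images/picture.png", while the .doc tag is left untouched because the extension alternation never matches.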
Hope this helps:
var replace = "/localserver/some/directory/";
var strs = new List<string>
{
"<img src=\"http://www.example.com/any/number/of/directories/picture.jpg\"",
"<img src=\"http://www.example.com/any/number/of/directories/picture.gif\""
};
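// The lookbehind anchors the match right after <img src=" and the match runs up to the last slash of the URL
Regex r = new Regex("(?<=<img src=\")[^\"]*/");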
foreach (var s in strs)
{
Console.WriteLine("Replaced: {0}",r.Replace(s,replace));
}
Outputs:
Replaced: <img src="/localserver/some/directory/picture.jpg"
Replaced: <img src="/localserver/some/directory/picture.gif"
Try this out...
var test1 = "<img src=\"http://www.something.com/any/number/of/pic.jpg\">";
var test2 = "<img src=\"http://www.something.com/any/number/of/pic.doc\">";
var test3 = "<a href=\"http://www.something.com/any/number/of/pic.jpg\">";
var test4 = "<a href=\"http://www.something.com/any/number/of/pic.doc\">";
var reg = "<(?:a|img)\\s+(?:src|href)=\"(?<replace>http://www.+?/).+?(?:\\.jpg|\\.jpeg|\\.gif|\\.png)\".*?>";
var file = test1 + "\r\n" + test2 + "\r\n" + test3 + "\r\n" + test4;
var results = Regex.Matches(file, reg);
for (int i = results.Count - 1; i >= 0; i--)
{
var match = results[i];
var group = match.Groups["replace"];
file = file.Remove(group.Index, group.Length);
file = file.Insert(group.Index, "/LocalDirectory/");
}
Console.WriteLine(file);
Console.ReadKey();
So the regex string I am using here is:
<(?:a|img)\s+(?:src|href)=\"(?<replace>http://www.+?/).+?(?:\.jpg|\.jpeg|\.gif|\.png)\".*?>
This matches only anchor and img tags, and only jpg, jpeg, gif, and png files.
Part by part, here is how this works:
< - matches the opening angle bracket of the tag
(?:a|img) - specifies that only an anchor or img tag should be looked at
\s+ - require 1 or more spaces
(?:src|href) - match only a src or href
=\" - immediately followed by an equal sign and quotation mark
(?<replace>http://www.+?/) - this named group captures what we need to replace: it must start with "http://www", and it captures everything up to the next slash (/)
.+? - lazily matches the rest of the path until the file extension is found
(?:\.jpg|\.jpeg|\.gif|\.png) - must have one of these extensions
\".*?> - the pattern ends at the closing bracket and allows for any other attributes in between
Then all I am doing is going through each match, grabbing the group named "replace", and removing/inserting in the file at that group's index.
Make sure you do this in reverse order so that earlier replacements do not throw off the indexes of the groups you have not processed yet.
I "think" that should do it. Please let me know if there are any oversights.
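If you would rather not manage indexes at all, one alternative (a sketch, not the approach used above) is to pass a MatchEvaluator to Regex.Replace and swap the named group inside each match. The sample input and variable names here are assumptions for illustration, and the code assumes C# 9 or later.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input: the first tag should change, the second should not.
string file =
    "<img src=\"http://www.something.com/any/number/of/pic.jpg\">\r\n" +
    "<a href=\"http://www.something.com/any/number/of/pic.doc\">";

string pattern =
    "<(?:a|img)\\s+(?:src|href)=\"(?<replace>http://www.+?/).+?(?:\\.jpg|\\.jpeg|\\.gif|\\.png)\".*?>";

// Regex.Replace calls the evaluator once per match and stitches the results
// together, so no manual reverse iteration over the matches is needed.
string result = Regex.Replace(file, pattern, match =>
{
    Group g = match.Groups["replace"];
    // The group index is relative to the whole input; convert it to an offset inside the match.
    int offset = g.Index - match.Index;
    return match.Value.Remove(offset, g.Length).Insert(offset, "/LocalDirectory/");
});

Console.WriteLine(result);
```

Each evaluator call only edits its own match text, so the positions of later matches are never disturbed.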