All,
I need to write a regular expression that performs the following replacements:
(A)
src="/folder/image.jpg"
or
src="http://www.mydomain.com/folder/image.jpg"
with
src="/cache/getCacheItem.aspx?source_url=http://www.mydomain.com/folder/image.jpg"
(B)
href="/folder/file.zip"
or
href="http://www.mydomain.com/folder/file.zip"
with
href="/cache/getCacheItem.aspx?source_url=http://www.mydomain.com/folder/file.zip"
I know I can use
(src|href)\s*=\s*['"](?<url>.*?)['"]
with a replace value of
$1="/legacy_integration/cache/getCacheItem.aspx?source_url=$2"
to catch the src=... and href=... attributes. However, I need to filter by file extension: src should match only image extensions such as jpg, png, and gif, and href should match only extensions such as zip and pdf.
Any suggestions? The problem can be summarized as: modify the above expression so that it matches only certain file extensions, and insert the domain http://www.mydomain.com only if the original URL was relative, so that the output always contains the domain exactly once.
Do I need to perform this using two different regular expressions, one for source text including the domain and one without? Or can I somehow use a conditional match statement that, in combination with a replacement expression, will insert the domain or not based on whether the matched text contains the domain?
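To make the single-expression idea concrete, here is a rough sketch of what I have in mind. It is written in Python purely for illustration; in .NET the named groups would be written (?<name>...) and the replacement would use ${name}, and I believe the (?(name)yes|no) conditional is accepted there as well, but I have not verified that. The trick is that the domain is matched by an optional, non-capturing group, so the replacement can always insert it. The names PATTERN, REPLACEMENT, and rewrite are my own, and the extension lists are only the examples from above.

```python
import re

DOMAIN = "http://www.mydomain.com"              # domain from the examples above
CACHE = "/cache/getCacheItem.aspx?source_url="  # cache handler from the examples above

PATTERN = re.compile(
    r"""
    \b(?P<attr>(?P<src>src)|href)   # attribute name; group src is set only for src
    \s*=\s*
    (?P<q>['"])                     # opening quote, single or double
    (?:http://www\.mydomain\.com)?  # optional domain: consumed but not captured
    (?P<path>/(?!/)[^'"?]*?\.       # site-relative path, no query string, then a dot
        (?(src)(?:jpg|png|gif)|(?:zip|pdf)))  # extension list depends on the attribute
    (?P=q)                          # matching closing quote
    """,
    re.VERBOSE | re.IGNORECASE,
)

# The domain is always written by the replacement, so it appears exactly once
# whether or not the original URL contained it.
REPLACEMENT = r"\g<attr>=\g<q>" + CACHE + DOMAIN + r"\g<path>\g<q>"


def rewrite(html: str) -> str:
    """Rewrite matching src/href attributes to point at the cache handler."""
    return PATTERN.sub(REPLACEMENT, html)
```

Excluding ? from the path keeps already-rewritten URLs from being wrapped a second time, and URLs on other domains are left alone because the path must start with a single slash.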
I know I can perform this using a custom match evaluator, but it seems that it may be faster/more efficient to do it within the regex itself.
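For comparison, this is roughly what I mean by the evaluator approach, again sketched in Python (a function passed to re.sub plays the role of a .NET MatchEvaluator). The names ALLOWED, GENERIC, evaluate, and rewrite_with_callback are hypothetical, and the extension lists are just the examples from above. The generic pattern catches every src/href attribute and the callback decides whether to rewrite it.

```python
import re

DOMAIN = "http://www.mydomain.com"              # domain from the examples above
CACHE = "/cache/getCacheItem.aspx?source_url="  # cache handler from the examples above
ALLOWED = {"src": {"jpg", "png", "gif"}, "href": {"zip", "pdf"}}

# Generic pattern: any quoted src/href value, with the closing quote matching the opening one.
GENERIC = re.compile(r"""\b(?P<attr>src|href)\s*=\s*(?P<q>['"])(?P<url>.*?)(?P=q)""",
                     re.IGNORECASE)


def evaluate(match):
    """Return the rewritten attribute, or the original text if it does not qualify."""
    attr = match.group("attr").lower()
    url = match.group("url")
    # Strip the domain if present so that it is added back exactly once below.
    if url.lower().startswith(DOMAIN + "/"):
        url = url[len(DOMAIN):]
    # Leave other domains, protocol-relative URLs, and URLs with a query string alone.
    if not url.startswith("/") or url.startswith("//") or "?" in url:
        return match.group(0)
    extension = url.rpartition(".")[2].lower()
    if extension not in ALLOWED[attr]:
        return match.group(0)
    q = match.group("q")
    return f"{match.group('attr')}={q}{CACHE}{DOMAIN}{url}{q}"


def rewrite_with_callback(html: str) -> str:
    return GENERIC.sub(evaluate, html)
```

This works, but all of the filtering logic lives outside the expression, which is what I was hoping to avoid.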
Suggestions/comments?