I need to run search and replace operations on large blocks of HTML. I do not want to change anything that is part of an HTML tag (such as URLs in attributes), and I also do not want to change URLs OUTSIDE of HTML tags. I have a partial solution for matching a word that is not inside an HTML tag (src):
word(?!([^<]+)?>)
RegexBuddy also says that this will match the same text:
(?!([^<]+)?>)word
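To check that claim outside RegexBuddy, here is a minimal Python sketch (Python is just my choice for illustration; the names AFTER, BEFORE and sample are made up) that runs both patterns against one of my test strings, with "word" replaced by DOG:

```python
import re

# Both patterns from above, with "word" replaced by the sample term DOG.
AFTER = re.compile(r"DOG(?!([^<]+)?>)")   # lookahead placed after the word
BEFORE = re.compile(r"(?!([^<]+)?>)DOG")  # lookahead placed before the word

sample = "<h1 class='DOG'>DOG</h1>"

# Collect the (start, end) offsets of every match for each pattern.
after_spans = [m.span() for m in AFTER.finditer(sample)]
before_spans = [m.span() for m in BEFORE.finditer(sample)]

print(after_spans)   # [(16, 19)]
print(before_spans)  # [(16, 19)]
```

Both variants skip the DOG inside the class attribute and match only the DOG in the text node. One caveat I noticed: a literal > in the text (for example "DOG > CAT") makes the lookahead treat the preceding word as if it were inside a tag, so that DOG would be skipped.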
So the only thing left to do is ensure that the word is not part of a string that looks like a URL, like this:
(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|$!:,.;]*[A-Z0-9+&@#/%=~_|$]
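Note that the character classes in this pattern only list A-Z, so it has to be applied case-insensitively to match ordinary URLs. A small Python sketch of the difference (the variable names are mine, for illustration only):

```python
import re

URL = r"(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|$!:,.;]*[A-Z0-9+&@#/%=~_|$]"
text = "DOG: http://www.DOG.com/DOGfood.html DOGfood is delicious."

# Case-sensitive: "http://" matches, but the lower-case "www" that follows
# is not allowed by the A-Z character classes, so nothing is found.
strict = re.search(URL, text)

# Case-insensitive: the whole URL is matched, up to the space after ".html".
relaxed = re.search(URL, text, re.IGNORECASE)

print(strict)            # None
print(relaxed.group(0))  # http://www.DOG.com/DOGfood.html
```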
I am unsure whether excluding URLs like this is possible. My intention is to preserve URLs that appear in the text or in the HTML markup of the content, while allowing search and replace operations on everything else.
The ideal solution would match DOG and replace it with CAT, as illustrated below:
<h1>DOG</h1> -> <h1>CAT</h1>
<h1 class='DOG'>DOG</h1> -> <h1 class='DOG'>CAT</h1>
<p class='DOG'>DOG: http://www.DOG.com/DOGfood.html DOGfood is delicious.</p> -> <p class='DOG'>CAT: http://www.DOG.com/DOGfood.html CATfood is delicious.</p>
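For reference, the three examples above can be reproduced with the following Python sketch. It does not use the lookahead at all. Instead it swaps in a different technique: tags and URLs are matched first in an alternation and copied back unchanged, and only the remaining word branch is replaced. The function name replace_outside_tags_and_urls and the naive tag pattern <[^>]*> are my own assumptions for illustration; the tag pattern will not cope with comments, scripts, or a > inside an attribute value.

```python
import re

# URL pattern from above, wrapped in (?i:...) so that only this part is
# case-insensitive (scoped inline flags need Python 3.6 or later).
URL = r"(?i:(?:https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|$!:,.;]*[A-Z0-9+&@#/%=~_|$])"

# Assumption: a tag is "<", anything that is not ">", then ">".
TAG = r"<[^>]*>"


def replace_outside_tags_and_urls(html, word, replacement):
    """Replace `word` everywhere except inside tags and URL-like strings."""
    # The "skip" alternatives are tried first at every position, so a tag or
    # URL consumes its text before the "word" branch can match inside it.
    pattern = re.compile(
        "(?P<skip>" + TAG + "|" + URL + ")|(?P<word>" + re.escape(word) + ")"
    )

    def swap(match):
        # Tags and URLs are copied back unchanged.
        if match.group("skip") is not None:
            return match.group("skip")
        # Otherwise the word itself matched, so substitute it.
        return replacement

    return pattern.sub(swap, html)


print(replace_outside_tags_and_urls("<h1 class='DOG'>DOG</h1>", "DOG", "CAT"))
# <h1 class='DOG'>CAT</h1>
```

This makes a single pass over the input and the word match stays case-sensitive, but I have only tried it on small strings like the ones above, not on large documents.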
Bonus points for efficiency; I am nearly at my wits' end.