I have, for example, markup like this:
<div id="content">
<p>Here is some wonderful text, and here is a <a href="#">link</a>. All links should have a `href` attribute.</p>
</div>
Now I want to run a regex replace on the text inside the p element, but not on any of the HTML itself, i.e. match the href within backticks, but not the href attribute inside the anchor element.
I thought about doing the whole thing with regex, but the general consensus is that I shouldn't use it to parse HTML.
My current method of doing this is like so: I've got a bunch of words in an array, and I am looping through them and making an object of data like so:
termsData[term] = {
    regex: new RegExp('(\\b' + term + '\\b)', 'gmi'),
    replaceWith: '<span>{TERM}</span>'
};
I then loop through it again, making the replacements like so:
var html = obj.html();
$.each(terms, function(i, term) {
    // Replace each word in the HTML with the span
    html = html.replace(termsData[term].regex, termsData[term].replaceWith.replace(/{TERM}/, '$1'));
});
obj.html(html);
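To make the problem concrete, here is a minimal sketch of what goes wrong, reduced to plain strings so it runs without jQuery or a DOM. The `wrapTerms` helper and the shortened sample markup are made up for illustration, and it assumes the only term is `href`:

```javascript
// Hypothetical helper: applies the same kind of replacement as above
// directly to an HTML string, one term at a time.
function wrapTerms(html, terms) {
    terms.forEach(function (term) {
        // Same pattern as in termsData: whole-word, case-insensitive match
        var regex = new RegExp('(\\b' + term + '\\b)', 'gi');
        html = html.replace(regex, '<span>$1</span>');
    });
    return html;
}

var input = '<p>A <a href="#">link</a>. All links should have a `href` attribute.</p>';
var output = wrapTerms(input, ['href']);

// The term inside the backticks is wrapped as intended, but the href
// attribute of the anchor is wrapped too, which breaks the markup.
console.log(output);
```

Because the regex sees the markup as one flat string, it cannot tell the attribute name apart from the word in the text.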
Now I did a lot of this last night at an ungodly hour, and copying and pasting it into here makes me think I should refactor some of it.
So, as you should be able to tell, I want to be able to replace plain text, but not anything inside an HTML tag.
What would be the best way to do it?
Note: The source code comes from here, if you'd like a better look.