I had to do this a couple of months ago. Originally someone used string manipulation of innerHTML as suggested by others here but that road leads to madness. The troublesome corner cases are: What if the keyword being marked up is an element's class name or id. We also don't want to mangle any embedded javascript and accidentally cause invalid markup. Also need to handle embedded css styles. In the end you end up writing a text parser for HTML4.0 + ECMAscript + css which, even in the limited case, is a lot of work - it's writing your own web browser - IN JAVASCRIPT!
But we already have a HTML+JS+CSS parser in the web browser that generates a nice DOM tree for us. So I decided to use that. This is what I came up with in the end:
keywords = ['hello world','goodbye cruel world'];
function replaceKeywords (domNode) {
if (domNode.nodeType === Node.ELEMENT_NODE) { // We only want to scan html elements
var children = domNode.childNodes;
for (var i=0;i<children.length;i++) {
var child = children[i];
// Filter out unwanted nodes to speed up processing.
// For example, you can ignore 'SCRIPT' nodes etc.
if (child.nodeName != 'EM') {
replaceKeywords(child); // Recurse!
}
}
}
else if (domNode.nodeType === Node.TEXT_NODE) { // Process text nodes
var text = domNode.nodeValue;
// This is another place where it might be prudent to add filters
for (var i=0;i<keywords.length;i++) {
var match = text.indexOf(keywords[i]); // you may use search instead
if (match != -1) {
// create the EM node:
var em = document.createElement('EM');
// split text into 3 parts: before, mid and after
var mid = domNode.splitText(match);
mid.splitText(keywords[i].length);
// then assign mid part to EM
mid.parentNode.insertBefore(em,mid);
mid.parentNode.removeChild(mid);
em.appendChild(mid);
}
}
}
}
As long as you're careful about filtering out things you don't want to process (for example, empty text nodes that contain only whitespace - there are lots of them, trust me) this is very fast.
Additional filtering not implemented for algorithmic clarity - left as an exercise to the reader.