I've converted html to a string, I'm able to use replace in that string to wrap the text with a link, and I can put that html back into the ID it came from.

My problem is that my replace method is going inside existing links on the page. This could create nested links, which is a problem. Does anyone out there know how to prevent the replace method from matching text that is within a link already?

I have right now:

keyword = "matching phrase";
keywordLink = "<a href='http://myurl.com'>" + keyword + "</a>";
sasser = sasser.replace(keyword, keywordLink);
sasDom.innerHTML = sasser;

I'm looking for, in pseudo code:

... (keyword [if the next " < " sign is not followed by "/a>", regardless of how far away it is], keywordLink);
+1  A: 

If you don't mind using jQuery, you can use its wrap() function to wrap text or HTML elements in the specified tags.

Soviut
Does that work for wrapping only part of the text inside some tags, too?
Franz
Thinking about it, it will probably not keep you from having nested link tags, or will it?
Franz
A: 

I would do it in three steps:

1) replace (<a [^>]+>)matching phrase</a> with $1some_other_phrase</a>

2) replace matching phrase with <a...>keyword</a>

3) replace some_other_phrase with matching phrase
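
The three steps above can be sketched on a plain HTML string roughly like this. The function name, the placeholder value, and the URL parameter are made up for illustration, and the sketch only protects links whose entire text is the keyword; a keyword inside longer link text or inside an attribute would still be touched.

```javascript
// Hypothetical helper: a sketch of the three-step placeholder idea.
function linkKeywordOutsideLinks(html, keyword, url) {
    // Assumption: this placeholder never occurs in the page itself.
    var placeholder = '\u0000KEYWORD\u0000';
    // Escape regex metacharacters so the keyword is matched literally.
    var escaped = keyword.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

    // Step 1: hide the keyword wherever it is the whole text of an existing link.
    var alreadyLinked = new RegExp('(<a\\b[^>]*>)' + escaped + '</a>', 'g');
    var result = html.replace(alreadyLinked, function (all, openTag) {
        return openTag + placeholder + '</a>';
    });

    // Step 2: wrap every remaining occurrence of the keyword in a new link.
    result = result.split(keyword).join('<a href="' + url + '">' + keyword + '</a>');

    // Step 3: put the keyword back inside the links hidden in step 1.
    return result.split(placeholder).join(keyword);
}
```

For example, `linkKeywordOutsideLinks(sasser, "matching phrase", "http://myurl.com")` would leave an existing `<a ...>matching phrase</a>` alone and link the other occurrences.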

yu_sha
+2  A: 

You can't do this kind of thing with regex at all. Work on the document objects which are already nicely parsed into a structure for you.

Here's a keyword linker adapted from this question.

// Find text in descendants of an element, in reverse document order
// pattern must be a regexp with global flag
//
function findTextExceptInLinks(element, pattern, callback) {
    for (var childi= element.childNodes.length; childi-->0;) {
        var child= element.childNodes[childi];
        if (child.nodeType===1) {
            if (child.tagName.toLowerCase()!=='a')
                findTextExceptInLinks(child, pattern, callback);
        } else if (child.nodeType===3) {
            var matches= [];
            var match;
            while (match= pattern.exec(child.data))
                matches.push(match);
            for (var i= matches.length; i-->0;)
                callback.call(window, child, matches[i]);
        }
    }
}

findTextExceptInLinks(document.body, /\bmatching phrase\b/g, function(node, match) {
    node.splitText(match.index+match[0].length);
    var a= document.createElement('a');
    a.href= 'http://www.example.com/myurl';
    a.appendChild(node.splitText(match.index));
    node.parentNode.insertBefore(a, node.nextSibling);
});
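
A note on why the matches are collected first and then handled last-to-first: splitting or editing at a later position never shifts the indexes of earlier matches, so every stored `match.index` stays valid. Here is the same idea on a plain string, as a sketch you can run outside a browser; `wrapMatches` and its parameters are made-up names, not part of the code above.

```javascript
// Hypothetical helper: wrap every match of a global regexp in before/after
// markers, working from the last match to the first.
function wrapMatches(text, pattern, before, after) {
    var matches = [];
    var match;
    pattern.lastIndex = 0;
    while ((match = pattern.exec(text)) !== null) {
        matches.push(match);
        // Guard against zero-length matches, which would never advance.
        if (match[0] === '') pattern.lastIndex++;
    }
    // Reverse order: edits at higher indexes leave lower indexes untouched.
    for (var i = matches.length; i-- > 0;) {
        var start = matches[i].index;
        var end = start + matches[i][0].length;
        text = text.slice(0, start) + before + text.slice(start, end) + after + text.slice(end);
    }
    return text;
}
```

Going front-to-back instead would mean adjusting every later index by the length of the text already inserted.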

eta re comments: Here's a version of the same thing using plain text matching rather than regex:

function findPlainTextExceptInLinks(element, substring, callback) {
    for (var childi= element.childNodes.length; childi-->0;) {
        var child= element.childNodes[childi];
        if (child.nodeType===1) {
            if (child.tagName.toLowerCase()!=='a')
                findPlainTextExceptInLinks(child, substring, callback);
        } else if (child.nodeType===3) {
            var index= child.data.length;
            while (true) {
                index= child.data.lastIndexOf(substring, index);
                if (index===-1)
                    break;
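                // The callback must split the node at `index` (as below), which shortens
                // child.data; otherwise this loop would find the same match forever.
                callback.call(window, child, index);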
            }
        }
    }
}

var substring= 'matching phrase';
findPlainTextExceptInLinks(document.body, substring, function(node, index) {
    node.splitText(index+substring.length);
    var a= document.createElement('a');
    a.href= 'http://www.example.com/myurl';
    a.appendChild(node.splitText(index));
    node.parentNode.insertBefore(a, node.nextSibling);
});
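
If you want to check the "skip anything inside a link" traversal without a browser, here is a rough sketch that uses plain objects in place of DOM nodes. It assumes only the properties the functions above rely on (`nodeType`, `tagName`, `childNodes`, `data`); the helper names `el`, `text` and `textNodesOutsideLinks` are made up for the example.

```javascript
// Plain objects standing in for DOM element and text nodes (an assumption
// for testing; real DOM nodes expose the same four properties).
function el(tagName, children) {
    return { nodeType: 1, tagName: tagName, childNodes: children };
}
function text(data) {
    return { nodeType: 3, data: data };
}

// Collect the data of every text node containing `substring`,
// never descending into <a> elements.
function textNodesOutsideLinks(element, substring, found) {
    found = found || [];
    for (var i = 0; i < element.childNodes.length; i++) {
        var child = element.childNodes[i];
        if (child.nodeType === 1) {
            if (child.tagName.toLowerCase() !== 'a')
                textNodesOutsideLinks(child, substring, found);
        } else if (child.nodeType === 3 && child.data.indexOf(substring) !== -1) {
            found.push(child.data);
        }
    }
    return found;
}

var tree = el('DIV', [
    text('a matching phrase here'),
    el('A', [text('matching phrase in a link')]),
    el('P', [
        text('nested matching phrase'),
        el('A', [el('B', [text('matching phrase')])])
    ])
]);
var hits = textNodesOutsideLinks(tree, 'matching phrase');
```

Because the recursion stops at the `a` element itself, text nested deeper inside a link (the `B` element here) is skipped as well.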
bobince
Bobince - thank you so much! This solution is fantastic, and I greatly appreciate your taking the time to post it. I'm having one relatively simple problem now: I can't seem to use a variable as the matching phrase. Here's what I'm trying: `findTextExceptInLinks(document.body, "/\b" + variable + "\b/g", function(node, match) {`
Matrym
@google: try passing `new RegExp('\\b' + variable + '\\b', 'g')`
Crescent Fresh
Yep, the `RegExp` constructor as posted by Crescent will work, but watch out: if your variable contains characters that are special to regex like `.` or `*` (most punctuation really), this won't match the literal versions of the string. If you want to match literal strings rather than words-at-boundaries, it'd be better to dump the regular expression matching and replace with `string.indexOf`.
bobince
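
If you do want to keep the word-boundary regex with a variable phrase, one common approach is to escape the regex metacharacters before building the pattern. A minimal sketch, where `escapeRegExp` and `wholeWordPattern` are made-up helper names:

```javascript
// Escape characters that are special in a regular expression so the
// phrase is matched literally.
function escapeRegExp(literal) {
    return literal.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a global, whole-word pattern for an arbitrary phrase.
function wholeWordPattern(phrase) {
    return new RegExp('\\b' + escapeRegExp(phrase) + '\\b', 'g');
}
```

You would then call `findTextExceptInLinks(document.body, wholeWordPattern(variable), ...)`. Note that `\b` only helps when the phrase starts and ends with a word character; for a phrase like `C++` the trailing `\b` will not match at the end of the text, and the plain-text version is the safer choice.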
To be clear, you're suggesting I change `while (match= pattern.exec(child.data))` to `while (match= string.indexOf(child.data))`?
Matrym
ah, no, there'd be a few more changes as the string matching interface is a bit different... actually, a bit easier. Added plain-text version to answer.
bobince
You're a champ bobince. Thanks again.
Matrym