
I'm trying to replace all instances of a word, say "foo", that appear between HTML tags.

<span id=foo> blah blah foo blah foo blah </span>

I want to replace every instance of foo that is not inside the tag itself with bar, so that the end result is:

<span id=foo> blah blah bar blah bar blah </span>

Notice that "foo" in the span tag has not been replaced.

I can manage to get the first (or last) occurrence of "foo" replaced with my regular expression, but not multiple instances. Is this a situation where I should give up and not attempt to parse this with a regular expression?

Here is the regular expression that sort of works:

new RegExp('(>[\\w\\s]*)\\bfoo\\b([\\w\\s]*<)', "ig")

or, as a plain regex without the JavaScript string escaping:

/(>[\w\s]*)\bfoo\b([\w\s]*<)/
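Running the pattern through `replace` on the sample markup shows why only one occurrence changes. The greedy `[\w\s]*` in the first group runs all the way to the `<` and then backtracks to the last `foo`, and since that single match covers the whole run from `>` to `<`, the `g` flag finds nothing left to match in it. The `"$1bar$2"` replacement string is an assumption about how the pattern was being applied.

```javascript
// The question's pattern, written as a regex literal with the same "ig" flags.
var tagRun = /(>[\w\s]*)\bfoo\b([\w\s]*<)/ig;
var input = "<span id=foo> blah blah foo blah foo blah </span>";
// $1 and $2 put back the text captured on either side of the matched foo.
var output = input.replace(tagRun, "$1bar$2");
console.log(output);
// <span id=foo> blah blah foo blah bar blah </span>
```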

This pattern should let me match things like [foo] but not bar-foo or barfoobar. Any occurrence of foo that gets replaced needs to stand on its own; it cannot be contained in another word.
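A quick check of what `\b` treats as a boundary: it sits between a word character (`[A-Za-z0-9_]`) and a non-word character or the start or end of the string, so on its own `\bfoo\b` does match `bar-foo` and the `foo` in `id=foo`. In the full pattern above, `bar-foo` is excluded only because `-` is not in `[\w\s]`, and that same restriction also keeps `[foo]` from matching.

```javascript
// \b sits between a word character [A-Za-z0-9_] and a non-word character
// (or the start/end of the string), so "[", "-" and "=" all create a boundary.
var wordFoo = /\bfoo\b/i;  // no g flag, so test() keeps no state between calls
var samples = ["[foo]", "bar-foo", "barfoobar", "id=foo"];
var results = samples.map(function (s) {
  return s + " -> " + wordFoo.test(s);
});
console.log(results.join("\n"));
// [foo] -> true
// bar-foo -> true
// barfoobar -> false
// id=foo -> true
```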

Note that the "blah blah" text varies in length: it can be many different words, no words at all, or any combination of these.

Thanks for any suggestions.

A: 

The following seems to work:

var str = "foo yea foot bfoo <span id=foo> blah blah foo blah foo blah </span> foo again <span id=foo>foo again</span>\n\nthis is foo again";
var r = new RegExp("\\bfoo\\b","ig");
str = str.replace(r, "'it works'");
alert(str);
Zafer
Thanks for the reply. The problem with this expression (this is what I *was* using, btw) is that it replaces the "foo" inside the span tag. I'm trying to avoid this by creating a regular expression that *does not* match items in tags.
OK, but it doesn't match id=foo; it matches only foos as separate words.
Zafer
A: 

If you save the results from your regular expression as a match object like this:

var regex = new RegExp('(>[\\w\\s]*)\\bfoo\\b([\\w\\s]*<)',"ig");
var mystring = "<span id=foo> blah blah foo blah foo blah </span>";
var match = regex.exec(mystring);

You can then use a second, simpler regular expression on the matched string to find the multiple occurrences of "foo". The matched string will be in match[0].
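A minimal sketch of that two-step idea, assuming the input is a plain HTML string: `exec` finds each `> ... <` run with the question's pattern, and a simpler `/\bfoo\b/` replacement is applied only inside `match[0]`. `replaceBetweenTags` is a hypothetical helper name. It inherits the original pattern's limits, so a run containing punctuation, such as `> foo, foo <`, is not matched at all.

```javascript
// Two-step approach: find each "> ... <" run with the question's pattern,
// then replace every standalone foo inside that run only.
// replaceBetweenTags is a hypothetical helper name used for illustration.
function replaceBetweenTags(html, replacement) {
  // The question's outer pattern; created per call so its lastIndex starts at 0.
  var outer = /(>[\w\s]*)\bfoo\b([\w\s]*<)/ig;
  var inner = /\bfoo\b/ig;
  var result = "";
  var last = 0;
  var match;
  while ((match = outer.exec(html)) !== null) {
    // Copy everything between the previous match and this one unchanged (tags included).
    result += html.slice(last, match.index);
    // match[0] is the whole run from ">" to "<"; replace every foo inside it.
    result += match[0].replace(inner, replacement);
    last = outer.lastIndex;
  }
  // Copy whatever follows the last match.
  return result + html.slice(last);
}

console.log(replaceBetweenTags("<span id=foo> blah blah foo blah foo blah </span>", "bar"));
// <span id=foo> blah blah bar blah bar blah </span>
```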

GreeenGuru
A: 
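// Match each run of text between a '>' and the next '<'
str = str.replace(/(>[^<]*<)/g, function(s, p1) {
    // Replace every standalone "foo" within that run only
    return p1.replace(/\bfoo\b/g, 'bar');
});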
Scott Evernden
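Here is a self-contained version of the callback approach with the word and its replacement passed in as parameters; `replaceOutsideTags` is a hypothetical name, and the metacharacter escaping is an addition not in the answer. Because only text between a `>` and the next `<` is touched, text before the first tag or after the last one is left alone.

```javascript
// Callback approach: rewrite only the runs of text between '>' and the next '<'.
// replaceOutsideTags is a hypothetical helper name used for illustration.
function replaceOutsideTags(html, word, replacement) {
  // Escape regex metacharacters so `word` is matched literally (an added assumption).
  var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  var wordRe = new RegExp("\\b" + escaped + "\\b", "g");
  return html.replace(/>[^<]*</g, function (run) {
    return run.replace(wordRe, replacement);
  });
}

console.log(replaceOutsideTags("<span id=foo> blah blah foo blah foo blah </span>", "foo", "bar"));
// <span id=foo> blah blah bar blah bar blah </span>
console.log(replaceOutsideTags("foo <b>foo</b> foo", "foo", "bar"));
// foo <b>bar</b> foo
```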
A: 

I don't know if anyone's mentioned this before, but:

DO NOT USE REGEX TO MANIPULATE HTML.

It is a poor tool that is nowhere near equipped to handle the complexity of HTML. If you start replacing strings inside markup, you can easily give yourself not just broken markup, but also HTML-injection holes potentially leading to cross-site-scripting vulnerabilities. This:

(>[\\w\\s]*)

is not sufficient to ensure that the text you are altering is not inside markup. It's perfectly valid to have a > character in an attribute value, not to mention all the other markup constructs.
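To make the attribute problem concrete, here is the simple `>[^<]*<` run pattern applied to a tag whose quoted attribute value contains `>`. The sample markup is made up for illustration.

```javascript
// "Text between > and <" heuristic applied to a tag whose quoted attribute
// value legitimately contains ">".
var markup = '<span title="a > foo">foo</span>';
var result = markup.replace(/>[^<]*</g, function (run) {
  return run.replace(/\bfoo\b/g, "bar");
});
console.log(result);
// <span title="a > bar">bar</span>   (the attribute value was rewritten too)
```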

If your language is JavaScript running in a web browser there is no good reason to even try, because the browser has already nicely parsed your document into Element and Text node objects. Don't ask the browser to re-serialise all those document objects into new HTML, hack the HTML and write it back to innerHTML! As well as being slow, this will destroy all the existing content to replace it with new objects, which has the side-effect of losing all non-serialisable information like form field values, JavaScript references, expandos and event handlers.

You can simply walk through all the Text nodes in the element you want to look at, doing the replacements there. Trivial example:

function replaceText(element, pattern, replacement) {
    for (var childi= element.childNodes.length; childi-->0;) {
        var child= element.childNodes[childi];
        if (child.nodeType==1) // Node.ELEMENT_NODE: recurse into child elements
            replaceText(child, pattern, replacement);
        else if (child.nodeType==3) // Node.TEXT_NODE: replace within the text only
            child.data= child.data.replace(pattern, replacement);
    }
}

replaceText($('#foo')[0], /\bfoo\b/gi, 'bar');
bobince
In general what you say is true, but if the problem is sufficiently well bounded (e.g., processing just a simple sequence of spans), then regex is okay for doing this sort of thing.
Scott Evernden
Thanks, this was a good starting point for my final solution! I would accept this as the answer, but my own solution actually works for the purposes I defined above.
A: 

I'm confused; why can't you just do:

var replacement = $('#foo').html().replace(/\bfoo\b/g, 'bar');
$('#foo').html(replacement);
Mark
A: 

I was going about this the wrong way. Here is the solution I came up with, and it seems to work well. It uses two recursive functions, DOM traversal, and regular expressions to create the proper text and span nodes.

var highlightId = 0;  // counter for unique span ids (the original used an undeclared global `id`)

function replaceText(element, pattern, syn_text) {
    // Snapshot the children: the live childNodes list changes as nodes are inserted and removed.
    var children = Array.prototype.slice.call(element.childNodes);
    for (var childi = 0; childi < children.length; childi++) {
        var child = children[childi];
        if (child.nodeType == 1 && child.className != syn_text) { // make sure we don't recurse into a newly created span
            replaceText(child, pattern, syn_text);  // call function on child element
        }
        else if (child.nodeType == 3) { // this is a text node; process it with our regular expression
            child.data.replace(pattern, function (s, p1, p2, p3) {
                do_replace(s, p1, p2, p3, child, pattern, syn_text);  // insert the new nodes before the old one
                child.parentNode.removeChild(child);  // delete the old text node; the new nodes replace it
                return s;
            });
        }
    }
}

function do_replace(s, p1, p2, p3, refNode, pattern, syn_text) {
    var parentNode = refNode.parentNode;
    if (p1.length > 0) {  // text before the match becomes a plain text node
        parentNode.insertBefore(document.createTextNode(p1), refNode);
    }
    if (p2.length > 0) {  // the match itself goes in a span for the highlighting code
        var spanTag = document.createElement("span");
        spanTag.id = "SString" + highlightId++;
        spanTag.className = syn_text;
        spanTag.textContent = p2;  // textContent rather than innerHTML, so the text is never parsed as markup
        parentNode.insertBefore(spanTag, refNode);
    }
    if (p3.length > 0) {
        pattern.lastIndex = 0;  // pattern has the g flag, so reset its position before test()
        if (pattern.test(p3)) {  // p3 contains another instance of our text: process it recursively
            p3.replace(pattern, function (s, p1, p2, p3) {
                do_replace(s, p1, p2, p3, refNode, pattern, syn_text);  // pass syn_text through as well
                return s;
            });
        }
        else {  // otherwise it's just plain text, so reinsert it unchanged
            parentNode.insertBefore(document.createTextNode(p3), refNode);
        }
    }
}

This function is called as follows:

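var syn_highlight = "highlight_me";  // class to signify highlighting
// Escape regex metacharacters so searchTerm is matched literally.
var escapedTerm = searchTerm.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
var pattern = new RegExp('([\\w\\W]*?)\\b(' + escapedTerm + '\\w*)\\b([\\w\\W]*)', "ig");
replaceText($('#BodyContent')[0], pattern, syn_highlight);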