I have several posts on a website; all these posts are chat conversations of this type:

AD: Hey!
BC: What's up?
AD: Nothing
BC: Okay

They're marked up as simple paragraphs surrounded by <p> tags.

Using the JavaScript replace function, I want every instance of "AD" that opens a line of the conversation (i.e., every "AD" at the start of a line and followed by a ":") to be wrapped in <strong> tags, but only if that instance isn't already wrapped in a <strong> tag.

What regex should I use to accomplish this? Am I trying to do what this advises against?

The code I'm using is like this:

var posts = document.getElementsByClassName('entry-content');

for (var i = 0; i < posts.length; i++) {
    posts[i].innerHTML = posts[i].innerHTML.replace(/some regex here/,
    'replaced content here');
}
+1  A: 

Wouldn't it be easier to set the style of the found paragraph to font-weight: bold, or give it a class that does roughly the same? That way you wouldn't have to worry about adding tags or searching for existing ones. It might perform better, too, if you don't have to do any string replaces.

If you really want to add the strong tags anyway, I'd suggest using DOM functions to find childNodes of your paragraph that are <strong>, and if you don't find one, add it and move the original (text) childNode of the paragraph into it.
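A minimal sketch of that DOM approach, assuming each chat line sits in its own text node. The names splitSpeaker and boldLeadingLabel are made up for illustration, and the code uses only standard DOM calls (createElement, createTextNode, insertBefore):

```javascript
// Hypothetical helper: split a chat line into its speaker label and the rest.
// Returns null when the line does not start with "<speaker>:".
function splitSpeaker(line, speaker) {
    var prefix = speaker + ':';
    if (line.indexOf(prefix) !== 0) {
        return null;
    }
    return { label: prefix, rest: line.slice(prefix.length) };
}

// Hypothetical helper: wrap the leading label of a text node in <strong>,
// unless the text node already lives inside a <strong>.
function boldLeadingLabel(textNode, speaker) {
    if (textNode.parentNode.nodeName.toLowerCase() === 'strong') {
        return;
    }
    var parts = splitSpeaker(textNode.data, speaker);
    if (!parts) {
        return;
    }
    var doc = textNode.ownerDocument;
    var strong = doc.createElement('strong');
    strong.appendChild(doc.createTextNode(parts.label));

    // Put <strong>AD:</strong> in front, keep the rest in the original node
    textNode.parentNode.insertBefore(strong, textNode);
    textNode.data = parts.rest;
}
```

You would call boldLeadingLabel(node, 'AD') for each text node (nodeType 3) under the paragraph. Since childNodes is a live list, it is probably safest to copy the text nodes into an array first and loop over that.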

Rob
I can't add a class/style to the entire paragraph because only the first two characters of the line ["AD"] have to be in boldface. Your second suggestion seems quite good though, if a little complicated. Thanks!
sahil
+1  A: 

If AD: is always at the start of a line, then the following regex should work, using the m (multiline) flag so that ^ matches at the start of every line:

.replace(/^AD:/gm, "<strong>AD:</strong>");

You don't need to check for an existing <strong>, because ^ matches only at the start of a line, so the regex matches only when AD: is the very first thing on that line. A line that already begins with <strong>AD: starts with a < character and is skipped.
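To see this on a plain string, here is a small sketch. The function name boldSpeaker and the sample text are made up for illustration; the regex is the one above:

```javascript
// Hypothetical wrapper around the replace call from this answer.
function boldSpeaker(text) {
    return text.replace(/^AD:/gm, '<strong>AD:</strong>');
}

// Three lines: two start with AD:, and the last one is already wrapped.
var sample = "AD: Hey!\nBC: What's up?\n<strong>AD:</strong> Nothing";
var result = boldSpeaker(sample);

// The first line gets wrapped; the third is left alone because it
// starts with "<strong>", not "AD:".
console.log(result);

// Running it a second time changes nothing.
console.log(boldSpeaker(result) === result);
```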

You're not going against the "Don't use regex to parse HTML" advice because you're not parsing HTML, you're simply replacing a string with another string.

An alternative to regex would be to work with ranges, creating a range selecting the text and then using execCommand to make the text bold. However, I think this would be much more difficult and you would likely face differences in browser implementations. The regex way should be enough.


After seeing your comment, the following regex would work fine:

.replace(/<(p|br)>AD:/gm, "<$1><strong>AD:</strong>");
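Here is a quick check of that regex against the one-line markup from the comment. The second function is my own assumed variant, not something the markup in the question is known to need: it also accepts attributes, <br />, upper-case tag names, and whitespace between the tag and AD:, in case innerHTML comes back in one of those forms. Both function names are made up for illustration.

```javascript
// The regex from this answer: AD: directly after a <p> or <br> tag.
function boldAfterTag(html) {
    return html.replace(/<(p|br)>AD:/gm, '<$1><strong>AD:</strong>');
}

// Assumed looser variant: the whole tag (plus any whitespace after it)
// is captured as $1 and written back unchanged.
function boldAfterTagLoose(html) {
    return html.replace(/(<(?:[pP]|[bB][rR])\b[^>]*>\s*)AD:/g,
                        '$1<strong>AD:</strong>');
}

var oneLine = '<p>AD: Hey!<br>BC: Hi<br>AD: Okay</p>';
var strict = boldAfterTag(oneLine);
var loose = boldAfterTagLoose('<P class="chat">\n  AD: Hey!<br />AD: Okay</P>');

console.log(strict);
console.log(loose);
```

Neither version touches an AD: that is already wrapped, because in that case the tag is followed by <strong> rather than by AD:.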
Andy E
The problem here is that while AD will always be at the start of the line in the output, in the code it might well be something like this: `<p>AD: Hey!<br>BC: Hi<br>AD: Okay</p>`, all on one line. What about this scenario? I'll check out ranges. Are they cross-browser compatible?
sahil
@sahil: surprise, surprise, IE versions lower than 8 do not support ranges in the same way other browsers do. Check my updated answer for a regex that will work for you.
Andy E
A: 

Using regular expressions on the innerHTML isn't reliable and will potentially lead to problems. The correct way to do this is a tiresome process but is much more reliable.

E.g.

for (var i = 0, l = posts.length; i < l; i++) {

    findAndReplaceInDOM(posts[i], /^AD:/g, function(match, node){

        // Make sure the current node does not have a <strong> as its parent
        if (node.parentNode.nodeName.toLowerCase() === 'strong') {
            return false;
        }

        // Create and return new <strong>
        var s = document.createElement('strong');
        s.appendChild(document.createTextNode(match[0]));
        return s;

    });

}

And the findAndReplaceInDOM function:

function findAndReplaceInDOM(node, regex, replaceFn) {

    // Note: regex MUST have global flag
    if (!regex || !regex.global || typeof replaceFn !== 'function') {
        return;
    }

    var start, end, match, parent, leftNode,
        rightNode, replacementNode, text,
        d = document;

    // Loop through all childNodes of "node"
    if (node = node && node.firstChild) do {

        if (node.nodeType === 1) {

            // Regular element, recurse:
            findAndReplaceInDOM(node, regex, replaceFn);

        } else if (node.nodeType === 3) {

            // Text node, introspect

            parent = node.parentNode;
            text = node.data;

            regex.lastIndex = 0;

            while (match = regex.exec(text)) {

                replacementNode = replaceFn(match, node);

                if (!replacementNode) {
                    continue;
                }

                end = regex.lastIndex;
                start = end - match[0].length;

                // Effectively split node up into three parts:
                // leftSideOfReplacement + REPLACEMENT + rightSideOfReplacement

                leftNode = d.createTextNode( text.substring(0, start) );
                rightNode = d.createTextNode( text.substring(end) );

                parent.insertBefore(leftNode, node);
                parent.insertBefore(replacementNode, node);
                parent.insertBefore(rightNode, node);

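                // Remove the original node, then keep scanning from the
                // right-hand remainder so that later matches and the
                // following siblings are still reached
                parent.removeChild(node);
                node = rightNode;
                text = node.data;
                regex.lastIndex = 0;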

            }

        }
    } while (node = node.nextSibling);

}
J-P
Wow, this is quite thorough. Do you think this is absolutely required though? What kind of problems may I run into if I use regular expressions on innerHTML in the case of the question I posted? [Btw, the variable on the very first line should be `lh` or the condition should be `i < l`, right?]
sahil
Given your situation, the innerHTML solution is probably okay. This approach is just a better all-round one because it breaks in fewer cases. For example, reading innerHTML and assigning it back recreates every element inside the container, which discards any state, data, or event handlers bound to those elements. This may not be a problem for you, but it's usually something that has to be taken into account.
J-P
Oh, of course. Thanks!
sahil