I need to highlight, case insensitively, given keywords in a JavaScript string.

For example:

  • highlight("foobar Foo bar FOO", "foo") should return "<b>foo</b>bar <b>Foo</b> bar <b>FOO</b>"

I need the code to work for any keyword, and therefore using a hardcoded regular expression like /foo/i is not a sufficient solution.

What is the easiest way to do this?

(This is an instance of a more general problem detailed in the title, but I feel that it's best tackled with a concrete, useful example.)

+5  A: 

You can use regular expressions if you prepare the search string first. PHP, for example, has a function preg_quote which escapes every regex special character in a string.

Here is such a function for JavaScript:

function preg_quote( str ) {
    // http://kevin.vanzonneveld.net
    // +   original by: booeyOH
    // +   improved by: Ates Goral (http://magnetiq.com)
    // +   improved by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // +   bugfixed by: Onno Marsman
    // *     example 1: preg_quote("$40");
    // *     returns 1: '\$40'
    // *     example 2: preg_quote("*RRRING* Hello?");
    // *     returns 2: '\*RRRING\* Hello\?'
    // *     example 3: preg_quote("\\.+*?[^]$(){}=!<>|:");
    // *     returns 3: '\\\.\+\*\?\[\^\]\$\(\)\{\}\=\!\<\>\|\:'

    return (str+'').replace(/([\\\.\+\*\?\[\^\]\$\(\)\{\}\=\!\<\>\|\:])/g, "\\$1");
}

(Taken from http://kevin.vanzonneveld.net/techblog/article/javascript_equivalent_for_phps_preg_quote/ )

So you could do the following:

function highlight( data, search )
{

    return data.replace( new RegExp( preg_quote( search ), 'i' ), '<b>' + search + '</b>' );

}
okoman
I don't think your example works.
tvanfosson
I see, only the first occurrence is replaced. Didn't know that replace behaves like this. The preg_quote is important if he wants to highlight strings with / or * or other regex characters.
okoman
Um... it *is* JavaScript. I just said that there is a function in PHP called preg_quote. Then I included a JS version of that function and a JS function that uses it... these code examples *are* JS
okoman
There are two errors in the second code fragment: 1 - it needs `'gi'` instead of `'i'` RegExp modifier, 2 - it's replacing with `search` instead of highlighting substrings in `data`. The first code segment may or may not be a good escaper for javascript (I don't know) but calling it preg_quote is misleading, JS RegExp ≠ PCRE.
fsb
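Applying both fixes from the last comment (the `'gi'` flag, and inserting the text that actually matched via `$&` instead of `search`) gives something like the following sketch. The escaper is the same preg_quote port shown above, copied without its header comments so the block runs on its own.

```javascript
// Same escaper as above: backslash-escapes regex metacharacters.
function preg_quote(str) {
    return (str + '').replace(/([\\\.\+\*\?\[\^\]\$\(\)\{\}\=\!\<\>\|\:])/g, "\\$1");
}

// Corrected highlight: 'gi' replaces every match regardless of case, and
// "$&" re-inserts the matched text from `data`, so "Foo" stays "Foo"
// instead of being overwritten with the search term "foo".
function highlight(data, search) {
    return data.replace(new RegExp(preg_quote(search), 'gi'), '<b>$&</b>');
}

console.log(highlight("foobar Foo bar FOO", "foo"));
// <b>foo</b>bar <b>Foo</b> bar <b>FOO</b>
```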
A: 

Why not just create a new regex on each call to your function? You can use:

new RegExp(pat, flags)

where pat is a string containing the pattern, and flags is a string of flags such as 'gi'.
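A minimal sketch of that idea: the pattern string is supplied at run time, so the same function works for any keyword. The helper name `keywordRegex` is hypothetical, and the keyword is used unescaped, so this is only safe for keywords without regex metacharacters (the escaping issue is discussed in the other answers).

```javascript
// Hypothetical helper: compile a case-insensitive, global regex from a string.
function keywordRegex(pat) {
    return new RegExp(pat, 'gi');   // same as the literal /foo/gi when pat is "foo"
}

var re = keywordRegex("foo");
console.log(re.source, re.flags);                 // foo gi
console.log("foobar Foo bar FOO".match(re));      // [ 'foo', 'Foo', 'FOO' ]
```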

Erik Hesselink
+3  A: 
function highlightWords( line, word )
{
     var regex = new RegExp( '(' + word + ')', 'gi' );
     return line.replace( regex, "<b>$1</b>" );
}
tvanfosson
Of course, you need to be careful with what you are replacing in and what you are searching on as @bobince notes. The above will work well for plain text and most searches if you are careful to quote your regex characters...
tvanfosson
exactly what I needed :) thx
MyWhirledView
This will run into trouble if there are regex characters in the word being replaced. @okoman's solution gets around that.
Herb Caudill
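To illustrate that comment: with the unescaped version, a keyword such as `c++` makes the `RegExp` constructor throw, and `1+1` is silently treated as a pattern (one or more `1`s followed by another `1`) rather than as literal text.

```javascript
// tvanfosson's function, unchanged: the keyword goes into the pattern as-is.
function highlightWords(line, word) {
    var regex = new RegExp('(' + word + ')', 'gi');
    return line.replace(regex, "<b>$1</b>");
}

// "1+1" is read as the pattern 1+1, which needs "11" in the input,
// so the literal text "1+1" is not highlighted.
console.log(highlightWords("1+1=2", "1+1"));   // 1+1=2

// "c++" is not a valid pattern: "+" cannot follow another quantifier.
try {
    highlightWords("I like C++", "c++");
} catch (e) {
    console.log(e instanceof SyntaxError);     // true
}
```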
+1  A: 

Regular expressions are fine as long as the keywords really are words. You can just use the RegExp constructor instead of a literal to create a regex from a variable:

function highlight(s, word) {
    var re = new RegExp('(' + word + ')', 'gi');
    return s.replace(re, '<b>$1</b>');
}

The difficulty arises if ‘keywords’ can contain punctuation, as punctuation tends to have special meaning in regexps. Unfortunately, unlike most other languages and libraries with regexp support, JavaScript has no standard function for escaping punctuation in regexps.

And you can't be totally sure exactly what characters need escaping because not every browser's implementation of regexp is guaranteed to be exactly the same. (In particular, newer browsers may add new functionality.) And backslash-escaping characters that are not special is not guaranteed to still work, although in practice it does.

So about the best you can do is one of:

  • attempting to catch each special character in common browser use today [add: see Sebastian's recipe]
  • backslash-escape all non-alphanumerics. Careful: \W also matches non-ASCII Unicode characters, which you don't really want to escape.
  • just ensure that there are no non-alphanumerics in the keyword before searching
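A sketch of the second option that sidesteps the \W caveat by escaping only printable ASCII characters other than letters, digits and space, so non-ASCII letters such as `é` are left alone. The name `escapeAsciiPunctuation` is hypothetical. It relies on backslash-escaping non-special punctuation, which, as noted above, is not guaranteed but works in practice for patterns without the `u` flag; with `u`, escapes like `\:` are a syntax error.

```javascript
// Hypothetical helper: backslash-escape every printable ASCII character
// except letters, digits and space. Ranges: ! to /, : to @, [ to `, { to ~.
// Non-ASCII characters are deliberately not escaped.
function escapeAsciiPunctuation(str) {
    return str.replace(/[!-\/:-@\[-`{-~]/g, "\\$&");
}

function highlight(line, word) {
    var re = new RegExp(escapeAsciiPunctuation(word), 'gi');  // no 'u' flag
    return line.replace(re, '<b>$&</b>');
}

console.log(escapeAsciiPunctuation("c++ (café)"));  // c\+\+ \(café\)
console.log(highlight("I like C++", "c++"));       // I like <b>C++</b>
```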

If you are using this to highlight words in HTML that already contains markup, though, you've got trouble. Your ‘word’ might appear in an element name or attribute value, in which case wrapping a <b> around it will break the markup. In more complicated scenarios it could even open an HTML-injection (XSS) security hole. If you have to cope with markup you will need a more complicated approach, splitting out ‘< ... >’ markup before processing each stretch of text on its own.
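A rough sketch of that splitting approach, under strong assumptions: the input has no `<` or `>` inside attribute values, comments or `<script>`/`<style>` blocks, and keywords that could match inside character entities such as `&amp;` are not handled. The function name `highlightOutsideTags` is hypothetical; anything less controlled than this would need a real HTML parser.

```javascript
// Hypothetical helper: highlight a keyword only in the text between tags.
function highlightOutsideTags(html, word) {
    // Escape regex metacharacters in the keyword.
    var escaped = word.replace(/[.*+?^$|()\[\]{}\\]/g, "\\$&");
    var re = new RegExp(escaped, "gi");
    // Splitting on a capturing group keeps the tags in the array:
    // even indexes are text, odd indexes are tags.
    return html.split(/(<[^>]*>)/).map(function (part, i) {
        return i % 2 === 1 ? part : part.replace(re, "<b>$&</b>");
    }).join("");
}

console.log(highlightOutsideTags('<a href="foo">Foo bar</a>', "foo"));
// <a href="foo"><b>Foo</b> bar</a>
```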

bobince
+2  A: 

You can enhance the RegExp object with a function that does special character escaping for you:

RegExp.escape = function(str) 
{
  var specials = new RegExp("[.*+?^$|()\\[\\]{}\\\\]", "g"); // .*+?^$|()[]{}\
  return str.replace(specials, "\\$&");
};

Then you would be able to use what the others suggested without any worries:

function highlightWordsNoCase(line, word)
{
  var regex = new RegExp("(" + RegExp.escape(word) + ")", "gi");
  return line.replace(regex, "<b>$1</b>");
}
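The question mentions keywords in the plural. One possible extension, sketched here, joins several escaped keywords into a single alternation so they are highlighted in one pass. `escapeRegExp` and `highlightKeywords` are hypothetical names; the escaper is a standalone version of the `RegExp.escape` helper above that also covers `^` and `$`, kept separate so it does not overwrite anything on the built-in `RegExp`.

```javascript
// Standalone escaper for regex metacharacters, including ^ and $.
function escapeRegExp(str) {
    return str.replace(/[.*+?^$|()\[\]{}\\]/g, "\\$&");
}

// Hypothetical extension: highlight several keywords in one pass.
// Longer keywords are tried first, so "foobar" wins over "foo" at the same position.
function highlightKeywords(line, words) {
    var parts = words
        .filter(function (w) { return w.length > 0; })
        .sort(function (a, b) { return b.length - a.length; })
        .map(escapeRegExp);
    if (parts.length === 0) return line;   // an empty pattern would match everywhere
    var regex = new RegExp(parts.join("|"), "gi");
    return line.replace(regex, "<b>$&</b>");
}

console.log(highlightKeywords("foobar Foo bar FOO", ["foo", "bar"]));
// <b>foo</b><b>bar</b> <b>Foo</b> <b>bar</b> <b>FOO</b>
```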
Tomalak