I need to search the text in an HTML document for regexes (emails, phone numbers, etc.) and words. The matches need to be highlighted and made anchorable so that a link can be generated to jump to the location of each match. So not only does it need to find matches using patterns, it also needs to do a replace to add the proper HTML code.

I am currently using jQuery, but I am not very happy with the speed. In a 1.5 MB file it takes about 5 seconds to match 2 regexes, and the time increases as I add more search criteria.

Does anyone know of a fast method to find regex matches in a large document using JavaScript?

+1  A: 

You say you're "using jQuery" but you don't say how. Have you tried a "highlight" plugin (or, as it sounds like you'd need, a derivation of one)? I've used this one: http://johannburkard.de/blog/programming/javascript/highlight-javascript-text-higlighting-jquery-plugin.html and it doesn't seem slow to me. Again, you'd have to work on it to make it add the markup you need, but that should be pretty clear - it's not very big.

It seems like what you'd want to do for performance is take your regular expressions and combine them into what amounts to a "token grammar". In other words, you don't want to start from scratch looking for each regex individually throughout the entire document. Instead, you'd want to proceed through it with a regex that matches each possible target (one at a time of course), and each time it finds one you'd replace it with whatever's appropriate. That way you could make just one pass over the document, no matter how big it is and no matter how many patterns you're looking for.
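As a rough illustration of the single-pass idea above, here is a minimal sketch that joins several patterns into one alternation and wraps each hit in an anchorable span. Everything named here is an assumption made for the example: the `PATTERNS` table, the deliberately simplified email and phone patterns, the `hl` class names, and the `match-N` id scheme. It works on a plain string (for instance the contents of a single text node, assumed to contain no markup), not on the DOM itself.

```javascript
// Hypothetical pattern table: names and regex sources are illustrative only.
const PATTERNS = [
  { name: "email", source: "[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+" },
  { name: "phone", source: "\\d{3}-\\d{3}-\\d{4}" },
  { name: "word", source: "\\bjQuery\\b" },
];

// Build one regex of the form (?<email>...)|(?<phone>...)|(?<word>...)
// so the text is scanned once, whatever the number of patterns.
function buildCombinedRegex(patterns) {
  const alternatives = patterns.map((p) => `(?<${p.name}>${p.source})`);
  return new RegExp(alternatives.join("|"), "g");
}

// Replace every match with a span carrying a unique id, and collect
// the ids so that jump links can be generated afterwards.
function highlightText(text, patterns) {
  const combined = buildCombinedRegex(patterns);
  const anchors = [];
  const html = text.replace(combined, (...args) => {
    const match = args[0];
    // With named groups, the groups object is the callback's last argument.
    const groups = args[args.length - 1];
    // The named group that is defined tells us which pattern fired.
    const kind = patterns.find((p) => groups[p.name] !== undefined).name;
    const id = `match-${anchors.length}`;
    anchors.push({ id, kind, text: match });
    return `<span id="${id}" class="hl hl-${kind}">${match}</span>`;
  });
  return { html, anchors };
}

// Usage: one pass over the sample text, then build the jump links.
const sample = "Mail bob@example.com or call 555-123-4567.";
const result = highlightText(sample, PATTERNS);
const links = result.anchors.map((a) => `<a href="#${a.id}">${a.text}</a>`);
console.log(result.html);
console.log(links.join("\n"));
```

When two alternatives could match at the same position, the one listed first wins, so more specific patterns should probably come first. Patterns that rely on numbered backreferences would need adjusting, since wrapping each one in a named group shifts the group numbering.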

Edit: Mr. Burkard's plugin doesn't let you search with regexes; it uses "indexOf" internally. Hmm.

Pointy