views: 246
answers: 3
Using JavaScript, I need to efficiently remove a set of ~10,000 keywords from a ~100,000-word document, of which ~1,000 words will actually be keywords. What approach would you suggest?

Would a massive regular expression be practical? Or should I just iterate through the document characters looking for keywords (boring)?

Edit:
Good point - only whole words, not parts. And some keywords contain spaces.
I am trying to do it all client side to reduce pressure on the backend.

+2  A: 

Using a regular expression might be a good option:

var words = ['bon', 'mad'];
'joe bon joe mad'.replace(new RegExp('(' + words.join('|') + ')', 'g'), '');
// 'joe  joe '

The regex [1] isn't complicated by things like look-ahead, and the regexp engine is written in C/C++, so you can expect it to be quite fast. Nevertheless - benchmark and see if the performance fits your needs.

I don't think that implementing your own parser will be faster, but I might be wrong - benchmark.

Sending the document to the server doesn't sound very good to me. With 100k words you are looking at a payload in the megabytes range, and you still have to do something with it on the server and push it back.


[1] You might have to tune the regexp to handle whole-word matching and keywords that contain spaces.
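A minimal sketch of such a tuned regexp, assuming whole-word matches are wanted and that keywords may contain spaces or regex metacharacters (the keyword list and sample text here are made up):

var keywords = ['bon mot', 'mad'];   // some keywords contain spaces

// Escape any regex metacharacters a keyword might contain.
var escaped = keywords.map(function (k) {
  return k.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
});

// Longer keywords first, so 'bon mot' is preferred over a shorter prefix.
escaped.sort(function (a, b) { return b.length - a.length; });

var re = new RegExp('\\b(?:' + escaped.join('|') + ')\\b', 'g');

'a bon mot from a mad hacker'.replace(re, '');
// 'a  from a  hacker'

Sorting the longer keywords first matters because alternation in a JavaScript regexp is tried left to right, so 'bon' would otherwise match inside 'bon mot' and leave ' mot' behind.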

Emil Ivanov
You might want to add word border checks `'\\b(' + words.join('|') + ')\\b'`
Justin Johnson
The regexp could use some love, I agree, but it illustrates the point.
Emil Ivanov
A: 

My instinct tells me that for such a large number of keywords, sorting the keywords and building a per-character state machine would be much faster than a regular expression. Since the state machine is trivial, it can be generated automatically - a rough sketch follows.
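One way such a generated matcher could look (a sketch only, not Ofir's code - the names buildTrie and removeKeywords are made up, and the whole-word check from the question's edit would still need to be added): build a trie from the keywords, then walk the text one character at a time.

function buildTrie(keywords) {
  var root = {};
  keywords.forEach(function (word) {
    var node = root;
    for (var i = 0; i < word.length; i++) {
      var ch = word.charAt(i);
      node = node[ch] || (node[ch] = {});
    }
    node.terminal = true;   // marks the end of a keyword
  });
  return root;
}

function removeKeywords(text, trie) {
  var out = [];
  var i = 0;
  while (i < text.length) {
    var node = trie, j = i, lastEnd = -1;
    // Walk the trie as far as the text allows, remembering where the
    // longest keyword starting at position i ends.
    while (j < text.length && node[text.charAt(j)]) {
      node = node[text.charAt(j)];
      j++;
      if (node.terminal) lastEnd = j;
    }
    if (lastEnd !== -1) {
      i = lastEnd;              // skip over the matched keyword
    } else {
      out.push(text.charAt(i));
      i++;
    }
  }
  return out.join('');
}

var trie = buildTrie(['bon', 'mad']);
removeKeywords('joe bon joe mad', trie);   // 'joe  joe '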

Ofir
A: 

A state machine is often used for similar tasks - see e.g. http://www.codeproject.com/KB/string/civstringset.aspx

Ofir