ansaurus

Question

How do I efficiently reject Strings in an array if they (regex) match Strings in a second array in Ruby?

Answer 1

+1 A:

words.each do |word|
  sentences -= sentences.grep(/#{word}/)
end

It's still the same basic time complexity (probably less efficient overall), but you can get around writing out the double loop.

Be aware that with this solution, words need not match entire whitespace separated words in the sentence. So, the word cat would knock out the sentence: String concatenation is gross.

perimosocordiae 2009-12-06 19:12:36

Thanks p. Doing something similar right now but your version a little prettier. Aware of the regex issues, just wanted a simple example to illustrate the problem.

Matt 2009-12-06 19:23:27

Answer 2

A:

You could join all the words together into one regex, with the words separated by the "|" character.

sentence =~ /word1|word2|..../

You can convert the word array into a suitable regex with array.join("|").

If the words are likely to contain regex metacharacters then enclose each word in in non-capturing brackets.

sentence =~ /(?:word1)|(?:word2)|..../

Using a single regex should be much more efficient than looping through the words array, since the regex will be compiled into a single statetable.

Dave Kirby 2009-12-06 19:15:44

Nice! I forgot about this approach. Thanks. Will benchmark it vs. perimosocordiae's answer.

Matt 2009-12-06 19:19:13

Dave- what's the #~ format doing in this case? Did you mean =~?

Matt 2009-12-06 20:05:59

Whoops - the #~ was a typo - I did mean =~

Dave Kirby 2009-12-07 14:47:19

Did you ever do that benchmark, Matt? I'm curious to know what kind speed-up you actually get.

perimosocordiae 2009-12-16 02:14:46

Answer 3

A:

words = [...]
sentences = [....]

result = sentences.select{|sentence| !words.any?{|word| sentence =~ /#{word}/}}

aaz 2009-12-06 19:18:54

Answer 4

A:

Joining strings into a Regexp is a pretty bad idea because backtracking slows things down horribly and because you run into limits on the regex size pretty quickly. (Though it may work well in practice if wordarray is small)

Consider using one of the DictionaryMatcher Ruby Quiz solutions.

Then you can operate as follows:

dm=DictionaryMatcher.new
wordarray.each{|w| dm << w}
sentencearray.reject{|s| s =~ dm}

Ken Bloom 2009-12-06 23:02:11

Thanks much for the pointers, prefix trees are cool, and in fact I have a comment in existing code to check them out for this problem.

Matt 2009-12-07 18:38:13

It isn't the prefix-trees that's cool. It's the Knuth-Morris-Pratt matching that's cool, though there was another less theoretically efficient algorithm in the solutions that nevertheless benchmarked better.

Ken Bloom 2009-12-09 04:42:54

ansaurus

tags:

views:

answers:

How do I efficiently reject Strings in an array if they (regex) match Strings in a second array in Ruby?

related questions