ansaurus

Question

How to return a Ruby array intersection with duplicate elements? (problem with bigrams in Dice Coefficient)

Answer 1

A:

Based on this link, I believe you should use multisets rather than Ruby's sets, so that every bigram is counted as many times as it appears. You may be able to use this gem for multisets. That should give the correct behavior for recurring bigrams.
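To make the difference concrete, here is a minimal sketch comparing Ruby's built-in `Array#&`, which has set semantics and collapses duplicates, with a count-based intersection. The helper name `multiset_intersection` is hypothetical and does not come from any gem; it is just one way to get multiset behavior with plain hashes.

```ruby
# Hypothetical helper (not from any gem): intersection that keeps duplicates.
# Each element of a is kept only while b still has an unused occurrence of it.
def multiset_intersection(a, b)
  counts_b = Hash.new(0)
  b.each { |x| counts_b[x] += 1 }

  a.select do |x|
    if counts_b[x] > 0
      counts_b[x] -= 1   # consume one occurrence from b
      true
    else
      false
    end
  end
end

a = ["al", "al", "lc", "lc", "ld"]
b = ["al", "al", "lc", "ef"]

set_style   = a & b                        # set semantics: duplicates collapsed
multi_style = multiset_intersection(a, b)  # multiset semantics: duplicates kept

puts set_style.inspect    # ["al", "lc"]
puts multi_style.inspect  # ["al", "al", "lc"]
```

The repeated "al" bigram survives only in the count-based version, which is what matters when the intersection size feeds into a similarity score.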

Yuval F 2009-10-21 11:44:44

tyvm, testing it atm.

Rui 2009-10-21 11:59:54

Answer 2

+2 A:

As Yuval F said, you should use a multiset. However, there is no multiset in the Ruby standard library. Take a look here and here.

If performance is not critical for your application, you can still do it using Array with a little bit of code.

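def intersect(a, b)
  remaining = b.dup  # work on a copy so the caller's array is not modified
  a.inject([]) do |result, s|
    index = remaining.index(s)
    unless index.nil?
      result << s
      remaining.delete_at(index)  # consume one matching occurrence
    end
    result
  end
end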

a = ["al", "al", "lc", "lc", "ld"]
b = ["al", "al", "lc", "ef"]
puts intersect(a, b).inspect   # ["al", "al", "lc"]

pierr 2009-10-21 12:38:47

I appreciate it pierr. The code u posted does the trick =) ty

Rui 2009-10-21 13:10:12

Answer 3

A:

I toyed with this for a while, based on the answer from @pierr, and ended up with the following.

a = ["al", "al", "lc", "lc", "lc", "lc", "ld"]
b = ["al", "al", "al", "al", "al", "lc", "ef"]
result = []
h1, h2 = Hash.new(0), Hash.new(0)
a.each { |x| h1[x] += 1 }
b.each { |x| h2[x] += 1 }
h1.each_pair { |key, val| result << [key] * [val, h2[key]].min if h2[key] != 0 }
result.flatten

=> ["al", "al", "lc"]

This could be a kind of multiset intersection of a and b, but don't take my word for it, because I haven't tested it enough to be sure.

Jonas Elfström 2010-02-17 11:29:17
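Tying the answers back to the Dice coefficient from the question, the following is a rough sketch of how a duplicate-aware bigram intersection could feed into the score 2 * |X ∩ Y| / (|X| + |Y|). The method names `bigrams` and `dice_coefficient` are assumptions made for illustration, as is the choice to return 0.0 when either string has no bigrams; the original poster's code is not shown in the thread.

```ruby
# Hypothetical helper: split a string into overlapping two-character bigrams.
def bigrams(str)
  str.each_char.each_cons(2).map(&:join)
end

# Sketch of a Dice coefficient that counts repeated bigrams (multiset semantics).
def dice_coefficient(s1, s2)
  x = bigrams(s1)
  y = bigrams(s2)
  return 0.0 if x.empty? || y.empty?  # assumption: no bigrams means no similarity

  # Count occurrences of each bigram in y.
  counts = Hash.new(0)
  y.each { |g| counts[g] += 1 }

  # A bigram from x is shared only while y has an unused occurrence left.
  shared = 0
  x.each do |g|
    next unless counts[g] > 0
    counts[g] -= 1
    shared += 1
  end

  2.0 * shared / (x.size + y.size)
end

puts dice_coefficient("night", "nacht")  # 0.25
puts dice_coefficient("aaaa", "aaa")     # 0.8
```

In the second call the bigram "aa" appears three times in one string and twice in the other, so two occurrences are shared. A set-based intersection would count it only once and understate the similarity.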
