I have my CSV file imported as such:

records = FasterCSV.read(path, :headers => true, :header_converters => :symbol)

How can I get the unique occurrences of my data? For instance, here is some sample data:

ID,Timestamp
test,2008.12.03.20.26.32
test,2008.12.03.20.26.38
test,2008.12.03.20.26.41
test,2008.12.03.20.26.42
test,2008.12.03.20.26.43
test,2008.12.03.20.26.44
cnn,2008.12.03.20.30.37
cnn,2008.12.03.20.30.49

If I simply call records[:id], I just get:

testtesttesttesttesttestcnncnn

I would like to get this:

testcnn

How can I do this?

+2  A: 

If your data is not massive, you can use the Set class.

Here's an example:

require 'set'  # provides Array#to_set
p ['cnn','test','test','test','test','cnn','cnn'].to_set.to_a
=> ["cnn", "test"]
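Applied to the data in the question, this could look roughly like the sketch below. It assumes records[:id] returns a plain array of the ID column's strings (the concatenated output in the question suggests it does), and it uses Ruby's standard csv library in place of FasterCSV so that it runs on current Ruby versions; the CSV_TEXT heredoc is only a stand-in for the real file. Array#uniq is shown alongside Set as a simpler alternative that gives the same result here.

```ruby
require 'csv'
require 'set'

# Sample data copied from the question (stand-in for the real file).
csv_text = <<~CSV_TEXT
  ID,Timestamp
  test,2008.12.03.20.26.32
  test,2008.12.03.20.26.38
  test,2008.12.03.20.26.41
  test,2008.12.03.20.26.42
  test,2008.12.03.20.26.43
  test,2008.12.03.20.26.44
  cnn,2008.12.03.20.30.37
  cnn,2008.12.03.20.30.49
CSV_TEXT

# Assumption: the stdlib CSV class is used here instead of FasterCSV.
records = CSV.parse(csv_text, headers: true, header_converters: :symbol)

ids = records[:id]                 # all values of the ID column, duplicates included

unique_via_set  = ids.to_set.to_a  # the Set approach from this answer
unique_via_uniq = ids.uniq         # Array#uniq, same result without Set

puts unique_via_uniq.join          # prints "testcnn"
```

Both variants keep the order in which each ID first appears, so "test" comes before "cnn".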

Here's a simple benchmark:

require 'set'
require 'benchmark'

Benchmark.bm(5) do |x|
  x.report("Set")   do
    a = []
    20_000.times do |i|
      a << 'cnn' << 'test'
    end
    a.to_set.to_a
  end
end

=>
           user     system      total        real
Set    0.110000   0.000000   0.110000 (  0.109000)
krusty.ar
How massive are we talking? My CSV file will have on average 2000 entries; is that too big?
neezer
No, that's not big. I'm adding a benchmark to the answer.
krusty.ar