views:

1566

answers:

4

I'm really struggling with grasping how to effectively use FasterCSV to accomplish what I want.

I have a CSV file; say:

ID,day,site
test,tuesday,cnn.com
bozo,friday,fark.com
god,monday,xkcd.com
test,saturday,whatever.com

I want to go through this file and end up with a hash that counts how many times each value in the first column occurs. So:

{"test" => 2, "bozo" => 1, "god" => 1}

I need to be able to do this without prior knowledge of the values in the first column.

How can I do this with FasterCSV?

A: 

I don't have the code in front of me, but I believe row.to_hash does that (where row is the FasterCSV::Row of the current record).

row.headers should give you an array of the headers, incidentally. Check the docs for more: http://fastercsv.rubyforge.org/classes/FasterCSV/Row.html

Eli
But wouldn't that merely translate all rows to hashes? That's not what I want: I want the hash to have counters for unique occurrences of row[0]. Any other thoughts?
neezer
A: 

Hmm, would this do?

File.open("file.csv").readlines[1..-1].inject({}) {|acc,line| word = line.split(/,/).first; acc[word] ||= 0; acc[word] += 1; acc}

The [1..-1] slice skips the header line with the column names.

Then, for each line: take the first field, set its count to 0 in the accumulator if it isn't there yet, increment it, and return the accumulator so inject can pass it to the next iteration.

mat
Trying to parse a CSV file by doing `split(/,/)` is the path to a world of hurt. There's a reason why the FasterCSV gem is more than one line.
Eli
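To make the "world of hurt" concrete, here is a minimal sketch. It assumes a made-up row whose first field contains a quoted comma (no such row appears in the question's file), and it uses Ruby's standard `CSV` library, which has been based on FasterCSV since Ruby 1.9.

```ruby
require "csv"

# Hypothetical row: the first field contains a comma inside quotes.
line = %q{"test, inc",tuesday,cnn.com}

naive  = line.split(/,/).first       # cuts the quoted field in half
parsed = CSV.parse_line(line).first  # applies CSV quoting rules

puts naive.inspect
puts parsed.inspect
```

The naive split keeps the opening quote and loses everything after the embedded comma, so counts keyed on it would be wrong.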
Hmm, yes, of course: replace "File.open("file.csv").readlines[1..-1]" with the correct way of reading lines from FasterCSV, and "line.split(/,/).first" with the correct way of getting the first field :-)
mat
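For reference, the substitution described in the comment above could look roughly like this. It is only a sketch: it uses Ruby's standard `CSV` library (FasterCSV's successor since 1.9) and inlines the question's sample data as a string instead of reading "file.csv".

```ruby
require "csv"

# Sample data from the question, inlined so the sketch runs without a file.
data = <<~DATA
  ID,day,site
  test,tuesday,cnn.com
  bozo,friday,fark.com
  god,monday,xkcd.com
  test,saturday,whatever.com
DATA

rows = CSV.parse(data)            # array of arrays, header row included

counts = rows[1..-1].inject({}) do |acc, row|
  word = row.first                # first field, parsed by the CSV library
  acc[word] ||= 0                 # start the counter if the key is new
  acc[word] += 1
  acc                             # inject needs the accumulator returned
end

p counts
```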
+5  A: 

Easy:

h = Hash.new(0)
FasterCSV.read("file.csv")[1..-1].each {|row| h[row[0]] += 1}

Works the same with CSV.read, as well.
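The trick is `Hash.new(0)`: a missing key reads as 0, so `+= 1` works the first time a value is seen. Here is a self-contained sketch of the same two lines using the standard `CSV` library; the temporary file is only there so it runs without a real "file.csv".

```ruby
require "csv"
require "tempfile"

h = Hash.new(0)   # missing keys read as 0

Tempfile.create(["sample", ".csv"]) do |file|
  # Sample data from the question.
  file.write("ID,day,site\n" \
             "test,tuesday,cnn.com\n" \
             "bozo,friday,fark.com\n" \
             "god,monday,xkcd.com\n" \
             "test,saturday,whatever.com\n")
  file.flush

  # [1..-1] drops the header row; row[0] is the ID column.
  CSV.read(file.path)[1..-1].each { |row| h[row[0]] += 1 }
end

p h
```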

glenn mcdonald
Any reason not to use inject?
mat
Mostly a question of taste, I think, but inject is also slower, which sometimes matters.
glenn mcdonald
OK, I just ran a quick test on a 9000-line CSV file I had handy, using the four combinations of CSV/FasterCSV and each/inject. Timings: FasterCSV+each: 1.01s; FasterCSV+inject: 1.18s; CSV+each: 3.32s; CSV+inject: 3.34s.
glenn mcdonald
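If you want to repeat a comparison like this yourself, something along these lines might do. It is a rough sketch, not the test that produced the timings above: the 9000 rows are synthetic, and it times only the counting step on already-parsed rows, not the file reading.

```ruby
require "csv"
require "benchmark"

# Synthetic stand-in for a 9000-line file; the values are made up.
ids  = %w[test bozo god]
data = "ID,day,site\n" +
       Array.new(9000) { |i| "#{ids[i % 3]},monday,example.com\n" }.join
rows = CSV.parse(data)[1..-1]     # drop the header row

each_counts = Hash.new(0)
each_time = Benchmark.realtime do
  rows.each { |row| each_counts[row[0]] += 1 }
end

inject_counts = nil
inject_time = Benchmark.realtime do
  inject_counts = rows.inject(Hash.new(0)) { |acc, row| acc[row[0]] += 1; acc }
end

puts format("each:   %.4fs", each_time)
puts format("inject: %.4fs", inject_time)
```

Numbers will vary by machine and Ruby version, so treat a single run as a hint rather than a result.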
Or you could even do FasterCSV.foreach to shorten it a bit.
dasil003
A: 

I'd use foreach, and treat nils with respect, or else I'd risk an "undefined method `+' for nil:NilClass" error:

counter = {}
FasterCSV.foreach("path_to_your_csv_file", :headers => :first_row) do |row|
  key = row[0]  # value of the first column (ID)
  counter[key] = counter[key].nil? ? 1 : counter[key] + 1  # start at 1, else increment
end
egarcia
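A variation on the same idea, as a sketch: with headers enabled, the row can be indexed by column name, and `Hash.new(0)` removes the need for the nil check. This assumes the standard `CSV` library and the header name "ID" from the question; the last line additionally assumes Ruby 2.7 or later for `Enumerable#tally`.

```ruby
require "csv"
require "tempfile"

# Sample data from the question.
sample = <<~DATA
  ID,day,site
  test,tuesday,cnn.com
  bozo,friday,fark.com
  god,monday,xkcd.com
  test,saturday,whatever.com
DATA

counter = Hash.new(0)   # default of 0 replaces the explicit nil check
tallied = nil

Tempfile.create(["sample", ".csv"]) do |file|
  file.write(sample)
  file.flush

  # Streams one row at a time; the header row is consumed, not counted.
  CSV.foreach(file.path, headers: true) do |row|
    counter[row["ID"]] += 1
  end

  # Same result via an enumerator and tally (Ruby 2.7+).
  tallied = CSV.foreach(file.path, headers: true).map { |row| row["ID"] }.tally
end

p counter
```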