I'm just starting out using Ruby and I've written a bit of code to do basic parsing of a CSV file (Line is a basic class, omitted for brevity):

class File

  def each_csv
    each do |line|
      yield line.split(",")
    end
  end

end

lines = Array.new

File.open("some.csv") do |file|
  file.each_csv do |csv| 
    lines << Line.new(:field1 => csv[0], :field2 => csv[1])
  end
end

I have a feeling I would be better off using collect somehow rather than pushing each Line onto the array but I can't work out how to do it.

Can anyone show me how to do it or is it perfectly fine as it is?

Edit: I should have made it clear that I'm not actually going to use this code in production, it's more to get used to the constructs of the language. It is still useful to know there are libraries to do this properly though.

+2  A: 

I don't know if you are aware of it, but Ruby has its own class for parsing and writing CSV files.

I found an example of using collect to turn a csv file into an array of hashes.

require 'csv'

# Parse a CSV file into an array of hashes keyed by the header row.
def csv_to_array(file_location)
  csv = CSV.parse(File.read(file_location))
  fields = csv.shift # the first row holds the column names
  csv.collect { |record| Hash[*(0...fields.length).collect { |index| [fields[index], record[index].to_s] }.flatten] }
end

This example is taken from this article.

If you are unfamiliar with the * (splat) notation, it basically dissolves the outer [] brackets, turning an array into a list of separate arguments.
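
As a rough illustration of what the * does in the method above, here is a minimal sketch. The sample data and variable names are made up for the example and do not come from the article.

```ruby
# Hypothetical sample data, not taken from the original post.
pairs = [["name", "Ann"], ["city", "Oslo"]]

flat = pairs.flatten   # one flat list of alternating keys and values
record = Hash[*flat]   # same as Hash["name", "Ann", "city", "Oslo"]

# The same hash built without the splat, by pairing headers with values.
fields = ["name", "city"]
values = ["Ann", "Oslo"]
zipped = Hash[fields.zip(values)]

puts record.inspect
puts zipped.inspect
```

Pairing with zip avoids the flatten step, which would also flatten any field value that happens to be an array.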

georg
+1  A: 

Have you looked at FasterCSV? It does what you're trying to do here, and it also deals with some of the brain-deadness you find in some CSV files.
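
One example of that brain-deadness is a quoted field that contains a comma. The sketch below assumes a Ruby where FasterCSV has become the standard csv library (1.9 and later, as far as I know), so it uses require 'csv'. The sample row is made up.

```ruby
require 'csv'

# Hypothetical sample row: the second field contains a comma inside quotes.
row = 'Smith,"Portland, OR",42'

naive  = row.split(",")        # also splits inside the quoted field
parsed = CSV.parse_line(row)   # respects the quoting rules

puts naive.length
puts parsed.length
```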

aussiegeek
A: 

See how this works for you (functional programming is fun!):

Try using inject. Inject takes the starting "accumulator" as a parameter, and then a two-parameter block whose return value becomes the accumulator for the next element:

[1,2,3].inject(0) { |sum,num| sum+num }

is naturally 6

[1,2,3].inject(5) { |sum,num| sum+num }

is 11

[1,2,3].inject(2) { |sum,num| sum*num }

is 12
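
The accumulator does not have to be a number. A small sketch with an array as the starting value, using made-up variable names:

```ruby
# The block's return value becomes the accumulator for the next step.
# Array#<< returns the array itself, so it can be the last expression.
squares = [1, 2, 3].inject([]) { |acc, num| acc << num * num }

# The same result with map, which needs no accumulator at all.
mapped = [1, 2, 3].map { |num| num * num }

puts squares.inspect
puts mapped.inspect
```

If the block ended with something that does not return the array, the next iteration would receive that value instead.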

To the point:

class Line
  def initialize(options)
    @options = options
  end

  def to_s
    @options[:field1]+" "+@options[:field2]
  end
end

# IO#lines was removed in Ruby 3.0; File.readlines returns the lines as an array
File.readlines("test.csv").inject([]) do |lines,line|
  split = line.chomp.split(",")
  lines << Line.new(:field1 => split[0],:field2 => split[1])
end
Stefan Mai
+4  A: 

Here's a (possibly wild) idea: use the Struct class instead of rolling your own simple POD class. What you want from it is a constructor that accepts all of the fields that can be read from the file data.

Line = Struct.new(:field1, :field2, :field3)

Then at the core of the algorithm you want something like:

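# The splat passes each field as its own argument; Line.members lists the field names
File.readlines("test.csv").inject([]) do |result, line|
    result << Line.new(*line.chomp.split(",", Line.members.length))
end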

or being a bit more concise and functional-like:

lines = File.readlines("test.csv").map { |line| Line.new(*line.chomp.split(",", Line.members.length)) }

To be honest I haven't used the Struct class much, but I should, and I will probably refactor stuff already written to use it. It lets you access the fields of an instance by name, like:

line = Line.new
line.field1 = "blah"
line.field2 = 1

The Ruby Struct class.

So to actually answer your question, and looking above at the code, I would say it would be much simpler to use collect/map to perform the computation. The map function together with inject are very powerful and I find I use them quite frequently.
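
Putting the pieces together, a self-contained sketch might look like the following. The temporary file, its contents and the two-field Struct are assumptions made so the example can run on its own.

```ruby
require 'tempfile'

# Hypothetical two-field record and sample data, for illustration only.
Line = Struct.new(:field1, :field2)

file = Tempfile.new(["sample", ".csv"])
file.write("a,b\nc,d\n")
file.close

# File.readlines returns an array of lines, so map can build the result
# directly without pushing onto a separate array.
lines = File.readlines(file.path).map do |line|
  Line.new(*line.chomp.split(",", Line.members.length))
end

file.unlink
puts lines.map(&:field2).join(" ")
```

Plain split is only good enough for simple data like this; quoted fields need a real CSV parser.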

Daemin
The lines.map, especially with the concise version, creates a very nice syntax. Thanks for your help.
Garry Shutler
No worries, I'm glad you got that from my jumbled up answer. I'll endeavour to clean it up sometime.
Daemin