What's the "Rubyist" way to do the following data structure transformation:

I have

    incoming = [ {:date => 20090501, :width => 2}, 
                 {:date => 20090501, :height => 7}, 
                 {:date => 20090501, :depth => 3}, 
                 {:date => 20090502, :width => 4}, 
                 {:date => 20090502, :height => 6}, 
                 {:date => 20090502, :depth => 2},
               ]

and I want to collapse these by :date, to end up with

    outgoing = [ {:date => 20090501, :width => 2, :height => 7, :depth => 3},
                 {:date => 20090502, :width => 4, :height => 6, :depth => 2},
               ]

An array of arrays would also be fine at the last step, provided that the columns are in the same order in each row. Also, importantly, I do not know all the hash keys in advance (that is, I do not know :width, :height, or :depth -- they could be :cats, :dogs, and :hamsters).

A: 

Try this:

incoming = [ {:date => 20090501, :width => 2}, 
             {:date => 20090501, :height => 7}, 
             {:date => 20090501, :depth => 3}, 
             {:date => 20090502, :width => 4}, 
             {:date => 20090502, :height => 6}, 
             {:date => 20090502, :depth => 2},
           ]

# Grouping by `:date`
temp = {}

incoming.each do |row|
    if temp[row[:date]].nil? 
        temp[row[:date]] = []
    end

    temp[row[:date]] << row
end      

# Merging it together
outgoing = []         

temp.each_pair do |date, hashlist|
    # Fold all hashes that share this date into a single hash
    res = {}
    hashlist.each do |hash|
        res.merge!(hash)
    end
    outgoing << res 
end

For information concerning the hash-members, see this page

When ordering is important, use nested arrays of [key, value] pairs instead of hashes (hashes only preserve insertion order from Ruby 1.9 on):

incoming = [ {:date => 20090501, :width => 2}, 
             {:date => 20090501, :height => 7}, 
             {:date => 20090501, :depth => 3}, 
             {:date => 20090502, :width => 4}, 
             {:date => 20090502, :height => 6}, 
             {:date => 20090502, :depth => 2},
           ]

# Grouping by `:date`
temp = {}

incoming.each do |row|
    key = row[:date]
    if temp[key].nil? 
        temp[key] = []
    end
    # Store a copy without :date so the hashes in `incoming` are not modified
    temp[key] << row.reject {|k, v| k == :date }
end      

# Merging it together
outgoing = []         

temp.each_pair do |date, hashlist|
    # Start with the date pair so every element of the row is a [key, value] pair
    res = [[:date, date]]
    hashlist.each do |hash|
        hash.each_pair {|key, value| res << [key, value] }
    end
    outgoing << res
end
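
If every row has to list its columns in the same order, as the question asks, one possible sketch is to collect the keys first and then pull the values out in that order with `Hash#values_at`. The names `columns`, `merged` and `rows` are only illustrative, and the first-seen column order relies on hashes keeping insertion order, so this assumes Ruby 1.9 or later:

```ruby
incoming = [ {:date => 20090501, :width => 2},
             {:date => 20090501, :height => 7},
             {:date => 20090501, :depth => 3},
             {:date => 20090502, :width => 4},
             {:date => 20090502, :height => 6},
             {:date => 20090502, :depth => 2} ]

# Column order: every key in first-seen order (assumes Ruby 1.9+ hash ordering).
columns = incoming.map { |row| row.keys }.flatten.uniq

# Merge the rows per date into fresh hashes; the input hashes are left untouched.
merged = Hash.new { |hash, key| hash[key] = {} }
incoming.each { |row| merged[row[:date]].update(row) }

# One array per date, sorted by date; a key missing for a date becomes nil.
rows = merged.keys.sort.map { |date| merged[date].values_at(*columns) }
```

Keeping `columns` around is useful as a header row, since the arrays themselves no longer carry the key names.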
Dario
+2  A: 

Here is a one-liner :)

incoming.inject({}){ |o,i| o[i[:date]]||=[];o[i[:date]]<<i;o}.map{|a| a[1].inject(){|o,i| o.merge(i)}}

But actually the previous post is clearer, and might be faster too.

EDIT: with a bit of optimization:

p incoming.inject(Hash.new{|h,k| h[k]=[]}){ |o,i| o[i[:date]]<<i;o}.map{|a| a[1].inject(){|o,i| o.merge(i)}}
SztupY
Explanation: first inject creates a hash with the dates as keys and an array of the hashes as values. Then the map will merge all of these hashes into one.
SztupY
Nice anyway ;-)
Dario
+2  A: 

A concise solution:

incoming = [ {:date => 20090501, :width => 2}, 
             {:date => 20090501, :height => 7}, 
             {:date => 20090501, :depth => 3}, 
             {:date => 20090502, :width => 4}, 
             {:date => 20090502, :height => 6}, 
             {:date => 20090502, :depth => 2},
           ]

temp = Hash.new {|hash,key| hash[key] = {}}
incoming.each {|row| temp[row[:date]].update(row)}
outgoing = temp.values.sort_by {|row| row[:date]}

The only thing that's at all tricky here is the Hash constructor, which allows you to supply a block that's called when you access a nonexistent key. So I have the Hash create an empty Hash for us to update with the values we're finding. Then I just use the date as the hash keys, sort the hash values by date and we're transformed.
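
To see the default block in isolation, here is a small sketch. The variable names `by_date` and `shared` are made up for the illustration; the contrast with `Hash.new({})` is included because the two forms are easy to confuse:

```ruby
# The block runs only when a missing key is read; it stores a fresh hash
# under that key and returns it.
by_date = Hash.new { |hash, key| hash[key] = {} }

by_date[20090501].update(:width => 2)   # key is created on first access
by_date[20090501].update(:height => 7)  # the same inner hash is reused

# Contrast: a default *object* is returned for missing keys but never stored,
# and the single object is shared between all missing keys.
shared = Hash.new({})
shared[:a].update(:x => 1)
```

After this runs, `by_date` holds one entry for the date with both values, while `shared` still has no keys at all.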

Chuck
+7  A: 

If you are using Ruby 1.8.7 or Ruby 1.9+, the following code reads well:

incoming.group_by{|hash| hash[:date]}.map do |_, hashes| 
  hashes.reduce(:merge)
end

The underscore in the block parameters (_, hashes) indicates that we don't need/care about that particular parameter.

#reduce is an alias for #inject, which is used to reduce a collection into a single item. In the new Ruby versions it also accepts a symbol, which is the name of the method used to do the reduction.

It starts out by calling the method on the first item in the collection with the second item as the argument. It then calls the method again on the result with the third item as the argument and so on until there are no more items.

[1, 3, 2, 2].reduce(:+) => [4, 2, 2] => [6, 2] => 8
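
Applied to the hashes for a single date, the same reduction can be spelled out step by step. This is only a sketch, and `step1`, `step2`, `short` and `block` are illustrative names:

```ruby
hashes = [ {:date => 20090501, :width => 2},
           {:date => 20090501, :height => 7},
           {:date => 20090501, :depth => 3} ]

# What reduce(:merge) does, written out by hand:
step1 = hashes[0].merge(hashes[1])  # first item merged with the second
step2 = step1.merge(hashes[2])      # that result merged with the third

# The symbol form and the equivalent block form:
short = hashes.reduce(:merge)
block = hashes.inject { |memo, hash| memo.merge(hash) }
```

All three approaches build the same combined hash, and because `merge` returns a new hash each time, the original hashes are not modified.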
Tor Erik Linnerud
Nice... I didn't know about the group_by feature in recent Rubies.
SztupY