views:

354

answers:

4

I have two arrays I need to merge, and using the union (`|`) operator is PAINFULLY slow. Are there any other ways to accomplish an array merge?

Also, the arrays are filled with objects, not strings.

An example of an object within the array:

#<Article 
 id: 1, 
 xml_document_id: 1, 
 source: "<article><domain>events.waikato.ac</domain><excerpt...", 
 created_at: "2010-02-11 01:32:46", 
 updated_at: "2010-02-11 01:41:28"
>

Where source is a short piece of XML.

EDIT

Sorry! By 'merge' I mean the result must not contain duplicates.

A => [1, 2, 3, 4, 5]
B => [3, 4, 5, 6, 7]
A.magic_merge(B) #=> [1, 2, 3, 4, 5, 6, 7]

Understand that the integers above stand in for Article objects, and that the union operator appears to take forever.
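For reference, one way to sketch that `magic_merge` behavior on object arrays is to deduplicate by a single attribute with a plain Hash (the `Article` struct and the choice of `source` as the identity attribute are hypothetical stand-ins here, not the real model):

```ruby
# Hypothetical stand-in for the real Article model; only the attributes matter.
Article = Struct.new(:id, :source)

a = [Article.new(1, "<a/>"), Article.new(2, "<b/>")]
b = [Article.new(3, "<b/>"), Article.new(4, "<c/>")]

# Merge, treating two articles with the same source as duplicates.
seen = {}
(a + b).each { |art| seen[art.source] ||= art }  # first occurrence wins
result = seen.values

result.map { |art| art.source }  # => ["<a/>", "<b/>", "<c/>"]
```

Because the Hash does one lookup per element, this stays linear in the combined size of the arrays.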

+1  A: 

Using the Array#concat method will likely be a lot faster, according to my initial benchmarks using Ruby 1.8.7:

require 'benchmark'
require 'active_support'  # provides ActiveSupport::SecureRandom (Rails 2.x)

def reset_arrays!
  @array1 = []
  @array2 = []

  [@array1, @array2].each do |array|
    10000.times { array << ActiveSupport::SecureRandom.hex }
  end
end

reset_arrays! && puts(Benchmark.measure { @array1 | @array2 })
# => 0.030000   0.000000   0.030000 (  0.026677)

reset_arrays! && puts(Benchmark.measure { @array1.concat(@array2) })
# => 0.000000   0.000000   0.000000 (  0.000122)
Josh
What does concat do with duplicates?
Rabbott
It keeps them, but you could do `Array#uniq` afterwards to get rid of dups.
Josh
Ok, I will need to test concat vs + as well to see which is faster
Rabbott
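A quick sketch of that `concat`-vs-`+` comparison. Note that `concat` mutates its receiver while `+` builds a new array, so the benchmark below calls `dup` first to keep the comparison fair (sizes are arbitrary):

```ruby
require 'benchmark'

a = Array.new(100_000) { rand(999_999) }
b = Array.new(100_000) { rand(999_999) }

Benchmark.bm(12) do |x|
  x.report("a + b")       { a + b }           # allocates and returns a new array
  x.report("a.concat(b)") { a.dup.concat(b) } # appends in place; dup keeps a intact
end
```

Either way the result is the same elements in the same order; the difference is only whether a new array is allocated.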
+1  A: 

Try this and see if it is any faster:

a = [1,2,3,3,2]
b = [1,2,3,4,3,2,5,7]
(a+b).uniq
nas
+2  A: 

Here's a script which benchmarks two merge techniques: using the pipe operator (a1 | a2), and using concatenate-and-uniq ((a1 + a2).uniq). Two additional benchmarks give the time of concatenate and uniq individually.

require 'benchmark'

a1 = []; a2 = []
[a1, a2].each do |a|
  1000000.times { a << rand(999999) }
end

puts "Merge with pipe:"
puts Benchmark.measure { a1 | a2 }

puts "Merge with concat and uniq:"
puts Benchmark.measure { (a1 + a2).uniq }

puts "Concat only:"
puts Benchmark.measure { a1 + a2 }

puts "Uniq only:"
b = a1 + a2
puts Benchmark.measure { b.uniq }

On my machine (Ubuntu Karmic, Ruby 1.8.7), I get output like this:

Merge with pipe:
  1.000000   0.030000   1.030000 (  1.020562)
Merge with concat and uniq:
  1.070000   0.000000   1.070000 (  1.071448)
Concat only:
  0.010000   0.000000   0.010000 (  0.005888)
Uniq only:
  0.980000   0.000000   0.980000 (  0.981700)

Which shows that these two techniques are very similar in speed, and that uniq is the larger component of the operation. This makes sense intuitively: uniq has to hash and compare every element, so it is at least O(n) with a substantial constant factor, whereas concatenation is a simple bulk copy.

So, if you really want to speed this up, you need to look at how `hash` and `eql?` are implemented for the objects in your arrays (these, rather than `<=>`, are what `Array#|` and `Array#uniq` use to detect duplicates). I believe that most of the time is being spent comparing objects to ensure inequality between any pair in the final array.
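As a minimal sketch of that idea, assuming duplicates should be detected by `source` alone: `Array#|` and `Array#uniq` find duplicates via `#hash` and `#eql?`, so those are the two methods to override. The `Article` class below is a hypothetical stand-in for the real model:

```ruby
# Stand-in Article; identity is defined entirely by #source here.
class Article
  attr_reader :id, :source

  def initialize(id, source)
    @id, @source = id, source
  end

  def hash
    source.hash  # hash only the attribute that defines identity
  end

  def eql?(other)
    other.is_a?(Article) && source == other.source
  end
end

a = [Article.new(1, "<x/>"), Article.new(2, "<y/>")]
b = [Article.new(3, "<y/>"), Article.new(4, "<z/>")]

(a | b).map { |art| art.source }  # => ["<x/>", "<y/>", "<z/>"]
```

With only one string hashed per object instead of five attributes compared, the per-element cost of the union should drop accordingly.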

Alex Reisner
So the pipe operator will use those comparison methods if I just override them? I'm not sure I'll be able to speed it up, since as you can see the object being compared is rather simple, but maybe it will help if I have it compare just the source column instead of all 5 columns.. we shall see. Great answer though!
Rabbott
+1  A: 

Do you need the items to be in a specific order within the arrays? If not, you may want to check whether using Sets makes it faster.

Update

Adding to another answerer's code:

require "set"
require "benchmark"

a1 = []; a2 = []
[a1, a2].each do |a|
  1000000.times { a << rand(999999) }
end

s1, s2 = Set.new, Set.new

[s1, s2].each do |s|
  1000000.times { s << rand(999999) }
end

puts "Merge with pipe:"
puts Benchmark.measure { a1 | a2 }

puts "Merge with concat and uniq:"
puts Benchmark.measure { (a1 + a2).uniq }

puts "Concat only:"
puts Benchmark.measure { a1 + a2 }

puts "Uniq only:"
b = a1 + a2
puts Benchmark.measure { b.uniq }

puts "Using sets"
puts Benchmark.measure {s1 + s2}

puts "Starting with arrays, but using sets"
puts Benchmark.measure {s3, s4 = [a1, a2].map{|a| Set.new(a)} ; (s3 + s4)}

gives (for ruby 1.8.7 (2008-08-11 patchlevel 72) [universal-darwin10.0])

Merge with pipe:
  1.320000   0.040000   1.360000 (  1.349563)
Merge with concat and uniq:
  1.480000   0.030000   1.510000 (  1.512295)
Concat only:
  0.010000   0.000000   0.010000 (  0.019812)
Uniq only:
  1.460000   0.020000   1.480000 (  1.486857)
Using sets
  0.310000   0.010000   0.320000 (  0.321982)
Starting with arrays, but using sets
  2.340000   0.050000   2.390000 (  2.384066)

This suggests that sets may or may not be faster, depending on your circumstances: they win when the data is already in sets (or you merge repeatedly), but converting two large arrays into sets first costs more than the merge itself saves.
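In other words, the conversion step is what dominates; if the data can live in `Set`s from the start, the union itself is cheap and deduplication comes for free (sizes below are arbitrary):

```ruby
require 'set'

# Build the collections as Sets up front, so no array-to-set
# conversion is paid at merge time.
s1 = Set.new
s2 = Set.new
1000.times { s1 << rand(999) }
1000.times { s2 << rand(999) }

merged = s1 | s2  # set union; duplicates never enter the result
```

The trade-off is losing ordering and cheap indexing, which matters if the merged result has to go back into an array anyway.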

Andrew Grimm
Order doesn't matter because I have to shuffle the array after it's all merged. But the merge using sets is slower than the merge using concat, which is the fastest method we've been able to come up with so far. And I do have to use arrays, because I'm merging an ActiveRecord dataset from the db, which is an array.. (I think)
Rabbott