views:

633

answers:

4

What's the fastest/one-liner way to remove duplicates in an array of objects, based on a specific key:value, or a result returned from a method?

For instance, I have 20 XML Element nodes that are all the same name, but they have different "text" values, some of which are duplicates. I would like to remove the duplicates by saying "if element.text == previous_element.text, remove it". How do I do that in Ruby in the shortest amount of code?

I've seen how to do it for simple string/integer values, but not for objects.

+2  A: 

This should do what you want. At least for the case where you're applying the same method to each object in the array. Running time: O(n)

seen_keys = {}
array.reject do |object| 
  if ! object.value.nil? && seen_keys[object.value] 
    return true 
  else 
    seen_keys[object.value] == true
    return false
  end
end
EmFi
+3  A: 

Here's the standard hashy way. Note the use of ||= operator, which is a more convenient (a ||= b) way to write a = b unless a.

array.inject({}) do |hash,item|
   hash[item.text]||=item
   hash 
end.values.inspect

You can do it in a single line either.

The script needs O(n) equality checks of text strings. That's what's covered under O(n) when you see a hash.

Pavel Shved
Not exactly the fastest, as it runs in O(n^2) time. Then again it's not really important given how cheap CPU time is now.
EmFi
@EmFi , accessing hash table doesn't take O(n) (we should iterate string `text`, but we'll have to do it anyway). I've just posted an answer about this matter: http://stackoverflow.com/questions/1590405/distinguishing-extra-element-from-two-arrays/1590536#1590536
Pavel Shved
@Pavel Sorry, you're right. I got confused for a second into thinking that the added values call made it O(n^2). When it just makes it O(2n).
EmFi
+2  A: 

This does it all:

Hash[*a.map{|x| [x.text, x]}].values

short? yep.

(asterisk is optional; seems to be required for 1.8.6).

For example:

a = [Thing.new('a'), Thing.new('b'), Thing.new('c'), Thing.new('c')]
=> [#<Thing a>, #<Thing b>, #<Thing c>, #<Thing c>]

Hash[a.map{|x| [x.text, x]}].values
=> [#<Thing a>, #<Thing b>, #<Thing c>]

Boring part: here's the little test class I used:

class Thing
  attr_reader :text
  def initialize(text)
    @text = text
  end

  def inspect
    "#<Thing #{text}>"
  end
end
Peter
viatropos
Peter
I have the following error: in `[]': odd number of arguments for Hash (ArgumentError)
Pavel Shved
huh? what's your output for `a.map{|x| [x.text, x]}`? I double checked this and it seems ok...
Peter
running ruby 1.8.6; didn't rewrite `inspect` method; here's the output: http://pastebin.com/d42b3ce59. It happens to be **one** array of arrays, and one is an odd integer.
Pavel Shved
I guess they made a change from 1.8.6 to 1.8.7. I don't have access to 1.8.6, but I believe that you can do `Hash[*a.map{|x| [x.text, x]}].values` instead.
Peter
I *can* do that, but then the results then lacks one item :( Here's my code so you can get the whole picture: http://pastebin.com/m1594877c
Pavel Shved
A: 

"error: in `[]': odd number of arguments for Hash (ArgumentError) "

I get the same error. You would think that a de-dupe method should be included in the Enumerable module.

paul