ansaurus

Question

Scala: Remove duplicates in list of objects

Answer 1

+11 A:

list.groupBy{_.property}.map{_._2.head}

Explanation: The groupBy method accepts a function that converts an element to a key for grouping. _.property is just shorthand for elem: Object => elem.property (the compiler generates a unique name, something like x$1). So now we have a map Map[Property, List[Object]]. A Map[K,V] extends Traversable[(K,V)]. So it can be traversed like a list, but elements are a tuple. This is similar to Java's Map#entrySet(). The map method creates a new collection by iterating each element and applying a function to it. In this case the function is _._2.head which is shorthand for elem: (Property, List[Object]) => elem._2.head. _2 is just a method of Tuple that returns the second element. The second element is List[Object] and head returns the first element

To get the result to be a type you want:

import collection.breakOut
val l2: List[Object] = list.groupBy{_.property}.map{_._2.head}(breakOut)

To explain briefly, map actually expects two arguments, a function and an object that is used to construct the result. In the first code snippet you don't see the second value because it is marked as implicit and so provided by the compiler from a list of predefined values in scope. The result is usually obtained from the mapped container. This is usually a good thing. map on List will return List, map on Array will return Array etc. In this case however, we want to express the container we want as result. This is where the breakOut method is used. It constructs a builder (the thing that builds results) by only looking at the desired result type. It is a generic method and the compiler infers its generic types because we explicitly typed l2 to be List[Object]

or, to preserve order (assuming Object#property is of type Property):

list.foldLeft((List[Object](), Set[Property]())){(r, o) => 
  if (r._2(o.property)) r else (o :: r._1, r._2 + o.property)
}._1.reverse

foldLeft is a method that accepts an initial result and a function that accepts an element and returns an updated result. The method iterates each element, updating the result according to applying the function to each element and returning the final result. In this case, the initial result is a pair (tuple) of an empty list and a set. The list is the result we're interested in and the set is used to keep track of what properties we already encountered. In each iteration we check if the set (r._2) already contains the property (in Scala, obj(x) is translated to obj.apply(x). In Set, the method apply is def apply(a: A): Boolean. That is, accepts an element and returns true / false if it exists or not). If the property exists (already encountered), the result is returned as-is. Otherwise the result is updated to contain the object (o :: r._1) and the property is recorded (r._2 + o.property)

Update: @andreypopp wanted a generic method:

import scala.collection.IterableLike
import scala.collection.generic.CanBuildFrom

class RichCollection[A, Repr](xs: IterableLike[A, Repr]){
  def distinctBy[B, That](f: A => B)(implicit cbf: CanBuildFrom[Repr, A, That]) = {
    val builder = cbf(xs.repr)
    val i = xs.iterator
    var set = Set[B]()
    while(i.hasNext) {
      val o = i.next
      val b = f(o)
      if (!set(b)) {
        set += b
        builder += o
      }
    }
    builder.result
  }
}

implicit def toRich[A, Repr](xs: IterableLike[A, Repr]) = new RichCollection(xs)

to use:

scala> list.distinctBy(_.property)
res7: List[Obj] = List(Obj(1), Obj(2), Obj(3))

IttayD 2010-10-12 08:37:44

perfect, just made a paradigm shift.

parsa28 2010-10-12 08:40:41

Would be awesome if you can provide a quick explanation. I think Scala is sufficiently new that not everyone will understand this immediately.

Sudhir Jonathan 2010-10-12 08:40:57

Specifically, what does `_2` do in this context?

Sudhir Jonathan 2010-10-12 08:41:17

@Sudhir: _1 and _2 are methods that return the first and second element of a tuple.

Landei 2010-10-12 08:48:37

ref 2nd solution, Property is not a type so you can't do Set[Property] to my understanding.

parsa28 2010-10-12 08:49:32

Ah, okay. Very nice... this is why I'm staring Scala :D

Sudhir Jonathan 2010-10-12 08:51:13

Maybe scala collection needs distinct(A => B), that do distinct by key?

andreypopp 2010-10-12 11:59:19

Added such a method

IttayD 2010-10-12 12:47:13

+1, This method - `distinctBy` - should be added to the stdlib, methinks.

missingfaktor 2010-10-12 14:20:27

Answer 2

+5 A:

Here is a little bit sneaky but fast solution that preserves order:

list.filter{ var set = Set[Property]()
    obj => val b = set(obj.property); set += obj.property; b}

Although it uses internally a var, I think it is easier to understand and to read than the foldLeft-solution.

Landei 2010-10-12 09:00:04

I agree. Cool trick with the scope hiding of the var

IttayD 2010-10-12 09:08:23

I'm clearly missing something here. What is Property exactly?

parsa28 2010-10-12 11:00:53

@parsa28: Property is the type of obj.property

Landei 2010-10-12 11:31:09

Answer 3

+1 A:

Hi!

One more solution


    @tailrec
    def CollectUnique(l: List[Object], s: Set[Property], u: List[Object]): List[Object] = l match {

      case Nil => u.reverse
      case (h :: t) => if (s(h.property)) CollectUnique(t,s,u) 
                          else CollectUnique(t,s+h.prop,h::u)
    }

walla 2010-10-12 09:54:57

ansaurus

tags:

views:

answers:

Scala: Remove duplicates in list of objects

related questions