views:

325

answers:

5

As everybody knows there's no built-in function to remove the duplicates from an array in javascript. I've noticed this is also lacking in jQuery (which has a unique function for DOM selections only), and the most common snippet I found checks the entire array and a subset of it for each element (not very efficient I think), like:

for (var i = 0; i < arr.length; i++)
    for (var j = i + 1; j < arr.length; j++)
        if (arr[i] === arr[j])
            //whatever

so I made my own:

function unique (arr) {
    var hash = {}, result = [];
    for (var i = 0; i < arr.length; i++)
     if (!(arr[i] in hash)) { //it works with objects! in FF, at least
      hash[arr[i]] = true;
      result.push(arr[i]);
     }
    return result;
}

I wonder if there's any other algorithm accepted as the best for this case (or if you see any obvious flaw that could be fixed), or, what do you do when you need this in javascript (I'm aware that jQuery is not the only framework and some others may have this already covered).

+3  A: 

I think your version won't work when you'll have objects or function in the array that give string representation like [Object object]. Because you can only have strings as keys in objects (in the "hash" object here). You'll need to loop into the result array to find if the new entry already exists. It will still be faster than the first method.

Prototype JS has a "uniq" method, you may get inspiration from it.

Fabien Ménager
Good point, I didn't consider the `toString` issue.
Justin Johnson
The first method doesn't work with Objects either though, if I understand you correctly. IOW === doesn't work on objects. So presuming the array will only contain "scalars" that can be compared directly with == or === (eg ints, floats, bools, strings) do you still think the second one won't work?
Roatin Marth
er, wait. I guess == works fine on object references. nm then!
Roatin Marth
All the objects that are compared will be considered equal, which is false with the first method. But the second method will work and will be really faster if the array contains only scalar values.
Fabien Ménager
fabien: IOW the first one is better in the general case, while the second one is better in the "scalar" case.
Roatin Marth
I took a look at Prototype's code and it's implement the same as Underscore's, which gives O(n) for sorted arrays and O(n^2) for unsorted.
Justin Johnson
+2  A: 

Using the object literal is exactly what I would do. A lot of people miss this technique a lot of the time, opting instead for typical array walks as the original code that you showed. The only optimization would be to avoid the arr.length lookup each time. Other than that, O(n) is about as good as you get for uniqueness and is much better than the original O(n^2) example.

function unique(arr) {
    var hash = {}, result = [];
    for ( var i = 0, l = arr.length; i < l; ++i ) {
        if ( !hash.hasOwnProperty(arr[i]) ) { //it works with objects! in FF, at least
            hash[ arr[i] ] = true;
            result.push(arr[i]);
        }
    }
    return result;
}

// * Edited to use hasOwnProperty per comments

Time complexities to summarize

  f()    | unsorted | sorted | objects | scalar | library
____________________________________________________________
unique   |   O(n)   |  O(n)  |   no    |  yes   |    n/a
original |  O(n^2)  | O(n^2) |   yes   |  yes   |    n/a
uniq     |  O(n^2)  |  O(n)  |   yes   |  yes   | Prototype
_.uniq   |  O(n^2)  |  O(n)  |   yes   |  yes   | Underscore

As with most algorithms, there are trade offs. If you are only sorting scalar values, you're modifications to the original algorithm give the most optimal solution. However, if you need to sort non-scalar values, then using or mimicking the uniq method of either of the libraries discussed would be your best choice.

Justin Johnson
It is better to use `hash.hasOwnProperty(arr[i])`. The `in` operator returns true for inherited properties like `toString`. `("toString" in {}) => true`
Chetan Sastry
Good catch @Chetan. I've updated the response.
Justin Johnson
Wouldn't the `unique` function have O(n) complexity for sorted lists as well?
Xavi
Sorry, yes. Copy+paste got the best of me.
Justin Johnson
+1  A: 

I'm not an algorithm expert by any means, but I've been keeping an eye on underscore.js. They have this as a function called uniq:

http://documentcloud.github.com/underscore/#uniq

I looked at the code in their library, and copied it here for reference (not my code, this code belongs to underscore.js):

// Produce a duplicate-free version of the array. If the array has already
// been sorted, you have the option of using a faster algorithm.
_.uniq = function(array, isSorted) {
    return _.reduce(array, [], function(memo, el, i) {
        if (0 == i || (isSorted === true ? _.last(memo) != el : !_.include(memo, el))) memo.push(el);
        return memo;
    });
};

EDIT: You need to walk through the rest of the underscore.js code, and I almost took this code out because of it. I left the code snippet in just in case this was still useful.

jeremyosborne
I'm sure !_.include iterates the array from scratch too.
Roatin Marth
I hadn't heard of this library before, so I took a walk through the code looking specifically at `_.include` and `_.last`. It looks like sorted arrays will take O(n) and unsorted will be O(n^2), so it's not a constant improvement.
Justin Johnson
Justin: good sleuthing. The OPs code sample (first one) looks to be assuming the array is sorted. It starts the inner loop from the current index + 1.
Roatin Marth
Turns out, this is implemented the same way Prototype implements `uniq`
Justin Johnson
A: 

I've noticed this is also lacking in jQuery

The Underscore library is coming on nicely as a complement to jQuery and does contain uniq() (building on other list functions).

return _.reduce(array, [], function(memo, el, i) {
  if (0 == i || (isSorted === true ? _.last(memo) != el : !_.include(memo, el))) memo.push(el);
  return memo;
});
mahemoff
@mahemoff This same answer was posted by @jeremyosborne 15 minutes before you
Justin Johnson
A: 

Unfortunately JS objects have no identity accessible from the language - as other posters have mentioned, using objects as keys in a dictionary will fail when different objects have equal string representations and there is no id() function in the language.

There is a way to avoid the O(n^2) all-pairs check for === identity if you can modify the objects. Pick a random string, walk the array once to check that no object has a property by that name, then just do arr[i][randomPropertyName]=1 for each i. If the next object in the array already has that property, then it is a duplicate.

Unfortunately, the above will only work for modifiable objects. It fails for array values that don't allow property setting (e.g. integers, 42['random']=1 just doesn't work :( )

Rafał Dowgird