ansaurus

Question

Answer 1

+3 A:

I think your version won't work when you'll have objects or function in the array that give string representation like [Object object]. Because you can only have strings as keys in objects (in the "hash" object here). You'll need to loop into the result array to find if the new entry already exists. It will still be faster than the first method.

Prototype JS has a "uniq" method, you may get inspiration from it.

Fabien Ménager 2009-12-11 19:10:06

Good point, I didn't consider the `toString` issue.

Justin Johnson 2009-12-11 19:16:38

The first method doesn't work with Objects either though, if I understand you correctly. IOW === doesn't work on objects. So presuming the array will only contain "scalars" that can be compared directly with == or === (eg ints, floats, bools, strings) do you still think the second one won't work?

Roatin Marth 2009-12-11 19:17:34

er, wait. I guess == works fine on object references. nm then!

Roatin Marth 2009-12-11 19:19:31

All the objects that are compared will be considered equal, which is false with the first method. But the second method will work and will be really faster if the array contains only scalar values.

Fabien Ménager 2009-12-11 19:21:14

fabien: IOW the first one is better in the general case, while the second one is better in the "scalar" case.

Roatin Marth 2009-12-11 19:24:56

I took a look at Prototype's code and it's implement the same as Underscore's, which gives O(n) for sorted arrays and O(n^2) for unsorted.

Justin Johnson 2009-12-11 19:27:04

Answer 2

+2 A:

Using the object literal is exactly what I would do. A lot of people miss this technique a lot of the time, opting instead for typical array walks as the original code that you showed. The only optimization would be to avoid the arr.length lookup each time. Other than that, O(n) is about as good as you get for uniqueness and is much better than the original O(n^2) example.

function unique(arr) {
    var hash = {}, result = [];
    for ( var i = 0, l = arr.length; i < l; ++i ) {
        if ( !hash.hasOwnProperty(arr[i]) ) { //it works with objects! in FF, at least
            hash[ arr[i] ] = true;
            result.push(arr[i]);
        }
    }
    return result;
}

// * Edited to use hasOwnProperty per comments

Time complexities to summarize

  f()    | unsorted | sorted | objects | scalar | library
____________________________________________________________
unique   |   O(n)   |  O(n)  |   no    |  yes   |    n/a
original |  O(n^2)  | O(n^2) |   yes   |  yes   |    n/a
uniq     |  O(n^2)  |  O(n)  |   yes   |  yes   | Prototype
_.uniq   |  O(n^2)  |  O(n)  |   yes   |  yes   | Underscore

As with most algorithms, there are trade offs. If you are only sorting scalar values, you're modifications to the original algorithm give the most optimal solution. However, if you need to sort non-scalar values, then using or mimicking the uniq method of either of the libraries discussed would be your best choice.

Justin Johnson 2009-12-11 19:10:25

It is better to use `hash.hasOwnProperty(arr[i])`. The `in` operator returns true for inherited properties like `toString`. `("toString" in {}) => true`

Chetan Sastry 2009-12-11 19:25:04

Good catch @Chetan. I've updated the response.

Justin Johnson 2009-12-11 19:38:40

Wouldn't the `unique` function have O(n) complexity for sorted lists as well?

Xavi 2009-12-11 22:20:24

Sorry, yes. Copy+paste got the best of me.

Justin Johnson 2009-12-11 22:23:33

Answer 3

+1 A:

I'm not an algorithm expert by any means, but I've been keeping an eye on underscore.js. They have this as a function called uniq:

http://documentcloud.github.com/underscore/#uniq

I looked at the code in their library, and copied it here for reference (not my code, this code belongs to underscore.js):

// Produce a duplicate-free version of the array. If the array has already
// been sorted, you have the option of using a faster algorithm.
_.uniq = function(array, isSorted) {
    return _.reduce(array, [], function(memo, el, i) {
        if (0 == i || (isSorted === true ? _.last(memo) != el : !_.include(memo, el))) memo.push(el);
        return memo;
    });
};

EDIT: You need to walk through the rest of the underscore.js code, and I almost took this code out because of it. I left the code snippet in just in case this was still useful.

jeremyosborne 2009-12-11 19:12:09

I'm sure !_.include iterates the array from scratch too.

Roatin Marth 2009-12-11 19:19:01

I hadn't heard of this library before, so I took a walk through the code looking specifically at `_.include` and `_.last`. It looks like sorted arrays will take O(n) and unsorted will be O(n^2), so it's not a constant improvement.

Justin Johnson 2009-12-11 19:21:15

Justin: good sleuthing. The OPs code sample (first one) looks to be assuming the array is sorted. It starts the inner loop from the current index + 1.

Roatin Marth 2009-12-11 19:23:06

Turns out, this is implemented the same way Prototype implements `uniq`

Justin Johnson 2009-12-11 19:24:57

Answer 4

A:

I've noticed this is also lacking in jQuery

The Underscore library is coming on nicely as a complement to jQuery and does contain uniq() (building on other list functions).

return _.reduce(array, [], function(memo, el, i) {
  if (0 == i || (isSorted === true ? _.last(memo) != el : !_.include(memo, el))) memo.push(el);
  return memo;
});

mahemoff 2009-12-11 19:27:09

@mahemoff This same answer was posted by @jeremyosborne 15 minutes before you

Justin Johnson 2009-12-11 19:39:44

Answer 5

A:

Unfortunately JS objects have no identity accessible from the language - as other posters have mentioned, using objects as keys in a dictionary will fail when different objects have equal string representations and there is no id() function in the language.

There is a way to avoid the O(n^2) all-pairs check for === identity if you can modify the objects. Pick a random string, walk the array once to check that no object has a property by that name, then just do arr[i][randomPropertyName]=1 for each i. If the next object in the array already has that property, then it is a duplicate.

Unfortunately, the above will only work for modifiable objects. It fails for array values that don't allow property setting (e.g. integers, 42['random']=1 just doesn't work :( )

Rafał Dowgird 2009-12-11 22:12:55

ansaurus

tags:

views:

answers:

unique() for arrays in javascript

related questions