Let A and B be two sets. I'm looking for really fast or elegant ways to compute the set difference (A - B or A \ B, depending on your preference) between them. The two sets are stored and manipulated as JavaScript arrays, as the title says.

Notes:

  • Gecko-specific tricks are okay
  • I'd prefer sticking to native functions (but I am open to a lightweight library if it's way faster)
  • I've seen, but not tested, JS.Set (see previous point)

Edit: I noticed a comment about sets containing duplicate elements. When I say "set" I'm referring to the mathematical definition, which means (among other things) that they do not contain duplicate elements.

+4  A: 

I don't know if this is the most efficient, but it is perhaps the shortest:

var A = [1, 2, 3, 4];
var B = [1, 3, 4, 7];

// keep the elements of A that do not occur in B
var diff = A.filter(function (x) { return B.indexOf(x) < 0; });
stereofrog
+1: not the most efficient solution, but definitely short and readable
Christoph
Note: array.filter is not supported cross-browser (e.g. not in IE). It seems not to matter to @Matt since he stated that "Gecko-specific tricks are okay" but I think it's worth mentioning.
Eric Bréchemier
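For environments where Array.prototype.filter is missing, the same idea can be written with plain loops. This is only a minimal sketch: the function name differenceNoFilter is made up for illustration, and it keeps the quadratic scan of the filter/indexOf version.

```javascript
// Hypothetical helper (not from the answer above): same result as
// A.filter(... B.indexOf(x) < 0 ...), but using only plain loops.
function differenceNoFilter(A, B) {
  var result = [];
  for (var i = 0; i < A.length; i++) {
    var found = false;
    for (var j = 0; j < B.length; j++) {
      if (A[i] === B[j]) { found = true; break; } // strict equality, like indexOf
    }
    if (!found) result.push(A[i]);
  }
  return result;
}

var diff = differenceNoFilter([1, 2, 3, 4], [1, 3, 4, 7]);
// diff is [2]
```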
A: 

This works, but I think stereofrog's answer is much shorter and more elegant:

var A = [1, 'a', 'b', 12];
var B = ['a', 3, 4, 'b'];

var diff_set = {
    ar: [],     // the set to subtract
    diff: [],   // accumulated result
    // store the set on the object rather than in an implicit global
    remove_set: function (a) { this.ar = a; return this; },
    remove: function (el) {
        if (this.ar.indexOf(el) < 0) this.diff.push(el);
    }
};

A.forEach(diff_set.remove_set(B).remove, diff_set);
var C = diff_set.diff;
xavivars
+2  A: 

You can use an object as a map to avoid linearly scanning B for each element of A as in stereofrog's answer:

function setMinus(A, B) {
    var map = {}, C = [];

    for(var i = B.length; i--; )
        map[B[i].toSource()] = null; // any other value would do

    for(var i = A.length; i--; ) {
        if(!map.hasOwnProperty(A[i].toSource()))
            C.push(A[i]);
    }

    return C;
}

The non-standard toSource() method is used to get unique property names; if all elements already have unique string representations (as is the case with numbers), you can speed up the code by dropping the toSource() invocations.
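As a hedged alternative that avoids toSource() altogether, the lookup table can be a built-in Set. This is a sketch assuming an ES2015-capable engine, which the answer above does not assume, and setMinusWithSet is a made-up name. Unlike the toSource() version, a Set compares objects by identity rather than by their source text, and it keeps the number 1 and the string '1' distinct.

```javascript
// Assumption: the engine provides ES2015 Set (not part of the original answer).
function setMinusWithSet(A, B) {
  var lookup = new Set(B); // membership test without scanning B for each element
  return A.filter(function (x) { return !lookup.has(x); });
}

var C = setMinusWithSet([1, 'a', 'b', 12], ['a', 3, 4, 'b']);
// C is [1, 12]
```

Note that this version keeps the elements in the order they appear in A, whereas the reverse loops in setMinus() return them in reverse order.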

Christoph
A: 

Incorporating the ideas from stereofrog and Christoph and assuming a couple of non-standard iteration methods on arrays and objects/hashes (each and friends), we can get set difference, union and intersection in linear time in about 20 lines total:

var setOPs = {
  minusAB : function (a, b) {
    var h = {};
    b.each(function (v) { h[v] = true; });
    return a.filter(function (v) { return !h.hasOwnProperty(v); });
  },
  unionAB : function (a, b) {
    var h = {}, f = function (v) { h[v] = true; };
    a.each(f);
    b.each(f);
    return myUtils.keys(h);
  },
  intersectAB : function (a, b) {
    var h = {};
    a.each(function (v) { h[v] = 1; });
    b.each(function (v) { h[v] = (h[v] || 0) + 1; });
    var fnSel = function (v, count) { return count > 1; };
    var fnVal = function (v, c) { return v; };
    return myUtils.select(h, fnSel, fnVal);
  }
};

This assumes that each and filter are defined for arrays, and that we have two utility methods:

  • myUtils.keys(hash): returns an array with the keys of the hash

  • myUtils.select(hash, fnSelector, fnEvaluator): returns an array with the results of calling fnEvaluator on the key/value pairs for which fnSelector returns true.

The select() is loosely inspired by Common Lisp, and is merely filter() and map() rolled into one. (It would be better to have them defined on Object.prototype, but doing so wreaks havoc with jQuery, so I settled for static utility methods.)
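The two helpers are only described above, not shown, so here is one way they might look. This is a guess at their shape based on the descriptions, not the answerer's actual code. Object keys are always strings, so numeric set elements come back as strings from both helpers.

```javascript
// Hypothetical implementations; the answer only describes these helpers.
var myUtils = {
  // Returns an array with the hash's own keys.
  keys: function (hash) {
    var out = [];
    for (var k in hash) {
      if (Object.prototype.hasOwnProperty.call(hash, k)) out.push(k);
    }
    return out;
  },
  // filter() and map() in one pass over the hash's own key/value pairs.
  select: function (hash, fnSelector, fnEvaluator) {
    var out = [];
    for (var k in hash) {
      if (Object.prototype.hasOwnProperty.call(hash, k) && fnSelector(k, hash[k])) {
        out.push(fnEvaluator(k, hash[k]));
      }
    }
    return out;
  }
};

// Usage in the style of intersectAB: keep the keys that were counted twice.
var counts = { a: 2, b: 1, c: 2 };
var shared = myUtils.select(counts,
  function (v, count) { return count > 1; },
  function (v, c) { return v; });
// shared contains "a" and "c"
```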

Performance: Testing with

var a = [], b = [];
for (var i = 100000; i--; ) {
  if (i % 2 !== 0) a.push(i);
  if (i % 3 !== 0) b.push(i);
}

gives two sets with 50,000 and 66,666 elements. With these values A-B takes about 75ms, while union and intersection are about 150ms each. (Mac Safari 4.0, using Javascript Date for timing.)
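For anyone who wants to repeat the measurement, a rough harness along these lines should do. It is a sketch, not the exact code used for the numbers above: minusAB here is a stand-in that uses the standard forEach in place of the non-standard each, and timings will differ between machines and browsers.

```javascript
var a = [], b = [];
for (var i = 100000; i--; ) {
  if (i % 2 !== 0) a.push(i);
  if (i % 3 !== 0) b.push(i);
}

// Stand-in for setOPs.minusAB, with standard forEach instead of each.
function minusAB(a, b) {
  var h = {};
  b.forEach(function (v) { h[v] = true; });
  return a.filter(function (v) { return !h.hasOwnProperty(v); });
}

// Coarse timing with Date, as in the measurement described above.
var start = new Date().getTime();
var result = minusAB(a, b);
var elapsedMs = new Date().getTime() - start;

console.log(a.length, b.length, result.length, elapsedMs + 'ms');
```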

I think that's decent payoff for 20 lines of code.

j-g-faustus
you should still check `hasOwnProperty()` even if the elements are numeric: otherwise, something like `Object.prototype[42] = true;` means `42` can never occur in the result set
Christoph
Granted, it would be possible to set 42 that way, but is there a semi-realistic use case where anyone would actually do so? For general strings I take the point, though: a key could easily conflict with some Object.prototype variable or function.
j-g-faustus
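To make the point in these two comments concrete, here is a small sketch of how a plain truthiness lookup differs from hasOwnProperty(). The Object.prototype[42] assignment just simulates a misbehaving third-party script and is undone at the end.

```javascript
var h = { 1: true, 3: true };

// Simulate a script that extends Object.prototype.
Object.prototype[42] = true;

var truthyLookup = !!h[42];             // true: found through the prototype chain
var ownLookup = h.hasOwnProperty(42);   // false: only own properties count

// Names already on Object.prototype behave the same way without any tampering.
var inheritedName = !!h['toString'];          // true
var ownName = h.hasOwnProperty('toString');   // false

delete Object.prototype[42]; // clean up the simulated pollution
```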
A: 

I would hash the array B, then keep values from the array A not present in B:

function getHash(array){
  // Hash an array into a set of properties
  //
  // params:
  //   array - (array) (!nil) the array to hash
  //
  // return: (object)
  //   hash object with one property set to true for each value in the array

  var hash = {};
  for (var i=0; i<array.length; i++){
    hash[ array[i] ] = true;
  }
  return hash;
}

function getDifference(a, b){
  // compute the difference a\b
  //
  // params:
  //   a - (array) (!nil) first array as a set of values (no duplicates)
  //   b - (array) (!nil) second array as a set of values (no duplicates)
  //
  // return: (array)
  //   the set of values (no duplicates) in array a and not in b, 
  //   listed in the same order as in array a.

  var hash = getHash(b);
  var diff = [];
  for (var i=0; i<a.length; i++){
    var value = a[i];
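    // own-property check, so names inherited from Object.prototype
    // (e.g. "toString") are not mistaken for members of b
    if ( !Object.prototype.hasOwnProperty.call(hash, value) ){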
      diff.push(value);
    }
  }
  return diff;
}
Eric Bréchemier
that's exactly the same algorithm I posted half an hour ago
Christoph
@Christoph: you are right... I failed to notice that. I find my implementation simpler to understand though :)
Eric Bréchemier