views:

641

answers:

4

Given two sets A and B, what is the common algorithm used to find their union, and what is it's running time?

My intuition:

a = set((1, 2, 3))
b = set((2, 3, 5))
union = set()
for el in a:
    union.add(el)

for el in b:
    union.add(el)

Add checks for a collision, which is O(1), and then adds the element, which is (??). This is done n times (where n is |a| + |b|). So this is O(n * x) where x is avg running time for the add operation.

Is this correct?

+4  A: 

The complexity of add/find(collision), would depend on the implementation of union.

If you are using some hashtable based datastructure then your collision operation will indeed be constant assuming a good hash function.

Otherwise, add will probably be O(Log(N)) for a sorted list/tree datastructure.

Akusete
+1  A: 

First answer: If you are dealing with sets of numbers, you could implement a set as a sorted vector of distinct elements. Then you could implement union(S1, S2) simply as a merge operation (checking for duplicates), which takes O(n) time, where n = sum of cardinalities.

Now, my first answer is a bit naive. And Akusete is right: You can, and you should, implement a set as a hash table (a set should be a generic container, and not all objects can be sorted!). Then, both search and insertion are O(1) and, as you guessed, the union takes O(n) time.

(Looking at your Python code) Python sets are implemented with hash tables. Read through this interesting thread. See also this implementation which uses sorted vectors instead.

Federico Ramponi
+2  A: 

This is very much implementation dependent. Others have mentioned sets based on comparables (have a less-than for sorting) or hashables (have a good hash function for hashing). Another possible implementation involved "union-find", which only supports a specialized subset of usual set operations but is very fast (union is amortized constant time, I think?), you can read about it here

http://en.wikipedia.org/wiki/Union_find

and see an example application here

http://lorgonblog.spaces.live.com/blog/cns!701679AD17B6D310!220.entry

Brian
A: 

If you can use bitsets (each bit in an array of int equals an item of your set), you can simply walk over the int array and OR the elements. This has the complexity O(N) (where N is the length of the array) or O((m+31)/32) where M is the number of items.

Aaron Digulla