I want to implement a Set in C. Is it OK to use a linked list when creating the Set, or should I use another approach?

How do you usually implement your own set (if needed)?

NOTE: If I use the linked-list approach, I will probably have the following complexities for my Set operations:

  • init : O(1);
  • destroy: O(n);
  • insert: O(n);
  • remove: O(n);
  • union: O(n*m);
  • intersection: O(n*m);
  • difference: O(n*m);
  • ismember: O(n);
  • issubset: O(n*m);
  • setisequal: O(n*m);

O(n*m) may be a little too big, especially for huge data sets... Is there a way to implement my Set more efficiently?
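For reference, here is a minimal sketch of the linked-list approach the question describes, showing where the O(n) bounds for insert and ismember come from (the names `ll_set`, `ll_insert`, etc. are my own, not from any particular library):

```c
#include <stdlib.h>

/* A minimal linked-list set of ints. insert must scan for duplicates,
 * which is what makes it O(n) rather than O(1). */
typedef struct node {
    int value;
    struct node *next;
} node;

typedef struct {
    node *head;
    size_t size;
} ll_set;

void ll_init(ll_set *s) {                  /* init: O(1) */
    s->head = NULL;
    s->size = 0;
}

int ll_ismember(const ll_set *s, int v) {  /* ismember: O(n) linear scan */
    for (const node *p = s->head; p; p = p->next)
        if (p->value == v) return 1;
    return 0;
}

int ll_insert(ll_set *s, int v) {          /* insert: O(n), dominated by the
                                              duplicate check */
    if (ll_ismember(s, v)) return 0;
    node *n = malloc(sizeof *n);
    if (!n) return -1;
    n->value = v;
    n->next = s->head;
    s->head = n;
    s->size++;
    return 1;
}

void ll_destroy(ll_set *s) {               /* destroy: O(n) */
    node *p = s->head;
    while (p) {
        node *next = p->next;
        free(p);
        p = next;
    }
    s->head = NULL;
    s->size = 0;
}
```

union, intersection, and difference then become nested loops over two such lists, which is where the O(n*m) bounds come from.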

+4  A: 

std::set is often implemented as a red-black tree: http://en.wikipedia.org/wiki/Red-black_tree

This approach will give you much better complexity on all the listed operations.

Andreas Brinck
+2  A: 

There are a lot of ways to implement a set. Here are some of them. Besides, MSDN has a very good article on it.

malay
Thanks for mentioning the MSDN article, a very interesting read.
Andrei Ciobanu
+1  A: 

I have used Red-Black trees in the past to build sets.

edit: A little slow on the response.

Here are the time complexities from the Wikipedia article.

  • Space: O(n)
  • Search: O(log n)
  • Insert: O(log n)
  • Delete: O(log n)

Seth M.
Can you give me some hints regarding the O(n) complexity?
Andrei Ciobanu
posted in the edit
Seth M.
+1  A: 

Since you already have a linked list implemented, the easiest is a skip list. If you want to use balanced trees, the easiest in my opinion is a treap. These are randomized data structures, but generally they are just as efficient as their deterministic counterparts, if not more (and a skip list can be made deterministic).
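To make the treap suggestion concrete, here is a sketch of a treap-based int set in C: an ordinary BST ordered by key, kept balanced in expectation by also maintaining a max-heap property on random priorities. All names here are illustrative, not from any standard library:

```c
#include <stdlib.h>

/* Treap node: BST on key, max-heap on a random priority.
 * Expected O(log n) insert and search. */
typedef struct tnode {
    int key;
    int prio;
    struct tnode *left, *right;
} tnode;

static tnode *rotate_right(tnode *y) {
    tnode *x = y->left;
    y->left = x->right;
    x->right = y;
    return x;
}

static tnode *rotate_left(tnode *x) {
    tnode *y = x->right;
    x->right = y->left;
    y->left = x;
    return y;
}

/* Insert like a plain BST, then rotate up while the heap property
 * is violated. Duplicates are ignored (set semantics). */
tnode *treap_insert(tnode *root, int key) {
    if (!root) {
        tnode *n = malloc(sizeof *n);
        n->key = key;
        n->prio = rand();
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key) {
        root->left = treap_insert(root->left, key);
        if (root->left->prio > root->prio)
            root = rotate_right(root);
    } else if (key > root->key) {
        root->right = treap_insert(root->right, key);
        if (root->right->prio > root->prio)
            root = rotate_left(root);
    }
    return root;
}

int treap_contains(const tnode *root, int key) {  /* plain BST search */
    while (root) {
        if (key == root->key) return 1;
        root = key < root->key ? root->left : root->right;
    }
    return 0;
}

void treap_free(tnode *root) {
    if (!root) return;
    treap_free(root->left);
    treap_free(root->right);
    free(root);
}
```

Compared to a red-black tree, the balancing logic fits in two rotation helpers and two `if`s, which is why treaps are often recommended when you have to roll your own balanced tree in C.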

IVlad
Thanks for mentioning the skip list (didn't know about it). I will probably use it in another context. (Thank you!)
Andrei Ciobanu
+5  A: 

Sets are typically implemented either as red-black trees (which requires the elements to have a total order), or as an automatically-resizing hashtable (which requires a hash function).

The latter is typically implemented by having the hashtable double in size and reinsert all elements when a certain load threshold (75% works well) is exceeded. This means that individual insert operations can be O(n), but amortized over many operations insertion is O(1).
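A minimal sketch of that resizing scheme in C, using open addressing with linear probing (the names are my own; note this toy version uses -1 as an "empty" sentinel, so it cannot store -1):

```c
#include <stdlib.h>

#define HSET_EMPTY (-1)

/* Open-addressing hash set of non-negative ints, doubling at 75% load. */
typedef struct {
    int *slots;
    size_t cap;
    size_t count;
} hset;

void hset_init(hset *h) {
    h->cap = 8;
    h->count = 0;
    h->slots = malloc(h->cap * sizeof *h->slots);
    for (size_t i = 0; i < h->cap; i++) h->slots[i] = HSET_EMPTY;
}

static size_t hset_hash(int v, size_t cap) {
    /* Knuth-style multiplicative hash; cap is always a power of two. */
    return ((size_t)(unsigned)v * 2654435761u) % cap;
}

int hset_contains(const hset *h, int v) {
    size_t i = hset_hash(v, h->cap);
    while (h->slots[i] != HSET_EMPTY) {     /* linear probing */
        if (h->slots[i] == v) return 1;
        i = (i + 1) % h->cap;
    }
    return 0;
}

static void hset_grow(hset *h);

int hset_insert(hset *h, int v) {
    if (hset_contains(h, v)) return 0;
    if (4 * (h->count + 1) > 3 * h->cap)    /* load would exceed 75% */
        hset_grow(h);                       /* double and rehash: O(n) */
    size_t i = hset_hash(v, h->cap);
    while (h->slots[i] != HSET_EMPTY)
        i = (i + 1) % h->cap;
    h->slots[i] = v;
    h->count++;
    return 1;
}

static void hset_grow(hset *h) {
    int *old = h->slots;
    size_t oldcap = h->cap;
    h->cap *= 2;
    h->count = 0;
    h->slots = malloc(h->cap * sizeof *h->slots);
    for (size_t i = 0; i < h->cap; i++) h->slots[i] = HSET_EMPTY;
    for (size_t i = 0; i < oldcap; i++)     /* reinsert every element */
        if (old[i] != HSET_EMPTY) hset_insert(h, old[i]);
    free(old);
}

void hset_destroy(hset *h) {
    free(h->slots);
    h->slots = NULL;
    h->cap = h->count = 0;
}
```

The occasional O(n) rehash in `hset_grow` is what gets amortized away: each doubling pays for the cheap O(1) inserts that preceded it.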

Michael Borgwardt