views:

178

answers:

4

I would like to know how a set is implemented in C++. If I were to implement my own set container without using the STL provided container, what would be the best way to go about this task?

I understand STL sets are based on the abstract data structure of a binary search tree. So what is the underlying data structure? An array?

Also, how does insert() work for a set? How does the set check whether an element already exists in it?

I read on wikipedia that another way to implement a set is with a hash table. How would this work?

+3  A: 

You could implement a binary search tree by first defining a Node struct:

struct Node
{
  void *nodeData;
  Node *leftChild;
  Node *rightChild;
}

Then, you could define a root of the tree with another Node *rootNode;

The Wikipedia entry on Binary Search Tree has a pretty good example of how to implement an insert method, so I would also recommend checking that out.

In terms of duplicates, they are generally not allowed in sets, so you could either just discard that input, throw an exception, etc, depending on your specification.

Raul Agrait
+3  A: 

How a particular container is implemented in C++ is entirely implementation dependent. All that is required is for the result to meet the requirements set out in the standard, such as complexity requirement for the various methods, iterators requirements etc.

KTC
That said, most (all?) are red-black trees.
GMan
There's an AVL based implementation at http://sourceforge.net/projects/stlavlmap/ and an incomplete BTree based implementation at http://idlebox.net/2007/stx-btree/
MSalters
Given the time-complexity constrains and the requirement that it has to be sorted, there's not much room for alternatives that won't involve some kind of balanced tree.
Fabio Ceconello
+2  A: 

As KTC said, how std::set is implemented can vary -- the C++ standard simply specifies an abstract data type. In other words, the standard does not specify how a container should be implemented, just what operations it is required to support. However, most implementations of the STL do, as far as I am aware, use red-black trees or other balanced binary search trees of some kind (GNU libstdc++, for instance, uses red-black trees).

While you could theoretically implement a set as a hash table and get faster asymptotic performance (amortized O(1) versus O(log n) for lookup and insert), that would require having the user supply a hash function for whatever type they wanted to store (see Wikipedia's entry on hash tables for a good explanation of how they work). As for an implementation of a binary search tree, you wouldn't want to use an array -- as Raul mentioned, you would want some kind of Node data structure.

Toli
You can implement *a* set type with a hash table. However, you can't (at least not easily) implement one that meets the requirements for std::set iterators.
dan04
@dan04: you can't even implement one that satisfies the requirements for std::set lookup and insert. As Toli says, you need a hash function on the key type, and the user of `std::set` isn't required by the standard to provide one. So when their code fails to compile with `no match found for hash(MyElement)`, that means the std::set implementation is defective.
Steve Jessop
+2  A: 

I understand STL sets are based on the abstract data structure of a binary search tree. So what is the underlying data structure? An array?

As others have pointed out, it varies. A set is commonly implemented as a tree (red-black tree, balanced tree, etc.) though but there may be other implementations that exist.

Also, how does insert() work for a set?

It depends on the underlying implementation of your set. If it is implemented as a binary tree, Wikipedia has a sample recursive implementation for the insert() function. You may want to check it out.

How does the set check whether an element already exists in it?

If it is implemented as a tree, then it traverses the tree and check each element. However, sets do not allow duplicate elements to be stored though. If you want a set that allows duplicate elements, then you need a multiset.

I read on wikipedia that another way to implement a set is with a hash table. How would this work?

You may be referring to a hash_set, where the set is implemented using hash tables. You'll need to provide a hash function to know which location to store your element. This implementation is ideal when you want to be able to search for an element quickly. However, if it is important for your elements to be stored in particular order, then the tree implementation is more appropriate since you can traverse it preorder, inorder or postorder.

jasonline