ansaurus

Question

I need a container that supports efficient random access and O(k) insertion and removal

Answer 1

+3 A:

How about a hashmap?

It's got amortized O(1) access, insertion and removal average case; O(n) worst case.

NullUserException 2010-09-28 01:43:10

Note that amortized O(1) is the *best* case.

Billy ONeal 2010-09-28 01:59:34

Using the simple work-around mentioned at the question we obtain `O(1)` worst-case access time and `O(1)` amortized time insertion/removal (`k` is the size of every "small" array and we always add/remove a bunch of `O(k)` elements). Furthemore what @Billy ONeal said.

myle 2010-09-28 02:03:13

Answer 2

+1 A:

From the description you gave of the tree structure you are implementing, it might be best to create a new data structure to mimic your tree. Especially if you're already tracking pointers between nodes.

If I understand your statement, each node in your tree would contain a vector of children node pointers. When you need to split the node, you could create new nodes with each one receiving a segment of the vector of children pointers, and the newly created nodes would be inserted into the node vector of the parent node.

for example:

N1->N2->(n3,n4,n5,n6,n7,n8) split N2 into two nodes: N1->(N2_1, N2_2) with N2_1->(n3,n4,n5) and N2_2->(n6,n7,n8)

(Sorry, I don't know how to easily draw trees...)

This way, you are only relinking memory rather than copying, and access will generally be log n. Furthermore, this gives a proper representation of the tree structure in code.

edit adding example:

Suppose again, we have N1->N2->(n3,n4,n5,n6,n7,n8). If N1 should need to have new nodes added, the only affect is on the N1 node: N1->(N2,N9)->(n3,n4,n5,n6,n7,n8)

The structure of a node might be like this (very simplified):

class Node {
  vector<Node *> children;
  Node * parent;
};

The larger tree structure would be many of these nodes all linked together much like a binary tree. To add nodes to any one node on a tree would only add items to the children member of that node. Nothing else is affected.

JoshD 2010-09-28 02:02:30

You are right. As far as splitting is concerned, only relinking takes place. But the number of children of a particular node can grow due to external reasons. That is where the problem arises. How we do it efficiently without affecting all the nodes of the structure?

myle 2010-09-28 02:09:27

As I said, the nodes would have `vectors<Node *>`. In that way, you can add new nodes to any given node without affecting anything else in the whole tree structure. As an example, in my answer node N1 could have five new nodes added to it, but it would only alter the vector of noes in N1. All other data in the tree (N2-n8) aren't affected. Does that work?

JoshD 2010-09-28 02:11:19

How do you split efficiently the `vector` and then add new elements in its parts?

myle 2010-09-28 02:17:04

If you wanted to split a node into three nodes (for example), you would create three new nodes, then assign the values from the old node into the new nodes. The first node would get the first third of the entries, the second node, the middle third, and so on. The complexity of this split would increase in proportion the the number of values in the vector. So if you split a node with k children in a tree with n nodes, it's O(k).

JoshD 2010-09-28 02:21:45

I suppose I've misunderstood the question a bit. It seems you've already considered this option and found it unsuitable. I hope you find what you're searching for. Good luck!

JoshD 2010-09-28 02:26:34

Thanks for your insightful comments. We suggest the same work-around here. At every split, delete the "old" `vector<Node *>` and build "smaller new" ones.

myle 2010-09-28 09:57:16

@myle: To avoid complete reallocation of the vector, you could use a list and the splice function. That has move semantics, so you don't have to copy or reallocate the contents; it can be moved. That would give somewhat better performance for the split operation.

JoshD 2010-09-28 16:36:15

Answer 3

+1 A:

It looks to me like you're trying to reinvent the B+ Tree...

There is a variation of the B+ Tree which consists in storing the number of sub elements instead of a proper key, this allows efficient random access (log n) which factor you actually control.

For example, a common factor for databases is around 1,000, meaning that at the first level of the tree is the root, the second can contain up to 1,000 children, the third up to 1,000,000 etc...

If you have less than 1,000,000,000 objects, it means at most 3 dereferences, which is quite efficient.

There has been a python module written using this technic, called blist, which aimed at replacing the traditional list class (implemented as a C++ vector) with a B+ Tree to get more efficiency for large items. The performances measurements can be found here.

Matthieu M. 2010-09-28 06:53:27

The difference with the `B+ Tree` is that a node may have an arbitrary number of children (instead of `b/2` < children < `b`) which changes (a lot) the procedures of insertion/deletion and the particular characteristics of the data structure.

myle 2010-09-28 10:06:41

@myle: Yes I spotted that, but there is a reason for the B+ Tree to do so --> avoid (costly) reallocation. The objects are still moved around though.

Matthieu M. 2010-09-28 13:45:41

ansaurus

tags:

views:

answers:

I need a container that supports efficient random access and O(k) insertion and removal

related questions