Efficient data structure for fast random access, search, insertion and deletion

views:

1655

answers:

+1 Q:

Efficient data structure for fast random access, search, insertion and deletion

I'm looking for a data structure (or structures) that would allow me keep me an ordered list of integers, no duplicates, with indexes and values in the same range.

I need four main operations to be efficient, in rough order of importance:

taking the value from a given index
finding the index of a given value
inserting a value at a given index
deleting a value at a given index

Using an array I have 1 at O(1), but 2 is O(N) and insertion and deletions are expensive (O(N) as well, I believe).

A Linked List has O(1) insertion and deletion (once you have the node), but 1 and 2 are O(N) thus negating the gains.

I tried keeping two arrays a[index]=value and b[value]=index, which turn 1 and 2 into O(1) but turn 3 and 4 into even more costly operations.

Is there a data structure better suited for this?

+2 A:

How about using a sorted array with binary search?

Insertion and deletion is slow. but given the fact that the data are plain integers could be optimized with calls to memcpy() if you are using C or C++. If you know the maximum size of the array, you can even avoid any memory allocations during the usage of the array, as you can preallocate it to the maximum size.

The "best" approach depends on how many items you need to store and how often you will need to insert/delete compared to finding. If you rarely insert or delete a sorted array with O(1) access to the values is certainly better, but if you insert and delete things frequently a binary tree can be better than the array. For a small enough n the array most likely beats the tree in any case.

If storage size is of concern, the array is better than the trees, too. Trees also need to allocate memory for every item they store and the overhead of the memory allocation can be significant as you only store small values (integers).

You may want to profile what is faster, the copying of the integers if you insert/delete from the sorted array or the tree with it's memory (de)allocations.

lothar 2009-05-20 21:32:28

insertion in that case is horrible...

McWafflestix 2009-05-20 21:35:44

insertion and deletion were last on the OP's list, and being integers can be optimized with calls to memcpy().

lothar 2009-05-20 21:37:51

The "ordered" part is important, so I can't sort the data.

Leonel 2009-05-21 16:42:14

@Leonel ordered means sorted according to the ordering rules you specify

lothar 2009-05-21 16:45:41

Howabout a Treemap? log(n) for the operations described.

CookieOfFortune 2009-05-20 21:35:25

+1 A:

I like balanced binary trees a lot. They are sometimes slower than hash tables or other structures, but they are much more predictable; they are generally O(log n) for all operations. I would suggest using a Red-black tree or an AVL tree.

Zifre 2009-05-20 21:36:00

hash table wouldn't keep the data ordered.

Greg Rogers 2009-05-20 21:52:40

Oops, I didn't see the ordered part... I fixed it though.

Zifre 2009-05-20 21:55:20

+6 A:

I would use a red-black tree to map keys to values. This gives you O(log(n)) for 1, 3, 4. It also maintains the keys in sorted order.

For 2, I would use a hash table to map values to keys, which gives you O(1) performance. It also adds O(1) overhead for keeping the hash table updated when adding and deleting keys in the red-black tree.

Ayman Hourieh 2009-05-20 21:39:02

in fact, the best self balanced trees (like red-black trees) have an amortized O(1) acess. (yeah, amortization can be counterintuitive)

Javier 2009-05-20 22:27:34

i knew i read that somewhere: http://www.cs.tau.ac.il/~wein/publications/pdfs/rb_tree.pdf

Javier 2009-05-20 22:33:00

Sounds good. I'll try it out

Leonel 2009-05-21 16:42:33

@Javier: Red-black trees absolutely do not have amortized O(1) access. Red-black trees do not actually *do* anything when you read an element in the tree, so there's no amortization.No binary tree, dynamic or no, can achieve o(n log n) for accessing n arbitrary elements in the tree.

Captain Segfault 2009-05-21 23:11:50

+1 A:

I don't know what language you're using, but if it's Java you can leverage LinkedHashMap or a similar Collection. It's got all of the benefits of a List and a Map, provides constant time for most operations, and has the memory footprint of an elephant. :)

If you're not using Java, the idea of a LinkedHashMap is probably still suitable for a usable data structure for your problem.

Rob Hruska 2009-05-20 21:43:35

If you're working in .NET, then according to the MS docs http://msdn.microsoft.com/en-us/library/f7fta44c.aspx

SortedDictionary and SortedList both have O(log n) for retrieval
SortedDictionary has O(log n) for insert and delete operations, whereas SortedList has O(n).

The two differ by memory usage and speed of insertion/removal. SortedList uses less memory than SortedDictionary. If the SortedList is populated all at once from sorted data, it's faster than SortedDictionary. So it depends on the situation as to which is really the best for you.

Also, your argument for the Linked List is not really valid as it might be O(1) for the insert, but you have to traverse the list to find the insertion point, so it's really not.

BenAlabaster 2009-05-20 21:49:07

+1 A:

How to achieve 2 with RB-trees? We can make them count their children with every insert/delete operations. This doesn't make these operationis last significantly longer. Then getting down the tree to find the i-th element is possible in log n time. But I see no implementation of this method in java nor stl.

Jarek Czekalski 2010-04-28 06:50:54

ansaurus

tags:

views:

answers:

Efficient data structure for fast random access, search, insertion and deletion

related questions