views:

1655

answers:

7

I'm looking for a data structure (or structures) that would allow me keep me an ordered list of integers, no duplicates, with indexes and values in the same range.

I need four main operations to be efficient, in rough order of importance:

  1. taking the value from a given index
  2. finding the index of a given value
  3. inserting a value at a given index
  4. deleting a value at a given index

Using an array I have 1 at O(1), but 2 is O(N) and insertion and deletions are expensive (O(N) as well, I believe).

A Linked List has O(1) insertion and deletion (once you have the node), but 1 and 2 are O(N) thus negating the gains.

I tried keeping two arrays a[index]=value and b[value]=index, which turn 1 and 2 into O(1) but turn 3 and 4 into even more costly operations.

Is there a data structure better suited for this?

+2  A: 

How about using a sorted array with binary search?

Insertion and deletion is slow. but given the fact that the data are plain integers could be optimized with calls to memcpy() if you are using C or C++. If you know the maximum size of the array, you can even avoid any memory allocations during the usage of the array, as you can preallocate it to the maximum size.

The "best" approach depends on how many items you need to store and how often you will need to insert/delete compared to finding. If you rarely insert or delete a sorted array with O(1) access to the values is certainly better, but if you insert and delete things frequently a binary tree can be better than the array. For a small enough n the array most likely beats the tree in any case.

If storage size is of concern, the array is better than the trees, too. Trees also need to allocate memory for every item they store and the overhead of the memory allocation can be significant as you only store small values (integers).

You may want to profile what is faster, the copying of the integers if you insert/delete from the sorted array or the tree with it's memory (de)allocations.

lothar
insertion in that case is horrible...
McWafflestix
insertion and deletion were last on the OP's list, and being integers can be optimized with calls to memcpy().
lothar
The "ordered" part is important, so I can't sort the data.
Leonel
@Leonel ordered means sorted according to the ordering rules you specify
lothar
A: 

Howabout a Treemap? log(n) for the operations described.

CookieOfFortune
+1  A: 

I like balanced binary trees a lot. They are sometimes slower than hash tables or other structures, but they are much more predictable; they are generally O(log n) for all operations. I would suggest using a Red-black tree or an AVL tree.

Zifre
hash table wouldn't keep the data ordered.
Greg Rogers
Oops, I didn't see the ordered part... I fixed it though.
Zifre
+6  A: 

I would use a red-black tree to map keys to values. This gives you O(log(n)) for 1, 3, 4. It also maintains the keys in sorted order.

For 2, I would use a hash table to map values to keys, which gives you O(1) performance. It also adds O(1) overhead for keeping the hash table updated when adding and deleting keys in the red-black tree.

Ayman Hourieh
in fact, the best self balanced trees (like red-black trees) have an amortized O(1) acess. (yeah, amortization can be counterintuitive)
Javier
i knew i read that somewhere: http://www.cs.tau.ac.il/~wein/publications/pdfs/rb_tree.pdf
Javier
Sounds good. I'll try it out
Leonel
@Javier: Red-black trees absolutely do not have amortized O(1) access. Red-black trees do not actually *do* anything when you read an element in the tree, so there's no amortization.No binary tree, dynamic or no, can achieve o(n log n) for accessing n arbitrary elements in the tree.
Captain Segfault
+1  A: 

I don't know what language you're using, but if it's Java you can leverage LinkedHashMap or a similar Collection. It's got all of the benefits of a List and a Map, provides constant time for most operations, and has the memory footprint of an elephant. :)

If you're not using Java, the idea of a LinkedHashMap is probably still suitable for a usable data structure for your problem.

Rob Hruska
A: 

If you're working in .NET, then according to the MS docs http://msdn.microsoft.com/en-us/library/f7fta44c.aspx

  • SortedDictionary and SortedList both have O(log n) for retrieval
  • SortedDictionary has O(log n) for insert and delete operations, whereas SortedList has O(n).

The two differ by memory usage and speed of insertion/removal. SortedList uses less memory than SortedDictionary. If the SortedList is populated all at once from sorted data, it's faster than SortedDictionary. So it depends on the situation as to which is really the best for you.

Also, your argument for the Linked List is not really valid as it might be O(1) for the insert, but you have to traverse the list to find the insertion point, so it's really not.

BenAlabaster
+1  A: 

How to achieve 2 with RB-trees? We can make them count their children with every insert/delete operations. This doesn't make these operationis last significantly longer. Then getting down the tree to find the i-th element is possible in log n time. But I see no implementation of this method in java nor stl.

Jarek Czekalski