views: 362
answers: 8

My gut tells me there is no good way to achieve this, but, unlike Mr. Stephen Colbert, I'd rather trust a community of developers than my gut...

Is there a known way to efficiently implement a "best of both worlds" list, one that provides random access by index and O(1) insertion/removal like a linked list?

I foresee two possible outcomes: either "No, this is impossible, for the following obvious reasons..." or "Uh, yes, this has been done; see here, here, and here."

+4  A: 

I don't believe it will be possible to get O(1) for both insertion and lookup. The minute you add an array (or even fancy, splittable vectors), the insertion becomes O(n).

There are ways to mitigate the damage depending on the expected behavior of your list. If lookups will greatly outnumber insertions/deletions, it may be better to just use vectors (variable-sized arrays). These are reasonably efficient: not quite as fast as plain arrays, but better than traversing lists. Since vectors tend to be implemented as lists of arrays, a lookup is still technically a list traversal, but each list element typically carries the size of its chunk, which lets you skip whole chunks rather than individual items.

If insertions and deletions are more frequent, you can make the index rebuild lazy, so that it only happens when required. For example, inserts and deletes only change the linked-list portion (and mark the index as dirty); only when someone tries to use the index is it rebuilt and marked as clean.

You can even optimize the rebuild by keeping a record of the first dirty entry. This will mean if you only insert or delete in the last half of the list, you don't need to rebuild the entire index when someone wants to use it.
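The lazy-index idea can be sketched roughly as follows. This is my illustrative sketch, not code from the answer; the class and method names (`LazyIndexedList`, `insert`, `get`) are hypothetical. Mutations touch only the linked structure and set a dirty flag; the array index is rebuilt on the first indexed read afterwards.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

/** Sketch: a linked list with a lazily rebuilt array index (illustrative only). */
class LazyIndexedList<T> {
    private final LinkedList<T> list = new LinkedList<>();
    private final List<T> index = new ArrayList<>();
    private boolean dirty = true;

    /** Cheap mutation: touch only the linked list and invalidate the index.
     *  (Locating pos in a real linked list is itself O(n); the point is that
     *  no index maintenance happens here.) */
    public void insert(int pos, T value) {
        list.add(pos, value);
        dirty = true;
    }

    /** O(1) while the index is clean; O(n) rebuild on the first call after a change. */
    public T get(int i) {
        if (dirty) {
            index.clear();
            index.addAll(list); // rebuild the whole index in one pass
            dirty = false;
        }
        return index.get(i);
    }

    public int size() {
        return list.size();
    }
}
```

A run of k consecutive gets after a burst of mutations costs one O(n) rebuild plus k O(1) reads, rather than k traversals.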

A solution I once implemented was a 2D List. By this, I mean:

        +-----------+    +-----------+    +-----------+
List -> | count = 7 | -> | Count = 4 | -> | Count = 6 | -> null
        +-----------+    +-----------+    +-----------+
              |                |                |
              V                V                V
        +-----------+    +-----------+    +-----------+
        |    [0]    |    |    [7]    |    |   [11]    |
        +-----------+    +-----------+    +-----------+
              |                |                |
              V                V                V
        +-----------+    +-----------+    +-----------+
        |    [1]    |    |    [8]    |    |   [12]    |
        +-----------+    +-----------+    +-----------+
              |                |                |
              :                :                :
              :                :                :
              |                |                |
              V                V                V
        +-----------+    +-----------+    +-----------+
        |    [6]    |    |   [10]    |    |   [16]    |
        +-----------+    +-----------+    +-----------+
              |                |                |
              V                V                V
             null             null             null

While this made both insertion and lookup O(n), the balance was right. In a pure array solution, lookup is O(1) and insertion is O(n). For a pure linked list, insertion is O(1) (once you've found the insertion point, of course, an operation that is itself O(n)) and lookup is O(n).

The 2D list is O(n) for both, but with a lower constant factor (in fact, if the number of columns and the column heights are kept balanced at about √n each, both operations cost roughly O(√n)). If you're looking to insert, you can find the right column simply by examining the first row of each column. Then you traverse the column itself looking for the right row, insert the item, and increase the count for that column. Deletions work similarly, except the count is decreased, and the entire column is removed when its count reaches zero.

For an index lookup, you traverse the columns to find the correct column, then traverse the items in the column to get the right item.

And, it can even be auto-adjusting by trying to keep the maximum height and width roughly the same.
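The column-walk described above can be sketched like this. This is my sketch of the idea, not paxdiablo's original code; for brevity each column is a `LinkedList` whose `size()` stands in for the explicit count field in the diagram.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

/** Sketch of the "2D list": an outer list of columns, each a short list
 *  with a cached count (illustrative, not a production structure). */
class TwoDList<T> {
    private final List<LinkedList<T>> columns = new ArrayList<>();

    /** Skip whole columns by their counts, then walk one column:
     *  O(#columns + column height) instead of O(n). */
    public T get(int index) {
        for (LinkedList<T> col : columns) {
            if (index < col.size()) {
                return col.get(index);
            }
            index -= col.size(); // skip this whole column in one step
        }
        throw new IndexOutOfBoundsException();
    }

    /** Find the column by counts, then splice locally inside it. */
    public void insert(int index, T value) {
        for (LinkedList<T> col : columns) {
            if (index <= col.size()) {
                col.add(index, value); // count is implicit in col.size()
                return;
            }
            index -= col.size();
        }
        LinkedList<T> col = new LinkedList<>(); // first element, or append past the end
        col.add(value);
        columns.add(col);
    }
}
```

The auto-adjusting variant would additionally split a column that grows too tall and merge short neighbors, keeping column count and height near √n.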

paxdiablo
A comprehensive answer with a lot of good ideas in it: thanks, I really appreciate it.
Dan Tao
A: 

I don't know the exact big-O on insertion (as that would vary based upon sample size and growth), but Java's java.util.LinkedList comes immediately to mind.

http://java.sun.com/j2se/1.5.0/docs/api/java/util/LinkedList.html

EDIT: yes, apparently underneath it is still a true linked list, and as such indexed gets average n/2 steps, which of course is formally O(n).

You could always waste a whole bunch of space and write a List implementation that keeps a parallel linked list and array with deferred inserts/removals.

Xepoch
Ah, Java, eh? I will check that out immediately...
Dan Tao
Blast! Looks like that class provides indexing only via iteration, so random access will be slower than insertion/removal: "Operations that index into the list will traverse the list from the beginning or the end, whichever is closer to the specified index" (from http://www.docjar.com/docs/api/java/util/LinkedList.html).
Dan Tao
From that page: "Operations that index into the list will traverse the list". I gathered the OP was looking for an O(1) index lookup as well.
paxdiablo
+1  A: 

When I was implementing a linked list in class I thought about optimizing access time by storing 3 additional fields: The node in the middle of the list, the index of the most recently accessed node and the most recently accessed node itself.

To retrieve a node by index, I would first look at all the available paths to reach the node at the given index and then choose the cheapest one. The paths would simply be:

  1. Going from the start to the desired index
  2. Going from the most recently accessed node to the desired index (forwards)
  3. Going from the most recently accessed node to the desired index (backwards)
  4. Going from the middle node to the desired index (forwards)
  5. Going from the middle node to the desired index (backwards)
  6. Going from the end of the list to the desired index (backwards)

The path with the smallest difference between the desired index and its starting index is the cheapest option. If no node has been accessed yet, the recently accessed node can be set to the middle node. Of course, with an even number of elements there is no actual middle, so I'd just use the floor of n/2.
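The path-selection step can be sketched as a simple comparison of distances. This is my hypothetical helper, not code from the answer; it assumes the list tracks its size `n` and the index `lastIdx` of the most recently accessed node, as described above.

```java
/** Sketch: choose the cheapest starting point for an indexed get in a
 *  doubly linked list with the extra bookkeeping described above. */
class AnchorChooser {
    /** Returns the number of hops from the best anchor to targetIndex.
     *  Anchors: head (index 0), tail (n-1), middle (n/2), and the most
     *  recently accessed node (lastIdx). Forwards vs. backwards traversal
     *  falls out of the sign of the difference. */
    static int cheapestDistance(int n, int lastIdx, int targetIndex) {
        int fromHead   = targetIndex;
        int fromTail   = (n - 1) - targetIndex;
        int fromMiddle = Math.abs(targetIndex - n / 2);
        int fromLast   = Math.abs(targetIndex - lastIdx);
        return Math.min(Math.min(fromHead, fromTail),
                        Math.min(fromMiddle, fromLast));
    }
}
```

With head, tail, and middle as anchors, the worst-case walk is already n/4 hops; the recently-accessed anchor helps further when access patterns are local.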

Anyway, I never got around to actually implementing this optimization, or even really analyzing it, but I hope it helps.

Marc Müller
The problem with this is that you have to update the midpoint whenever you insert new nodes, and you can't calculate the relative offset of two nodes without traversing the list.
Chris Lutz
Well, you're right about the middle node, but we can calculate the relative offset because we actually have all the indexes we need: the index of the node we want to access (passed to the function) and all the starting indexes, i.e. 0 for the start of the list, n-1 for the end, n/2 for the middle, and the stored index of the recently accessed node. The relative offset *should* be the difference of the indexes. At least I think so, but I could, of course, be wrong.
Marc Müller
@Marc - Right. Not thinking. This would still be more than O(1) indexing, but it would be better. Though we'd always have to update the indexes (and the midpoint) for every insertion and deletion (and what if we delete the midpoint/most recently indexed node? We have to find another node to replace it), which might get expensive.
Chris Lutz
The middle node could be updated upon insertion or removal by walking from the current middle node either backwards or forwards until we've reached n/2 again, which is at most one step, because an insertion or removal changes the size of the list by either +1 or -1. If we delete the recently indexed node, the "new" recently indexed node could be set to the middle node, and finding the middle node is always easy. But with all these additional steps this could become an unnecessary optimization for lists with a lot of insertions, though it could improve often-accessed lists.
Marc Müller
I think what you're describing is the first step towards a skip list.
Moishe
A: 

Java's LinkedList has O(n) access for indexed gets. LinkedList extends AbstractSequentialList to show that it does not offer O(1) indexed gets.

I'd suggest taking a look at Apache's TreeList. It offers O(log n) inserts/removals and O(log n) indexed lookups.
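The structure behind TreeList is a size-augmented balanced tree: every node caches its subtree size, so both indexed get and positional insert descend by counts. Below is my minimal sketch of that core idea, not Apache's actual implementation; it does no rebalancing, so its cost is O(height) rather than the guaranteed O(log n) of TreeList's AVL tree.

```java
/** Sketch of a tree indexed by position via subtree sizes (no rebalancing). */
class PositionTree<T> {
    private static class Node<T> {
        T value;
        Node<T> left, right;
        int size = 1; // number of nodes in this subtree
        Node(T v) { value = v; }
    }

    private Node<T> root;

    private static int size(Node<?> n) {
        return n == null ? 0 : n.size;
    }

    /** Descend by subtree sizes; assumes 0 <= i < size(). */
    public T get(int i) {
        Node<T> n = root;
        while (true) {
            int leftSize = size(n.left);
            if (i < leftSize) {
                n = n.left;                    // target is in the left subtree
            } else if (i == leftSize) {
                return n.value;                // this node is the i-th element
            } else {
                i -= leftSize + 1;             // skip left subtree and this node
                n = n.right;
            }
        }
    }

    public void insert(int i, T value) {
        root = insert(root, i, value);
    }

    private Node<T> insert(Node<T> n, int i, T value) {
        if (n == null) return new Node<>(value);
        if (i <= size(n.left)) {
            n.left = insert(n.left, i, value);
        } else {
            n.right = insert(n.right, i - size(n.left) - 1, value);
        }
        n.size++;
        return n;
    }

    public int size() {
        return size(root);
    }
}
```

Adding AVL or red-black rebalancing (while updating the cached sizes during rotations) bounds the height, which is how TreeList achieves O(log n) for both operations.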

brianegge
What is an O(n) "indexed" lookup? Shouldn't that just be O(log n) lookups?
wsorenson
Collections usually allow you to lookup a value based on a key `get(Object)` and/or an index `get(index)`. When referring to indexed gets, I'm talking about the latter where you want the nth item in the list.
brianegge
My point was TreeList doesn't offer O(n) get. That wouldn't make any sense.
wsorenson
+1  A: 

If you think that O(log N) == O(1), check out skip lists.

Nick D
+1 because skip lists are cool.
Chris Lutz
Skip lists are O(log n) for lookup, O(n log n) for insertion/deletion since you have to go back and update all the skip chains when you insert or delete.
paxdiablo
@paxdiablo, I edited my answer. It's O(log n) on insertion/deletion.
Nick D
A: 

Your gut is correct on this.

Linked lists are O(1) insertion/deletion because the operation you perform to insert or remove something is just switching a couple of pointers (one or two on the object you're inserting, and one or two on one or two other objects). This doesn't change by the size of the list, obviously.

A skip list will give you O(log n) lookup, but since you're maintaining an index it also means O(log n) insertion/deletion, because that index needs to be kept up to date.

Any parallel data structure you use for lookup will need to be maintained, so your complexity will scale by that index's complexity.

Do you have a particular problem in mind you're trying to solve?

For instance, you can get O(1) insertion, removal and lookup if you can guarantee a perfect hash. But you need to know some things about your data ahead of time for that to work.

Moishe
A: 

While I don't think you can get integer indexing, a backing hashtable might work if you are using 'reference' types.

leppie
A: 

How about a hash table? You get O(1) random access by key and O(1) insertion/deletion. The catch is that entries are unordered.

For an efficient implementation of ordered sequences, check out finger trees. They give you O(1) access to head and last and O(log n) random access to inner nodes. Insert or delete at either end in O(1). Notably, reversal of a finger tree takes constant time.

Apocalisp