views:

303

answers:

7

Does anyone know of a List implementation that has a constant-time get(int index) (i.e. implements RandomAccess) but doesn't have to copy the whole list when it grows, as ArrayList does?

I'm thinking the implementation may well be in terms of other lists e.g.

public class ChunkedList<T> implements List<T>, RandomAccess {
  private LinkedList<ArrayList<T>> chunks;
  public T get(int index) {
    return findCorrectChunk(index).get(computeChunkIndex(index));
  }
}
A: 

Well, not really an ideal solution, but you can use a TreeMap for that. Your ChunkedList would be a wrapper around it. The keys in the TreeMap would be of type Integer or Long and would hold your list indices. Access and insert times will be O(log n) (not constant, but much better than O(n)). And internally a TreeMap is a red-black tree, so, like LinkedList, its nodes are just linked with references rather than stored in one contiguous array.

EDIT: Something like that:

public class ChunkedList<T> implements List<T>, RandomAccess {

    private TreeMap<Integer, T> data = new TreeMap<Integer, T>();

    public T get(int index) {
        return data.get(index);
    }

    public boolean add(T o) {
        data.put(data.size(), o); // keys are 0-based list indices; size() is the next free index
        return true;
    }

    // Other operations

}

Of course, the other operations will be a bit more complex and will take longer than their ArrayList counterparts.
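For instance, add(int index, T o) has to reindex every key at or after the insertion point, which is where the extra cost comes from. A hypothetical sketch of that idea (the class name and the 0-based keying are mine, not from the answer above):

```java
import java.util.TreeMap;

class TreeMapList<T> {
    private final TreeMap<Integer, T> data = new TreeMap<Integer, T>();

    public T get(int index) {
        return data.get(index);
    }

    public boolean add(T o) {
        data.put(data.size(), o); // keys are 0-based list indices
        return true;
    }

    // Inserting in the middle must shift every following key up by one,
    // so unlike get(), this is O(n log n) rather than O(log n).
    public void add(int index, T o) {
        for (int i = data.size(); i > index; i--) {
            data.put(i, data.get(i - 1));
        }
        data.put(index, o);
    }

    public int size() {
        return data.size();
    }
}
```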

Superfilin
+2  A: 

If there were such a structure, everyone would use it instead of arrays.

However, I recall a similar structure from a university lecture. It has constant access time, and the time to add or remove an element at an arbitrary position is mostly O(sqrt(N)); only when N crosses the next perfect square does it take O(N). The amortized time is O(sqrt(N)). Here's the idea.

The N items in this structure are stored in a contiguous array, which is divided into sqrt(N) chunks of sqrt(N) contiguous elements each (the last chunk may contain fewer). Every chunk is a ring buffer, and the position of each chunk's first element is stored in a separate array of sqrt(N) offsets. To access an element, you determine which chunk it's in (one division) and apply the proper shift within that ring buffer (an addition and a modulus). This gives constant-time access.
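The constant-time access path can be sketched like this (a simplified, fixed-capacity sketch of only the get/set machinery; the class and method names are mine, not from the lecture):

```java
// Simplified sketch of the access path: elements sit in one contiguous
// array split into chunks of size c (think c ~ sqrt(N)); each chunk is a
// ring buffer whose head offset lives in a small side array.
class RingChunkedArray<T> {
    private final Object[] elements;
    private final int[] head;     // head[k] = ring-buffer start offset of chunk k
    private final int chunkSize;

    RingChunkedArray(int capacity, int chunkSize) {
        this.elements = new Object[capacity];
        this.chunkSize = chunkSize;
        this.head = new int[(capacity + chunkSize - 1) / chunkSize];
    }

    void set(int index, T value) {
        elements[physical(index)] = value;
    }

    @SuppressWarnings("unchecked")
    T get(int index) {
        return (T) elements[physical(index)];
    }

    // One division to find the chunk, one addition and one modulus inside
    // the ring buffer: constant time regardless of N.
    private int physical(int index) {
        int chunk = index / chunkSize;
        int offset = (head[chunk] + index % chunkSize) % chunkSize;
        return chunk * chunkSize + offset;
    }

    // Rotating a chunk's head is how insertion shifts a whole chunk in O(1).
    void rotate(int chunk) {
        head[chunk] = (head[chunk] + 1) % chunkSize;
    }
}
```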

To add an element before the i-th position, determine the chunk k the element will end up in, then consider the last element of every chunk in the range k..sqrt(N)-1. Move the last element of the second-to-last chunk into the free slot of the last chunk, where it becomes the new head of that ring buffer (consult the offset array to find where exactly). Then move the last element of the chunk before that into the slot just vacated, and so on. Repeat this and you end up with a free slot in the middle of the array for the element you were going to add.

The magic is that you only have to increase the values in the offset array by one (which takes O(sqrt(N)) time), and the structure is consistent for access again. The magic of sqrt(N) is also here: you operate once on each of the X chunks and on each of the N/X entries of the auxiliary array, and X + N/X is minimized at X = sqrt(N).

If there's no room in the last chunk to add one more element (i.e. the sqrt(N) used so far has become too small), repack the array with sqrt(N) increased by one. This takes O(N) time, but the amortized cost is still O(sqrt(N)) per element.

Therefore, adding an element at an arbitrary position takes O(sqrt(N)). Deletion takes the same time. Access takes O(1).

That's the idea. I don't know what it's called, and the professor didn't know either, because he invented it on his own. Any reference would be appreciated. The OP could implement it, but I bet someone already has.

Pavel Shved
This is certainly interesting, but you would have to know sqrt(N) ahead of time - although, you could obviously make a good guess and then stick with it.
daveb
No, you actually don't, and I mentioned that: you just use the current `sqrt(N)` and rebuild the whole structure when the integer part of `sqrt(N)` changes. Repacking takes O(N), like insertion into an ordinary array. But it doesn't occur frequently, and the average cost (if the array only grows) is O(sqrt(N)) anyway.
Pavel Shved
Am I missing something? This solution has O(sqrt(N)) insert time and O(N) rebuild time, and the rebuild occurs no more than sqrt(N) times. Is this better than ArrayList, where we have O(1) insert time, O(N) rebuild time, and the rebuild occurs no more than log(N) times?
Buhb
ArrayList provides amortized constant time only for appending elements *to the end* of the list. The structure I described takes O(sqrt(N)) to add an element at an arbitrary position.
Pavel Shved
+1  A: 

You can, of course, write a list implementation as an array of arrays. There are many choices as to the exact algorithm. The performance is theoretically constant (ignoring cache effects, etc).

In practice there isn't a great deal of point for most situations. There are rope implementations (strings formed as an array of segments), but these are relatively rare. The copy isn't really that expensive, and for appends its cost is amortised over many operations, so it effectively disappears.
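A minimal array-of-arrays sketch of this idea (the class name is mine; a fixed power-of-two chunk size keeps get() at one shift and one mask, and growth adds a new chunk without ever copying existing elements):

```java
import java.util.ArrayList;

// Array-of-arrays list sketch: get() is O(1), and growing appends a new
// chunk without copying any existing elements (only the small list of
// chunk references grows, which is amortized and cheap).
class ChunkedArrayList<T> {
    private static final int CHUNK_BITS = 10;           // 1024 elements per chunk
    private static final int CHUNK_SIZE = 1 << CHUNK_BITS;

    private final ArrayList<Object[]> chunks = new ArrayList<Object[]>();
    private int size = 0;

    public void add(T value) {
        if (size >>> CHUNK_BITS == chunks.size()) {
            chunks.add(new Object[CHUNK_SIZE]);         // grow: no element copy
        }
        chunks.get(size >>> CHUNK_BITS)[size & (CHUNK_SIZE - 1)] = value;
        size++;
    }

    @SuppressWarnings("unchecked")
    public T get(int index) {
        // One shift and one mask: constant time regardless of list size.
        return (T) chunks.get(index >>> CHUNK_BITS)[index & (CHUNK_SIZE - 1)];
    }

    public int size() {
        return size;
    }
}
```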

(BTW, in the question example code the LinkedList is out of place, as it almost always is.)

Tom Hawtin - tackline
A: 

Have you looked at just hinting the maximum size to the ArrayList constructor?
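If the maximum size is known up front, passing it to the constructor allocates the backing array once, so no intermediate copies ever happen (a small sketch; the helper name is mine):

```java
import java.util.ArrayList;
import java.util.List;

class PreSized {
    static List<Integer> build(int expectedMax) {
        // Allocates the backing array once; add() calls up to
        // expectedMax never trigger a resize/copy.
        List<Integer> list = new ArrayList<Integer>(expectedMax);
        for (int i = 0; i < expectedMax; i++) {
            list.add(i);
        }
        return list;
    }
}
```

ArrayList.ensureCapacity(int) does the same for a list that already exists.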

Thorbjørn Ravn Andersen
A: 

Have a look at Random Access Lists. You can get O(1) insertion at both ends and O(log(n)) access to elements.

In the end, some sort of tree-ish structure should give the best lookup/insertion times.

MISSINGNO
A: 

This sounds like premature optimization. Have you profiled the add() function and shown it to be slow? ArrayList grows the underlying array geometrically (in OpenJDK, by about 50%) each time it runs out of space, so the list is not copied on every append; appends are amortized O(1).

You are probably trying to solve a problem that doesn't exist.
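Concretely, with geometric growth the total number of elements ever copied stays proportional to n, which is why appends are amortized O(1). A small simulation of OpenJDK's growth rule (newCapacity = oldCapacity + oldCapacity/2; the class name is mine):

```java
// Simulates ArrayList's geometric capacity growth and counts how many
// element copies all the resizes cost in total for n appends.
class GrowthCost {
    static long copiesFor(int n) {
        long copies = 0;
        int capacity = 10;                          // ArrayList's default capacity
        for (int size = 0; size < n; size++) {
            if (size == capacity) {
                copies += capacity;                 // a resize copies every element once
                capacity = capacity + (capacity >> 1); // grow by ~50%
            }
        }
        return copies;
    }
}
```

For n = 1,000,000 the total comes out under 3n copies, i.e. a small constant per append.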

David Crawshaw
A: 

It's impossible to write such a data structure. The closest you can get is to pre-size the ArrayList to the maximum size, assuming you know the max. What's interesting is that algorithms such as Collections.binarySearch(), which pick their strategy based on the RandomAccess marker, will perform worse on ChunkedList if it's marked RandomAccess but its get() isn't truly constant time.

Craig P. Motlin