views:

923

answers:

6

Taken from Apache TreeList doc:

The following relative performance statistics are indicative of this class:

             get  add  insert  iterate  remove
 TreeList       3    5       1       2       1
 ArrayList      1    1      40       1      40
 LinkedList  5800    1     350       2     325

It goes on to say, "LinkedList is rarely a good choice of implementation. TreeList is almost always a good replacement for it, although it does use sligtly more memory."

My questions to you -- SO community, are:

  • What is with the ArrayList add, insert, and remove times crushing LinkedList? Should we expect, for one, that real-world insertion and removal cases greatly favor ArrayList?

  • Does this TreeList simply put the nail in the coffin of the venerable LinkedList?

I am tempted to conclude they have amortized or ignored ArrayList's growing pains, and have not taken into consideration the insertion and removal times for an item in a LinkedList that has already been located.

+3  A: 

It's due to the data structures behind these Collections. TreeList is a tree, which allows for relatively fast reads, insertions, removals (all O(log n)). The ArrayList uses an array to store the data, so when you insert or remove, every item in the array has to be shifted up or down (O(n) worst case). Arrays also have a fixed size, so if it overflows the current array's capacity, a new, larger one (usually double the size of the last one, to keep resizes to a minimum) must be created. LinkedList used... a linked list. A linked list usually has a reference to the first (and sometimes last) elements in the list. Then each element in the list has a refrence to either the next element in the list (for a singly-linked list) or the next and previous elements (for a double linked list). Because of this, to access a specific element, you must iterate through every element before it to get there (O(n) worst case). When inserting or removing specific elements, you must find the position to insert or remove them from, which takes time (O(n) worst case). However there is very little cost to simply adding another element to the beginning or end (O(1)).

There are whole books written on data structures and when to use them, I recommend reading up on the more fundamental ones.

Nali4Freedom
+2  A: 

Because a linked list has to navigate node by node to get anywhere in the list (save the front and probably the back depending on implementation) it makes sense that the numbers are so high.

For add/insert/remove in a large LinkedList you would have a lot of hopping from node to node to get to the correct spot.

If they made the ArrayList of the proper size to start with the growing pains will be nothing. If the ArrayList is small the growing pains don't matter.

For the LinkedList if the operations are all near the front of the list it would impact far less then if they are at the end.

What you should do is always use the interface, eg: List when declaring variables and parameters then you can change the "new LinkedList();" to "new ArrayList();" and profile the code to see how it performs in your specific code.

Because of the speed of not having to hop from node to node I always default to ArrayList instead of LinkedList.

I would believe the tree list is going to be significantly faster than both (even without looking at the code). Trees are designed to be fast.

TofuBeer
+1  A: 

For the ArrayList, since it is done infrequently, you can basically have that cost be negligible. If it is actually a problem, then just make the array larger to start with.

If I have a small list then a LinkedList makes sense to use as there is minimal benefit at that point. If the list is going to be long then obviously a TreeList makes more sense.

If I am going to be doing a great deal of random access to a list then the ArrayList makes more sense.

Which container to use really depends on what you will be doing with it. There is no one correct container, as each has their strengths and weaknesses, and with experience you start to get an understanding of when to use which one.

James Black
+9  A: 

The key thing here is the complexity of insert/delete operations in the three List implementations. ArrayList has O(n) insert/delete times for arbitrary indices, but it is O(1) if the operation is at the end of the list. ArrayList also has the convenience of O(1) access for any location. LinkedList is similarly O(n), but is O(1) for operations at either end of the List (start and end) and O(n) access for arbitrary positions. TreeList has O(logn) complexity for all operations at any position.

This clearly shows that TreeList is faster for large enough Lists as far as insert/deletes in arbitrary positions are concerned. But AFAIK, TreeLists are implemented as a binary search tree, and has a much bigger constant associated with its O(logn) operations than similar operations with ArrayLists which are simply wrappers around an array. This makes TreeList actually slower for small Lists. Also, if all you are doing is appending element into a List, the O(1) performance of ArrayList/LinkedList is clearly faster. Moreover, often the number of insert/deletes are much fewer than the numbers of accesses, which tends to make ArrayList faster overall for many cases. LinkedList's constant time insert/delete at the either end of the List makes it much faster at implementing data structures like Queues, Stacks and Deques.

At the end of the day, it all depends on what exactly you need a List for. There isn't a one-size-fits-all solution. You have to select the implementation most suitable for your job.

MAK
one interesting detail here is the physical->virtual memory manager and how memory is paged, etc... Ultimately, a large array list actually *behaves* like a B+Tree with chunks of the array occupying contiguous memory, but multiple chunks being able to be in different physical memory locations (including swap). This is, of course, well outside the scope of what a Java application can worry about, but it's the type of thing that keeps JVM garbage collection designers up at night ;-)
Kevin Day
+1  A: 

Each and every one person who answered here is correct. They all are right in their notion, that it depends very heavily on your usage pattern, i.e there is no one-size-fits-all List. But at the moment of my writing they all forgot to mention (either that, or I am sloppy reader) a use-case when LinkedList is at the best: iterator-positioned insert. That means, that if you're doing not just

LinkedList::add(int index, E element) 
          Inserts the specified element at the specified position in this list.

which seem to be the method they used to obtain the statistics, but

iterator.insert(E element)

with an iterator obtained through either

public abstract ListIterator<E> listIterator(int index)
Returns a list-iterator of the elements in this list (in proper sequence), starting at the specified position in the list.

or

public Iterator<E> iterator()
Returns an iterator over the elements in this list (in proper sequence).

, then you're bound to get the best arbitrary insertion performance ever. That implies of course, that you're able to limit number of calls to iterator() and listIterator(), and number of iterator's movements through the list (e.g you can do only one sequential pass over the list to do all inserts you need). This makes its use-cases quite limited in their number, but nevertheless they are the ones that occur very very often. And LinkedList's performance in them is the reason why it is (and gonna be in the future) being kept in all container collections of all languages, not just Java.

PS. All of the above of course applies to all other operations, like get(), remove(), etc. I.e carefully designed access through iterator will make all of them O(1) with a very small actual constant. The same of course can be said for all other Lists, i.e iterator access will speed them all up (however slightly). But not ArrayList's insert() and remove() - they're still going to be O(n)... And not TreeList's insert() and remove() - tree balancing overhead is not something you can avoid... And TreeList probably has more memory overhead... You get my idea. To sum it all up, LinkedList is for small hi-perf scan-like operations over lists. Whether that is the use-case you need or not - only you can tell.

PSS. That said, I'm therefore also remain

tempted to conclude they have amortized or ignored ArrayList's growing pains, and have not taken into consideration the insertion and removal times for an item in a LinkedList that has already been located.

nightingale
A: 

Note that ArrayList is generally faster than LinkedList, even when your code calls just the methods that are constant time for both. For example, ArrayList.add() simplies copies a single variable and increments a counter when no resizing is needed, while LinkedList.add() must also create a node and set multiple pointers. In addition, the LinkedList nodes require more memory, which slows down your application, and garbage collection must deal with the nodes.

If you need to add or remove elements from either end of the list, but don't require random access, an ArrayDeque is faster than than a LinkedList, though it requires Java 6.

A LinkedList make sense for iterating across the list and then adding or removing elements in the middle, but that's an unusual situation.

Jared Levy