views:

3990

answers:

7

I recently came across the data structure known as a Skip list. They seem to have very similar behavior to a binary search tree... my question is - why would you ever want to use a skip list over a binary search tree?

A: 

Not much difference but in whatever I've learned till date, Skip List is somewhat easy to implement than binary search tree.

askgelal
easier to implement? How so?
Mitch Wheat
Maybe easier to implement than a balanced tree, but not a binary tree!
Mitch Wheat
I looked into them once and got stuck at the point of needing a portable, decent source of randomness. Eventually I went with AVL Trees which had a decent free implementation.
dajobe
You can implement a binary search tree in a few lines of code (see http://www.cs.cornell.edu/courses/cs3110/2009sp/lectures/lec11.html). Where is a comparably-short implementation of skip lists?
Jon Harrop
@dajobe: Good, portable pseudo-random number generators are readily available, and many only take a half-dozen lines of code. Googling for "George Marsaglia" will turn up a few. If you have cryptographic concerns things get more complicated, but that's overkill for the general case. "True" randomness is also overkill. Besides, for getting a skip list implementation off the ground, even your language/platform's built in RNG should do the trick.
Steve S
+3  A: 

From the wiki you quoted:

Θ(n) operations, which force us to visit every node in ascending order (such as printing the entire list) provide the opportunity to perform a behind-the-scenes derandomization of the level structure of the skip-list in an optimal way, bringing the skip list to O(log n) search time. [...] A skip list, upon which we have not recently performed [any such] Θ(n) operations, does not provide the same absolute worst-case performance guarantees as more traditional balanced tree data structures, because it is always possible (though with very low probability) that the coin-flips used to build the skip list will produce a badly balanced structure

EDIT: so it's a trade-off: Skip Lists use less memory at the risk that they might degenerate into an unbalanced tree.

Mitch Wheat
this would be a reason against using the skip list.
Claudiu
@Mitch: Stop directly copying from wikipedia http://en.wikipedia.org/wiki/Skip_list "....which we have not recently performed either of the above mentioned..." Which above?
askgelal
@askgelal: I was assuming you had actually read the article you in fact quoted. I think you meant to say "Please stop directly..."
Mitch Wheat
quoting MSDN, "The chances [for 100 level 1 elements] are precisely 1 in 1,267,650,600,228,229,401,496,703,205,376".
peterchen
Why would you say that they use less memory?
Jonathan
@peterchen: Sure but which case is pathological?
Jon Harrop
@Jon: Skip list heights are randomly chosen, so there is a finite probability all end up as 1.
peterchen
@peterchen: I see, thanks. So this does not occur with deterministic skip lists?@Mitch: "Skip Lists use less memory". How do skip lists use less memory than balanced binary trees? Looks like they've got 4 pointers in every node and duplicate nodes whereas trees have only 2 pointers and no duplicates.
Jon Harrop
@Jon Harrop: The nodes at level one only need one pointer per node. Any nodes at higher levels need only two pointers per node (One to the next node and one to the level below it), though of course a level 3 node means you are using 5 pointers total for that one value. Of course, this will still suck up a lot of memory (moreso than a binary search if you want a non-useless skip list and have a large dataset)...but I think I'm missing something...
Brian
+2  A: 

You might want to look at splay trees too. They are also quite easy to implement and tend toward balance.

I would try to avoid randomized approximation algorithms (e.g., skip lists) if you're going to write unit tests for the data structure.

Cybis
If you're abandoning a perfectly good solution because it might be hard to write unit tests for it, you're doing it wrong. Unit tests are supposed to serve your application and not the other way round.
Seun Osewa
+7  A: 

Also, in addition to the answers given (ease of implementation combined with comparable performance to a balanced tree). I find that implementing in-order traversal (forwards and backwards) is far simpler because a skip-list effectively has a linked list inside its implementation.

Evan Teran
isn't in-order traversal for a bin tree as simple as: "def func(node): func(left(node)); op(node); func(right(node))"?
Claudiu
Sure, that true if you want to traverse all in one function call. but it gets much more annoying if you want to have iterator style traversal like in std::map.
Evan Teran
@Evan :Not in a functional language where you can just write in CPS.
Jon Harrop
@Evan: `def iterate(node): for child in iterate(left(node)): yield child; yield node; for child in iterate(right(node)): yield child;`? =). non-local control iz awesom.. @Jon: writing in CPS is a pain, but maybe you mean with continuations? generators are basically a special case of continuations for python.
Claudiu
@Claudiu: is that roughly equivalent to `it++`? If so, that's terribly inefficient.
Evan Teran
@Evan: no, that's the code to set up the iteration. the equivalent of `it++` would be calling `.next()` on the resulting generator. to actually iterate you'd do: `for child in iterate(head): op(child)`.
Claudiu
@Claudui: but what if you simply have an instance of an iterator and just want the next one? Even if the map has changed since you got that iterator (which is ok for many usages of `std::map`) does that still work?
Evan Teran
+15  A: 

Skip lists are more amenable to concurrent access/modification. Herb Sutter wrote an article about data structure in concurrent environments. It has more indepth information.

The most frequently used implementation of a binary search tree is a red-black tree. The concurrent problems come in when the tree is modified it often needs to rebalance. The rebalance operation can affect large portions of the tree, which would require a mutex lock on many of the tree nodes. Inserting a node into a skip list is far more localized, only nodes directly linked to the affected node need to be locked.


Update from Jon Harrops comments

I read Fraser and Harris's latest paper Concurrent programming without locks. Really good stuff if you're interested in lock-free data structures. The paper focuses on Transactional Memory and a theoretical operation multiword-compare-and-swap MCAS. Both of these are simulated in software as no hardware supports them yet. I'm fairly impressed that they were able to build MCAS in software at all.

I didn't find the transactional memory stuff particularly compelling as it requires a garbage collector. Also software transactional memory is plagued with performance issues. However, I'd be very excited if hardware transactional memory ever becomes common. In the end it's still research and won't be of use for production code for another decade or so.

In section 8.2 they compare the performance of several concurrent tree implementations. I'll summarize their findings. It's worth it to download the pdf as it has some very informative graphs on pages 50, 53, and 54.

  • Locking skip lists are insanely fast. They scale incredibly well with the number of concurrent accesses. This is what makes skip lists special, other lock based data structures tend to croak under pressure.
  • Lock-free skip lists are consistently faster than locking skip lists but only barely.
  • transactional skip lists are consistently 2-3 times slower than the locking and non-locking versions.
  • locking red-black trees croak under concurrent access. Their performance degrades linearly with each new concurrent user. Of the two known locking red-black tree implementations, one essentially has a global lock during tree rebalancing. The other uses fancy (and complicated) lock escalation but still doesn't significantly out perform the global lock version.
  • lock-free red-black trees don't exist.
  • transactional red-black trees are comparable with transactional skip-lists. That was very surprising and very promising. Transactional memory, though slower if far easier to write. It can be as easy as quick search and replace on the non-concurrent version.
caspin
Not to mention that in a non-degenerate skiplist, about 50% of the nodes should only have a single link which makes insert and delete remarkably efficient.
Adisak
Rebalancing does not require a mutex lock. See http://www.cl.cam.ac.uk/research/srg/netos/lock-free/
Jon Harrop
@Jon, yes and no. There are no known lock-free red-black tree implementations. Fraser and Harris show how a transactional memory based red-black tree is implemented and its performance. Transactional memory is still very much in the research arena, so in production code, a red-black tree will still need to lock large portions of the tree.
caspin
+2  A: 

In practice I've found that B-tree performance on my projects has worked out to be better than skip-lists. Skip lists do seem easier to understand but implementing a B-tree is not that hard.

The one advantage that I know of is that some clever people have worked out how to implement a lock-free concurrent skip list that only uses atomic operations. For example, Java 6 contains the ConcurrentSkipListMap class, and you can read the source code to it if you are crazy.

But it's not too hard to write a concurrent B-tree variant either - I've seen it done by someone else - if you preemptively split and merge nodes "just in case" as you walk down the tree then you won't have to worry about deadlocks and only ever need to hold a lock on two levels of the tree at a time. The synchronization overhead will be a bit higher but the B-tree is probably faster.

Jonathan
+2  A: 

Skip lists are implemented using lists.

Lock free solutions exist for singly and doubly linked lists - but there are no lock free solutions which directly using only CAS for any O(logn) data structure.

You can however use CAS based lists to create skip lists.

(Note that MCAS, which is created using CAS, permits arbitrary data structures and a proof of concept red-black tree had been created using MCAS).

So, odd as they are, they turn out to be very useful :-)

"there are no lock free solutions which directly using only CAS for any O(logn) data structure". Not true. For counter examples see http://www.cl.cam.ac.uk/research/srg/netos/lock-free/
Jon Harrop