I understand what the term "cache oblivious" means. But I was wondering whether there is an easy explanation of how data structures can be designed to use the cache optimally without knowing the sizes of the caches.

Can you please provide such an explanation, preferably with an (easy) example?

+1  A: 

The primary intuition is that if you recursively split the dataset you work with, at some point (usually pretty quickly) you'll reach a sub-problem that 1) fits in the cache, and 2) fills at least half the cache (assuming each split divides the dataset at least approximately in half).
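To make that concrete, here is one hypothetical sketch (mine, not part of the answer above): a cache-oblivious matrix transpose. It never asks how big the cache is; it just keeps halving the larger dimension, and at some depth the sub-blocks fit in cache, at which point the element-wise base case runs with no further misses. The base-case threshold of 8 is an arbitrary illustration, not a tuned constant.

```python
def transpose(src, dst, r0, r1, c0, c1):
    """Transpose src[r0:r1][c0:c1] into dst by recursive halving.

    No cache size appears anywhere: the recursion alone guarantees
    that sub-blocks eventually fit in (and roughly fill) the cache.
    """
    rows, cols = r1 - r0, c1 - c0
    if rows <= 8 and cols <= 8:
        # Base case: block is small; copy element by element.
        for i in range(r0, r1):
            for j in range(c0, c1):
                dst[j][i] = src[i][j]
    elif rows >= cols:
        # Split the taller dimension in half and recurse on each part.
        mid = (r0 + r1) // 2
        transpose(src, dst, r0, mid, c0, c1)
        transpose(src, dst, mid, r1, c0, c1)
    else:
        mid = (c0 + c1) // 2
        transpose(src, dst, r0, r1, c0, mid)
        transpose(src, dst, r0, r1, mid, c1)


m = [[i * 16 + j for j in range(16)] for i in range(16)]
t = [[0] * 16 for _ in range(16)]
transpose(m, t, 0, 16, 0, 16)
```

The same "halve until it fits" structure underlies cache-oblivious sorting, matrix multiplication, and search trees (van Emde Boas layout).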

Jerry Coffin
+2  A: 

Even an algorithm as familiar as quicksort is somewhat cache oblivious (but not optimal). Recall that it works by partitioning the array, then recursing on each side of the partition. Eventually, it is operating on a sub-array which fits in cache, and so there will be no more cache misses until it finishes that sub-array and moves on to another one. That's the property we're looking for.

Contrast this with insertion sort, which (to use a technical term) leaps all over the place all the time. So quite aside from insertion sort's need to move O(n^2) items around, it also misses cache a lot when used on large arrays.

Quicksort is still some way from optimal, though. Each individual partition pass doesn't divide and recurse; it makes one long sequential run through memory, churning the cache. Potentially this happens several times before the sub-array is small enough that we start winning, so we aren't minimising the number of cache misses.
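For reference, a minimal quicksort sketch (my own illustration, not code from the answer) that shows both halves of the argument: the partition loop is the long sequential sweep through `a[lo:hi]`, and the two recursive calls are what eventually shrink the working set down to cache size.

```python
def quicksort(a, lo=0, hi=None):
    """In-place quicksort with Hoare-style partitioning."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot = a[(lo + hi) // 2]
    i, j = lo, hi
    # Partition: one sequential sweep over the whole sub-array.
    # For large sub-arrays this sweep is what churns the cache.
    while i <= j:
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    # Recurse: once a sub-array fits in cache, everything below this
    # point in the recursion runs without further cache misses.
    quicksort(a, lo, j)
    quicksort(a, i, hi)
```

A truly cache-optimal sort (e.g. funnelsort) restructures the work so that even the merging/partitioning passes operate on cache-sized chunks.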

Steve Jessop
Thanks a lot. Explaining quicksort from a cache-oblivious point of view was very illuminating.
Muhammad Alkarouri