Ropes: what's "large enough to benefit from cache effects"?

From Wikipedia:

The main disadvantages are greater overall space usage and slower indexing, both of which become more severe as the tree structure becomes larger and deeper. However, many practical applications of indexing involve only iteration over the string, which remains fast as long as the leaf nodes are large enough to benefit from cache effects.

I'm implementing a sort of compromise between ropes and strings. Basically it's just ropes, except that I'm flattening concatenation objects into strings when the concatenated strings are short. There are a few reasons for this:

The benefits of concatenation objects are minimal when the concatenated strings are short (it doesn't take too long to concatenate two strings in their normal form).
Doing this reduces the largeness/depth of the tree (reducing the downsides of ropes).
Doing this increases the size of the leaf nodes (to take better advantage of cache).

However, as length gets longer, the advantages of the ropes also decrease, so I'd like to find some compromise. The "sweet spot" logically seems to be around where "the leaf nodes are large enough to benefit from cache effects". The problem is, I don't know how large that is.

EDIT: While I was writing this, it occurred to me that the ideal size would be the size of a cache page, because then the rope only causes cache misses when they would happen anyway in a string. So my second question is, is this reasoning correct? And is there a cross-platform way to detect the size of a cache page?

My target language is C++.

The limit case for a rope-like string would be built on top of a std::list<char>. That obviously isn't very effective. When iterating, you are likely to have have one cache miss per "leaf"/char. As the number of characters per leaf goes up, the average number of misses goes down, with a discontinuity as soon as your leaf allocation exceeds a single cache line.

It might still be a good idea to have larger leafs; memory transfers in cache hierarchies might have different granularities at different levels. Also, when targetting a mixed set of CPUs (i.e. consumer PCs) a leaf size which is a higher power of two will be an integral multiple of the cache line size on more machines. E.g. if you're addressing CPUs with 16 and 32 byte cache lines, 32 bytes would be the better choice, as it's an always integral number of cache lines. Wasting half a cache line is a shame.

Thanks for your answer. One more question: on a system with a hierarchical cache, would it be better to target L1 or L2 cache? My thinking is that in the interest of erring toward larger cache lines as you suggested, that I should target L2 cache. My end solution, I think, will be to choose a sane default (64 or 128 seem to fit the bill based on the Intel and Athlon CPU's I looked at) and let the user override the default through the command line when starting my program.

Imagist 2009-08-26 01:29:51

I'd expect the optimization advantages of a single hard-coded value to outweigh the runtime advantages of choosing the "correct" L2 value.

MSalters 2009-08-26 09:38:41

ansaurus

tags:

views:

answers:

Ropes: what's "large enough to benefit from cache effects"?

related questions