From Wikipedia:
The main disadvantages are greater overall space usage and slower indexing, both of which become more severe as the tree structure becomes larger and deeper. However, many practical applications of indexing involve only iteration over the string, which remains fast as long as the leaf nodes are large enough to benefit from cache effects.
I'm implementing a sort of compromise between ropes and strings. Basically it's just ropes, except that I'm flattening concatenation objects into strings when the concatenated strings are short. There are a few reasons for this:
- The benefits of concatenation objects are minimal when the concatenated strings are short (it doesn't take too long to concatenate two strings in their normal form).
- Doing this reduces the largeness/depth of the tree (reducing the downsides of ropes).
- Doing this increases the size of the leaf nodes (to take better advantage of cache).
However, as length gets longer, the advantages of the ropes also decrease, so I'd like to find some compromise. The "sweet spot" logically seems to be around where "the leaf nodes are large enough to benefit from cache effects". The problem is, I don't know how large that is.
EDIT: While I was writing this, it occurred to me that the ideal size would be the size of a cache page, because then the rope only causes cache misses when they would happen anyway in a string. So my second question is, is this reasoning correct? And is there a cross-platform way to detect the size of a cache page?
My target language is C++.