I'm working on some tools to enable high-throughput, data-oriented development, and one thing I don't have an immediate answer for is how to allocate strings quickly. On RISC processors there's the added implementation problem that the CPU doesn't like branching, so I'm trying to minimise or avoid branches. Cache locality also matters on most CPUs, so that has to influence the design too.
So, how would you go about reducing the overhead for a generic string allocator?
Sometimes it's easier to solve a more specific problem, so any ideas for strings of 5-30 characters?
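To make the narrower case concrete, here is a rough baseline sketch of one possible approach, not a definitive answer: a pool of fixed 32-byte slots with the free list threaded through the unused slots, so the hot path is a single combined check plus two small copies and the strings stay contiguous in memory. Every name here (`StringPool`, `pool_init`, `pool_alloc`, `pool_free`) and the sizes `SLOT_SIZE` and `SLOT_COUNT` are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes: 32 bytes holds up to 31 characters plus the NUL. */
#define SLOT_SIZE  32u
#define SLOT_COUNT 1024u
#define NO_SLOT    UINT32_MAX

typedef struct {
    unsigned char slots[SLOT_COUNT][SLOT_SIZE]; /* contiguous string storage */
    uint32_t free_head;                         /* index of first free slot */
} StringPool;

/* Thread the free list through the slots: each free slot stores the
   index of the next free slot in its first four bytes. */
static void pool_init(StringPool *p) {
    for (uint32_t i = 0; i < SLOT_COUNT; ++i) {
        uint32_t next = (i + 1 < SLOT_COUNT) ? i + 1 : NO_SLOT;
        memcpy(p->slots[i], &next, sizeof next);
    }
    p->free_head = 0;
}

/* Pop a slot and copy len bytes of src into it. The only branch is the
   combined "too long or pool exhausted" check. Returns NULL on failure. */
static char *pool_alloc(StringPool *p, const char *src, size_t len) {
    if (len >= SLOT_SIZE || p->free_head == NO_SLOT) return NULL;
    unsigned char *slot = p->slots[p->free_head];
    memcpy(&p->free_head, slot, sizeof p->free_head); /* head = next free */
    memcpy(slot, src, len);
    slot[len] = '\0';
    return (char *)slot;
}

/* Push the slot back onto the free list. s must have come from pool_alloc
   on the same pool. */
static void pool_free(StringPool *p, char *s) {
    uint32_t index =
        (uint32_t)(((unsigned char *)s - &p->slots[0][0]) / SLOT_SIZE);
    assert(index < SLOT_COUNT);
    memcpy(s, &p->free_head, sizeof p->free_head);
    p->free_head = index;
}
```

Usage would be along the lines of calling `pool_init` once, then `pool_alloc(&pool, "hello", 5)` per string and `pool_free(&pool, ptr)` when done. The trade-off in this sketch is wasted space for short strings in exchange for constant-time, nearly branch-free allocation; whether that is acceptable depends on the workload.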