views:

189

answers:

1

Has anyone thought about how to write a memory manager (in C++) that is completely branch-free? I've written a pool, a stack, a queue, and a linked list (allocating from the pool), but I am wondering how plausible it is to write a branch-free general memory manager.

This is all to help make a really reusable framework for doing solid concurrent, in-order-CPU, and cache-friendly development.

Edit: by branchless I mean without doing direct or indirect function calls, and without using ifs. I've been thinking that I can probably implement something that first changes the requested size to zero for calls that shouldn't allocate, but haven't really got much further than that. I feel that it's not impossible, but the other aspect of this exercise is then profiling it on said "unfriendly" processors to see whether it's worth trying this hard to avoid branching.
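To make the "zero the size" idea concrete, here is a minimal sketch; the helper name and the bool parameter are my own assumptions, not part of any existing API:

```cpp
#include <cstddef>

//Hypothetical helper sketching the idea above: when 'wanted' is false the
//mask is all zero bits, so the allocation path can run unconditionally and
//simply hand out a zero-sized request instead of branching on a condition.
std::size_t masked_size(std::size_t size, bool wanted) {
    //-(size_t)1 is all ones, -(size_t)0 is all zeros
    return size & -static_cast<std::size_t>(wanted);
}
```

Unary minus on an unsigned type is well-defined in C++, so the mask is portable.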

+1  A: 

While I don't think this is a good idea, one solution would be to have pre-allocated buckets of various power-of-two sizes. Stupid pseudocode:

class Allocator {
public:
    void* malloc(size_t size) {
        int bucket = log2(size + sizeof(int)); //Make room for the header
        int* pointer = reinterpret_cast<int*>(m_buckets[bucket].back());
        m_buckets[bucket].pop_back();
        *pointer = bucket; //Store which bucket this was allocated from
        return pointer + 1; //Don't overwrite the header
    }

    void free(void* pointer) {
        int* temp = reinterpret_cast<int*>(pointer) - 1;
        m_buckets[*temp].push_back(temp);
    }

private:
    vector< vector<void*> > m_buckets;
};

(You would of course also replace the std::vector with a simple array + counter).
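As a rough illustration of that replacement (the capacity and names are arbitrary assumptions of mine):

```cpp
#include <cstddef>

//Sketch of the "simple array + counter" bucket replacing std::vector<void*>:
//push and pop are each a single index update, with no growth or reallocation.
//Note this pop_back also returns the element (vector's back() + pop_back()).
struct Bucket {
    static const std::size_t Capacity = 1024; //Assumed fixed capacity
    void*       slots[Capacity];
    std::size_t count;

    Bucket() : count(0) {}
    void  push_back(void* p) { slots[count++] = p; }
    void* pop_back()         { return slots[--count]; }
};
```

As in the pseudocode above, handling a full or empty bucket is left out; doing it robustly reintroduces a branch.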

EDIT: In order to make this robust (i.e. handle the situation where the bucket is empty) you would have to add some form of branching.

EDIT2: Here's a small branchless log2 function:

//returns the smallest x such that value <= (1 << x)
int
log2(int value) {
    union Foo {
        int x;
        float y;
    } foo;
    foo.y = value - 1; //Requires value >= 2 to give the correct result
    return ((foo.x & (0xFF << 23)) >> 23) - 126; //Extract exponent (base 2) of floating point number
}

This gives the correct result for allocations < 33554432 bytes (2^25). If you need larger allocations you'll have to switch to doubles.

Here's a link to how floating point numbers are represented in memory.

Andreas Brinck
Log2 will probably need a platform-dependent implementation to be branchless. On x86 you'll probably need something based on the BSR instruction.
Jasper Bekkers
@Jasper: there's some code here that claims to be a branchless clz - I assume without testing that it works: http://stackoverflow.com/questions/2255177/finding-the-exponent-of-n-2x-using-bitwise-operations-logarithm-in-base-2-of/2255282#2255282. From a brief skim, it seems to return 0 for input 0, so you may want a branch to cover either the 0 case, or the greater-than-half-the-range case. As you say, though, implementations may provide access to faster CPU ops.
Steve Jessop
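For reference, a sketch of the BSR/clz route these comments describe, assuming a GCC/Clang-style `__builtin_clz` is available (the function name is mine, not from the answer):

```cpp
//Sketch only: __builtin_clz is a GCC/Clang builtin (maps to BSR/LZCNT on x86)
//and is undefined for input 0, so this is only valid for value >= 2.
//Returns the smallest x such that value <= (1u << x).
int log2_clz(unsigned value) {
    return 32 - __builtin_clz(value - 1);
}
```

Unlike the float trick, this stays exact for the full 32-bit range, but it trades the float trick's pseudo-portability for a compiler-specific builtin.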
@Jasper @Steve See my edit.
Andreas Brinck
I guess. Going through float might be the best log2 on some platforms, but it's only pseudo-portable, so it can't be the most general fallback.
Steve Jessop
"If you need larger allocations you'll have to switch to doubles". Assuming you'd be allocating larger blocks out of power-of-two buckets in the first place. Hopefully "branchless" means "except for uncommon cases".
Steve Jessop
The `log2` method blew my mind, I am not sure I'll ever get those brain cells back...
Matthieu M.
@Matthieu It's just taking the exponent part of the floating point value by interpreting it as an int.
Jasper Bekkers
Well, I am not that well versed in the physical representation of the `float`... even if I kind of admire the trick, is it portable? Anyhow, I think it would at least benefit from a comment to indicate what's going on; I sure didn't realize we were extracting the exponent!
Matthieu M.
@Matthieu I think it's dependent on endianness, but this is easily handled. Apart from this, if your CPU supports IEEE 754 floats, it should work fine. I've added a link to the relevant Wikipedia article on floats for those interested in the details of how floats are represented in memory.
Andreas Brinck
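On the portability question raised above, here is a hedged variant of the same exponent extraction done through memcpy rather than a union (purely illustrative; `log2_memcpy` is my name, and it still assumes 32-bit IEEE 754 floats):

```cpp
#include <cstdint>
#include <cstring>

//Same trick as the answer's log2, but copying the float's bits into a
//std::uint32_t with memcpy, which sidesteps the union type-punning
//question in C++. Shifting the uint32_t works the same regardless of
//byte order, since the arithmetic is on the value, not on bytes.
//Returns the smallest x such that value <= (1 << x); valid for value >= 2.
int log2_memcpy(int value) {
    float f = static_cast<float>(value - 1);
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return static_cast<int>((bits >> 23) & 0xFF) - 126; //Exponent field - 126
}
```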