Hi,
I'm writing a small ray tracer using bounding volume hierarchies to accelerate ray tracing. Long story short, I have a binary tree and I might need to visit multiple leafs.
Current I have a node with two children left and right, then during travel() if some condition, in this example intersect(), the children are visited:
class BoundingBoxNode{
BoundingBoxNode* left, *right;
void travel(param &p);
inline bool intersect(param &p){...};
};
void BoundingBoxNode::travel(param &p){
if(this->intersect(p)){
if(left)
left->travel(p);
if(right)
right->travel(p);
}
}
This approach uses recursive methods calls, however, I need to optimized this code as much as possible... And according to Optimization Reference Manual for IA-32, function calls deeper than 16 can be very expensive, so I would like to do this using a while loop instead of recursive calls. But I do NOT wish to do dynamic heap allocations as these are expensive. So I was thinking that maybe I could use that fact that every time a while loop starts over the stack will be in the same position. In the following very ugly hack I rely on alloca() to always allocate the same address:
class BoundingBoxNode{
BoundingBoxNode* left, right;
inline void travel(param &p){
int stack_size = 0;
BoundingBoxNode* current = this;
while(stack_size >= 0){
BoundingBoxNode* stack = alloca(stack_size * 4 + 2*4);
if(current->intersect(p)){
if(current->left){
stack[stack_size] = current->left;
stack_size++;
}
if(current->right){
stack[stack_size] = current->right;
stack_size++;
}
}
stack_size--;
current = stack[stack_size];
}
};
inline bool intersect(param &p){...};
};
However surprising it may seem this approach does fail :) But it does work as long as the stack is smaller than 4 or 5... I'm also quite confident that this approach is possible, I just really think I need some help implementing it correctly.
So how can I manipulate the stack manually from C++, is it possible that I can use some compiler extension... Or must I do this is assembler, and if so, how do I write assembler than can be compiled with both GCC and ICC.
I hope somebody can help me... I don't need a perfect solution, just a hack, if it works it's good enough for this purpose :)
Regards Jonas Finnemann Jensen