Hello, I have a performance issue in a bottleneck section in my code. Basically it's a simple nested loop.
Profiling the issue reveals that the program spends a lot of time just incrementing both of the loop counters (++) and testing for termination (i/j < 8).
Watching the assembly output I see that both counters don't get registers and accessing them costs a lot of cycles. Using the "register" keyword doesn't convince the compiler to actually put them in registers. is there something which can be done to to optimize the counters access time?
Here's the assembly output. The C source is just a simple nested loop with i/j counters.
2738 0.2479 2459 0.1707 : 1e6c: jne 1dd1 <process_hooks+0x121>
1041 0.0942 1120 0.0778 : 1e72: addl $0x1,0xffffffd4(%ebp)
2130 0.1928 2102 0.1459 : 1e76: cmpl $0x8,0xffffffd4(%ebp)
2654 0.2403 2337 0.1622 : 1e7a: jne 1da0 <process_hooks+0xf0>
809 0.0732 814 0.0565 : 1e80: jmp 1ce2 <process_hooks+0x32>
As requested, here's the C code as well. Compiler is gcc btw:
for (byte_index=0; byte_index < MASK_SIZE / NBBY; byte_index++)
{
if (check_byte(mask,byte_index))
{
for (bit_index=0; bit_index < NBBY; bit_index++)
{
condition_index = byte_index*NBBY + bit_index;
if (check_bit(condition_mask,condition_index))
{
.
.
.
}
}
}
}
Thanks