With the stock Sun 1.6 compiler and JRE/JIT, is it a good idea to unroll a loop by hand, in the extensive style exemplified by Duff's Device? Or does it end up as code obfuscation with no performance benefit?
The Java profiling tools I've used are less informative about line-by-line CPU usage than, say, valgrind, so I was looking to augment measurement with other people's experience.
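Note that, of course, you can't code Duff's Device exactly in Java, since a case label can't sit inside a loop nested in the switch, but you can do the basic unroll with a fall-through switch, and that's what I'm wondering about: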
    short stateType = data.getShort(ptr);
    switch (stateType) {
        case SEARCH_TYPE_DISPATCH + 16:
            if (c > data.getChar(ptr + (3 << 16) - 4)) {
                ptr += 3 << 16;
            }
            // intentional fall-through
        case SEARCH_TYPE_DISPATCH + 15:
            if (c > data.getChar(ptr + (3 << 15) - 4)) {
                ptr += 3 << 15;
            }
            // intentional fall-through
        ...
down through many other values.
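For comparison, here is a minimal sketch that puts the rolled-up loop next to a three-level version of the unroll, so the two can be checked for equivalence and then timed against each other. Everything beyond the shape of the switch is an assumption made for illustration: the class name `UnrollSketch`, the value of `SEARCH_TYPE_DISPATCH`, the 6-byte record layout in `sampleData`, and the guess that the cases run down to `SEARCH_TYPE_DISPATCH + 1`.

```java
import java.nio.ByteBuffer;

final class UnrollSketch {
    // Hypothetical value; the real constant is not shown in the question.
    static final int SEARCH_TYPE_DISPATCH = 0;

    // Unrolled form: same shape as the switch above, cut down to three levels.
    static int searchUnrolled(ByteBuffer data, int ptr, char c, int levels) {
        switch (SEARCH_TYPE_DISPATCH + levels) {
            case SEARCH_TYPE_DISPATCH + 3:
                if (c > data.getChar(ptr + (3 << 3) - 4)) {
                    ptr += 3 << 3;
                }
                // intentional fall-through
            case SEARCH_TYPE_DISPATCH + 2:
                if (c > data.getChar(ptr + (3 << 2) - 4)) {
                    ptr += 3 << 2;
                }
                // intentional fall-through
            case SEARCH_TYPE_DISPATCH + 1:
                if (c > data.getChar(ptr + (3 << 1) - 4)) {
                    ptr += 3 << 1;
                }
        }
        return ptr;
    }

    // Rolled-up form: one loop iteration per case label, highest level first.
    static int searchLoop(ByteBuffer data, int ptr, char c, int levels) {
        for (int k = levels; k >= 1; k--) {
            int step = 3 << k;
            if (c > data.getChar(ptr + step - 4)) {
                ptr += step;
            }
        }
        return ptr;
    }

    // Assumed layout: 6-byte records with a sorted char key at byte offset 2.
    static ByteBuffer sampleData() {
        ByteBuffer data = ByteBuffer.allocate(48);
        for (int i = 0; i < 7; i++) {
            data.putChar(6 * i + 2, (char) ('a' + 2 * i)); // keys a, c, e, g, i, k, m
        }
        return data;
    }

    public static void main(String[] args) {
        ByteBuffer data = sampleData();
        System.out.println(searchUnrolled(data, 0, 'f', 3));
        System.out.println(searchLoop(data, 0, 'f', 3));
    }
}
```

Both methods perform the same comparisons in the same order, so any timing difference between them comes from what the JIT does with each form. Timing them in a warmed-up loop on the target JVM is the only reliable way to tell whether the unroll pays for its loss of readability.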