As Johannes quoted from that splendid Bit Twiddling Hacks page, there's an excellent and detailed description of that algorithm in AMD's Software Optimization Guide for AMD Athlon™ 64 and Opteron™ Processors, on pages 179 and 180 (pages 195 and 196 of the PDF).
This page also describes the same idea, along with some alternative solutions and their relative performance.
fvu
2010-01-05 23:02:28