views:

311

answers:

4

Many years ago while working on a tight graphics I/O problem, Tom Duff unrolled a loop and created his Duff's Device as follows:

dsend(to, from, count)
char *to, *from;
int count;
{
    int n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
           } while (--n > 0);
    }
}

(Note this uses old style function parameters - that's not an error.)

This coding comes directly out of thinking in assembler and coding in C and is dependent on C's case statement fall-through. Can this kind of creativity in interlacing control structures work in any other languages?

+2  A: 

It works in C++.

Note though the code generated depends on your compiler. In particular, when I compiled Duff's device using GCC targeting ARM architectures, the resulting ARM assembler was sub-optimal (I think GCC turned it into a jump table) at any optimization level.

I ended up just handing coding the assembler.

tpdi
Yes, it was good when compilers didn't optimise much (which is when Duff came up with it). The problem is that at every step there's both the drop-through flow and the 'case' label, and it takes a very good compiler to work out that it doesn't need to flush registers etc. And a compiler that good will probably be able to unroll the loop of the naive implementation anyway.Still, makes a good interview question :)
Paul
+2  A: 

You can do it in any language that supports computed GOTO statements (Fortran, some BASICs, etc.)

Dave
+1  A: 

Duff's device is essentially a computed goto which can be done in many other languages - assembly (of course) and FORTRAN being a couple that support them directly.

Michael Burr
+1  A: 

I used it very successfully in javascript to speed up large array processing. I wish I could use it in c#.

Gary
You can. There's no need for the cleanup case to be the same as the loop. Do the unrolled loop n/8 times, then 7 conditional steps at the end. Given it's for large arrays, any minor inefficiency in the last 7 is lost in the noise.
Paul