Though I'm not 100% sure what you're asking for, here goes...
The issue that Duff's device addresses is one of loop unwinding (as you'll no doubt have seen on the Wiki link you posted). What this basically equates to is an optimisation of run-time efficiency, over memory footprint. Duff's device deals with serial copying, rather than just any old problem, but is a classic example of how optimisations can be made by reducing the number of times that a comparison needs to be done in a loop.
As an alternative example, which may make it easier to understand, imagine you have an array of items you wish to loop over, and add 1 to them each time... ordinarily, you might use a for loop, and loop around 100 times. This seems fairly logical and, it is... however, an optimisation can be made by unwinding the loop (obviously not too far... or you may as well just not use the loop).
So a regular for loop:
for(int i = 0; i < 100; i++)
{
myArray[i] += 1;
}
becomes
for(int i = 0; i < 100; i+10)
{
myArray[i] += 1;
myArray[i+1] += 1;
myArray[i+2] += 1;
myArray[i+3] += 1;
myArray[i+4] += 1;
myArray[i+5] += 1;
myArray[i+6] += 1;
myArray[i+7] += 1;
myArray[i+8] += 1;
myArray[i+9] += 1;
}
What Duff's device does is implement this idea, in C, but (as you saw on the Wiki) with serial copies. What you're seeing above, with the unwound example, is 10 comparisons compared to 100 in the original - this amounts to a minor, but possibly significant, optimisation.