It's called Duff's device, and you can read about it on Wikipedia.
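It takes care of one problem with an unrolled loop: the number of operations may not be an exact multiple of the unroll factor, so a whole number of passes does not fit. One method is to deal with the leftover operations outside the main loop, but Duff's device can be more efficient: it uses a switch (typically compiled to a very fast jump table) to jump into the middle of the unrolled loop body, which avoids the extra looping overhead of handling the odd number of operations separately.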
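To make the first method concrete, here is a minimal sketch of an unrolled copy that handles the leftover bytes in a second loop outside the main one. The name unrolled_copy and the unroll factor of 8 are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: 8-way unrolled copy, leftovers handled separately. */
void unrolled_copy(char *dst, const char *src, size_t count)
{
    assert(count == 0 || (dst != NULL && src != NULL));

    size_t blocks = count / 8;  /* full groups of 8 bytes */
    size_t rest   = count % 8;  /* 0..7 bytes that do not fill a group */

    while (blocks-- > 0) {      /* main loop: one test per 8 copies */
        *dst++ = *src++;
        *dst++ = *src++;
        *dst++ = *src++;
        *dst++ = *src++;
        *dst++ = *src++;
        *dst++ = *src++;
        *dst++ = *src++;
        *dst++ = *src++;
    }
    while (rest-- > 0) {        /* second loop: one test per leftover byte */
        *dst++ = *src++;
    }
}
```

The second loop is where the extra overhead comes from: each leftover byte costs its own test and jump again.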
Your example is a memory copy, so compare it to the naive version:
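#include <stddef.h>  /* size_t */

/* Renamed from memcpy to avoid clashing with the standard library function. */
void naive_memcpy(char* dst, const char* src, size_t count)
{
begin:
    if (count-- == 0) return;  /* test count */
    *(dst++) = *(src++);       /* copy */
    goto begin;                /* loop */
}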
To copy 15 bytes, this does the following:
test count,
copy,
loop,
test count,
copy,
loop,
test count,
copy,
loop,
test count,
copy,
loop,
test count,
copy,
loop,
test count,
copy,
loop,
test count,
copy,
loop,
test count,
copy,
loop,
test count,
copy,
loop,
test count,
copy,
loop,
test count,
copy,
loop,
test count,
copy,
loop,
test count,
copy,
loop,
test count,
copy,
loop,
test count,
copy,
loop,
test count
Note how many times the "test count" and "loop" operations must be done: 16 tests and 15 jumps for only 15 copies.
With the Duff's device version you showed, the sequence is much shorter:
jump based on count,
copy,
copy,
copy,
copy,
copy,
copy,
copy,
test count,
loop,
copy,
copy,
copy,
copy,
copy,
copy,
copy,
copy,
test count
That is 19 steps instead of 46, which saves over half the steps.
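For reference, a minimal sketch of a Duff's device copy that would produce the trace above. It assumes an unroll factor of 8 and a destination pointer that advances (Duff's original wrote to a fixed output register instead); the name duff_copy is hypothetical, and the code is not necessarily identical to the version in the question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name: copies count bytes with an 8-way unrolled loop. */
void duff_copy(char *dst, const char *src, size_t count)
{
    assert(count == 0 || (dst != NULL && src != NULL));
    if (count == 0) return;           /* otherwise case 0 would copy 8 bytes */

    size_t passes = (count + 7) / 8;  /* trips through the do-while, rounded up */

    switch (count % 8) {              /* "jump based on count" */
    case 0: do { *dst++ = *src++;
    case 7:      *dst++ = *src++;
    case 6:      *dst++ = *src++;
    case 5:      *dst++ = *src++;
    case 4:      *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--passes > 0);   /* "test count" and "loop" */
    }
}
```

For count = 15, count % 8 is 7, so the switch enters at case 7 and performs 7 copies; the loop test then sends control back to the top for one full pass of 8 copies, after which passes reaches 0 and the loop exits. The case labels fall through on purpose, so a compiler with fallthrough warnings enabled may complain.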