Your inner function should copy count + 1
bytes, so that an 8-bit count of 0..255 covers 1..256 bytes, e.g.,
do /* copy one byte */ while(count-- != 0);
If the post-decrement is slow, other alternatives are:
... /* copy one byte */
while (count != 0) { /* copy one byte */; count -= 1; }
or
for (;;) { /* copy one byte */; if (count == 0) break; count -= 1; }
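To make the three loop shapes concrete, here is a minimal sketch with "copy one byte" spelled out. The function names and the dst/src parameters are hypothetical, chosen only for illustration; each routine copies count + 1 bytes.

```c
#include <stdint.h>

/* Hypothetical inner routines: each one copies count + 1 bytes (1..256). */

/* Variant 1: do/while with post-decrement in the loop test. */
static void inner_do_while(uint8_t *dst, const uint8_t *src, uint8_t count)
{
    do {
        *dst++ = *src++;        /* copy one byte */
    } while (count-- != 0);
}

/* Variant 2: copy the first byte unconditionally, then loop count times. */
static void inner_while(uint8_t *dst, const uint8_t *src, uint8_t count)
{
    *dst++ = *src++;            /* copy one byte */
    while (count != 0) {
        *dst++ = *src++;        /* copy one byte */
        count -= 1;
    }
}

/* Variant 3: test in the middle of the loop, so no extra copy statement. */
static void inner_for(uint8_t *dst, const uint8_t *src, uint8_t count)
{
    for (;;) {
        *dst++ = *src++;        /* copy one byte */
        if (count == 0)
            break;
        count -= 1;
    }
}
```

All three should behave identically; which one compiles to the tightest loop depends on your target, so it is worth checking the generated assembly.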
The caller/wrapper can do:
if (count > 0 && count <= 256) inner((uint8_t)(count - 1));
or
if (((unsigned)(count - 1)) < 256u) inner((uint8_t)(count - 1));
if it's faster with your compiler.
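Putting the wrapper and the inner routine together, a minimal sketch might look like this. The name copy_bytes, the dst/src parameters, the int type of count, and the 0/1 return value are all assumptions made for illustration. The range check converts to unsigned before subtracting, which gives the same result as the cast above but avoids signed overflow if count happens to be INT_MIN.

```c
#include <stdint.h>

/* Hypothetical inner routine: copies count + 1 bytes (1..256). */
static void inner(uint8_t *dst, const uint8_t *src, uint8_t count)
{
    do {
        *dst++ = *src++;        /* copy one byte */
    } while (count-- != 0);
}

/* Hypothetical wrapper: copies count bytes when 1 <= count <= 256.
 * Returns 1 if the copy was done, 0 if count was out of range. */
static int copy_bytes(uint8_t *dst, const uint8_t *src, int count)
{
    /* In unsigned arithmetic, count <= 0 wraps to a large value after
     * subtracting 1, so a single compare rejects both ends of the range. */
    if ((unsigned)count - 1u < 256u) {
        inner(dst, src, (uint8_t)(count - 1));
        return 1;
    }
    return 0;
}
```

For example, copy_bytes(dst, src, 256) passes 255 to inner and copies 256 bytes, while counts of 0, -1, or 257 are rejected without touching dst.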