memmove() carries at least one implicit branch, to choose between copying forwards and copying backwards, unless the compiler can deduce that the regions cannot overlap. So wherever it cannot be optimized into memcpy(), memmove() is slower by at least one branch, plus whatever extra space the inlined instructions for each direction occupy (if inlining is possible).
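To make that branch concrete, here is a minimal sketch of a byte-wise memmove. The name sketch_memmove is hypothetical and this is not the eglibc code; it only assumes that addresses can be compared as unsigned integers via uintptr_t, which holds on common flat-memory platforms.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration only; real libraries copy in larger units. */
void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Unsigned distance from src to dst. It wraps to a huge value when
       dst is below src, so one comparison covers both "dst before src"
       and "dst past the end of src". This is the branch memcpy() skips. */
    if ((uintptr_t)d - (uintptr_t)s >= n) {
        /* Forward copy never overwrites source bytes not yet read. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        /* dst starts inside [src, src + n): copy from the end backwards. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    return dst;
}
```

A memcpy() written the same way would consist of the forward loop alone.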
Reading the eglibc-2.11.1 code for both memcpy() and memmove() confirms this. Furthermore, page copying is not possible when copying backwards; it is a significant speedup available only when the regions cannot overlap.
In summary: if you can guarantee that the regions do not overlap, choosing memcpy() over memmove() avoids a branch. If the source and destination contain corresponding page-aligned, page-sized regions and do not overlap, some architectures can employ hardware-accelerated copies for those regions, regardless of whether you called memmove() or memcpy().
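As a rough illustration of how that choice plays out in calling code, here are two small helpers. Both names (drop_prefix, duplicate_into) are made up for this example, and the second one assumes the caller really does pass two distinct buffers.

```c
#include <stddef.h>
#include <string.h>

/* Remove the first `count` bytes of buf in place. Source and destination
   are parts of the same buffer and may overlap, so memmove() is required. */
void drop_prefix(char *buf, size_t len, size_t count)
{
    if (count > len)
        count = len;   /* nothing left to keep */
    memmove(buf, buf + count, len - count);
}

/* Copy into a separate buffer. The caller guarantees that out and in do
   not overlap (an assumption of this sketch), so memcpy() is enough. */
void duplicate_into(char *restrict out, const char *restrict in, size_t len)
{
    memcpy(out, in, len);
}
```

Calling memcpy() in drop_prefix would be undefined behavior whenever the two ranges overlap, even if it happens to work with a particular library.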
Update0
There is actually one more difference beyond the assumptions and observations I've listed above. As of C99, the following prototypes exist for the two functions:
void *memcpy(void * restrict s1, const void * restrict s2, size_t n);
void *memmove(void * s1, const void * s2, size_t n);
Because it may assume that the two pointers s1 and s2 do not point at overlapping memory, a straightforward C implementation of memcpy can leverage this to generate more efficient code without resorting to assembler; see here for more. I'm sure that memmove can do this too, but it would need additional checks beyond those I saw in eglibc, meaning the performance cost may be slightly more than a single branch for C implementations of these functions.
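A small sketch of what restrict buys a plain C loop. The two function names are hypothetical, and whether a given compiler actually vectorizes either loop depends on the compiler and its flags, so treat the comments as what the compiler is permitted to do rather than what it will do.

```c
#include <stddef.h>

/* With restrict, the compiler may assume dst and src never alias, so it
   is free to load several elements before storing any of them. */
void copy_words_restrict(unsigned long *restrict dst,
                         const unsigned long *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Without restrict, a store to dst[i] could change a later src element,
   so the compiler must keep the element order or add its own runtime
   overlap check before using wider loads and stores. */
void copy_words_plain(unsigned long *dst, const unsigned long *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

Comparing the generated assembly for the two functions at a high optimization level is one way to see whether the extra overlap check appears for the plain version.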