ansaurus

Question

What are real significant cases when memcpy() is faster than memmove()?

Answer 1

+2 A:

Well, memmove has to copy backwards when the source and destination overlap, and the source is before the destination. So, some implementations of memmove simply copy backwards when the source is before the destination, without regard for whether the two regions overlap.

A quality implementation of memmove can detect whether the regions overlap, and do a forward-copy when they don't. In such a case, the only extra overhead compared to memcpy is simply the overlap checks.
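To make the direction choice concrete, here is a minimal sketch of a memmove-style routine that copies backwards only when the destination starts inside the source region. The name sketch_memmove is made up for illustration, and the byte-at-a-time loops are an assumption for readability; this is not how any particular libc does it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration only; real implementations copy in larger units. */
void *sketch_memmove(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);

    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Compare the addresses as integers rather than as unrelated pointers. */
    uintptr_t da = (uintptr_t)d;
    uintptr_t sa = (uintptr_t)s;

    if (da > sa && da - sa < n) {
        /* dst begins inside [src, src + n): copy from the end backwards. */
        while (n--)
            d[n] = s[n];
    } else {
        /* Regions are disjoint, or dst is before src: forward copy is safe. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    }
    return dst;
}
```

The only cost over a plain forward copy here is the comparison and the branch, which is the overhead described above.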

Chris Jester-Young 2010-09-13 13:51:11

Is a forward-copy faster than a backward-copy?

Robert 2010-09-13 14:01:17

In some architectures, that is certainly possible.

Chris Jester-Young 2010-09-13 14:03:56

@Chris: To name one (no longer of particular interest) the Z80 had the LDIR instruction that copied forward, and nothing comparable to copy backwards.

David Thornley 2010-09-13 14:14:57

@David: I could have sworn the Z80 had an LDDR as well...

Jerry Coffin 2010-09-13 15:25:19

@Jerry: Drat, you're right. I should learn not to make firm statements about a processor I haven't programmed on in twenty-five years.

David Thornley 2010-09-13 16:52:22

Most modern x86 CPUs will do a read-ahead: reading x and x+1 will implicitly hint the CPU to get x+2 before you actually try.

MSalters 2010-09-14 09:41:57

Answer 2

+1 A:

Simplistically, memmove needs to test for overlap and then do the appropriate thing; with memcpy, the caller asserts that there is no overlap, so there is no need for additional tests.

Having said that, I have seen platforms that have exactly the same code for memcpy and memmove.

doron 2010-09-13 13:52:57

And I hope that those platforms exhibit the memmove() behaviour for both!

Francesco 2010-09-13 14:02:21

Answer 3

+2 A:

At best, calling memcpy rather than memmove will save a pointer comparison and a conditional branch. For a large copy, this is completely insignificant. If you are doing many small copies, then it might be worth measuring the difference; that is the only way you can tell whether it's significant or not.

It is definitely a microoptimisation, but that doesn't mean you shouldn't use memcpy when you can easily prove that it is safe. Premature pessimisation is the root of much evil.
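For the "many small copies" case, one rough way to measure is a loop like the following. The helper name time_copies, the 256-byte buffers and the use of clock() are all assumptions made for this sketch; a serious benchmark would need more care (warm-up, repeated runs, checking what the compiler actually emitted).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Common function-pointer type for memcpy and memmove. */
typedef void *(*copy_fn)(void *, const void *, size_t);

/* Hypothetical harness: CPU seconds spent on `iters` copies of `n` bytes
   between two disjoint buffers (n is clamped to the buffer size). */
double time_copies(copy_fn fn, size_t n, unsigned long iters)
{
    static unsigned char src[256], dst[256];
    volatile unsigned char sink = 0;   /* keeps the copies observable */

    assert(fn != NULL);
    if (n > sizeof src)
        n = sizeof src;

    clock_t start = clock();
    for (unsigned long i = 0; i < iters; i++) {
        src[0] = (unsigned char)i;     /* vary the input on each iteration */
        fn(dst, src, n);
        sink ^= dst[0];
    }
    clock_t end = clock();

    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_copies(memcpy, 16, 100000000UL) and time_copies(memmove, 16, 100000000UL) and comparing the two numbers gives a first impression on your own platform; going through a function pointer keeps the compiler from inlining either call, which may or may not match how your real code uses them.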

Mike Seymour 2010-09-13 14:07:08

Answer 4

+12 A:

There's at least an implicit branch to copy either forwards or backwards for memmove() if the compiler is not able to deduce that an overlap is not possible. This means that without the ability to optimize in favor of memcpy(), memmove() is at least slower by one branch, and any additional space occupied by inlined instructions to handle each case (if inlining is possible).

Reading the eglibc-2.11.1 code for both memcpy() and memmove() confirms this as suspected. Furthermore, there's no possibility of page copying during backward copying, a significant speedup only available if there's no chance for overlapping.

In summary this means: If you can guarantee the regions are not overlapped, then selecting memcpy() over memmove() avoids a branch. If the source and destination contain corresponding page aligned and page sized regions, and don't overlap, some architectures can employ hardware accelerated copies for those regions, regardless of whether you called memmove() or memcpy().

Update0

There is actually one more difference beyond the assumptions and observations I've listed above. As of C99, the following prototypes exist for the 2 functions:

void *memcpy(void * restrict s1, const void * restrict s2, size_t n);
void *memmove(void * s1, const void * s2, size_t n);

Because the 2 pointers s1 and s2 can be assumed not to point at overlapping memory, straightforward C implementations of memcpy are able to leverage this to generate more efficient code without resorting to assembler; see here for more. I'm sure that memmove can do this too; however, additional checks would be required beyond those I saw present in eglibc, meaning the performance cost may be slightly more than a single branch for C implementations of these functions.
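As a small illustration of the restrict point, here are two otherwise identical copy loops. The names copy_restrict and copy_plain are hypothetical, and whether a given compiler actually generates different code for them is an assumption you would need to check in the assembly output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* restrict is the caller's promise that dst and src do not overlap, so the
   compiler may load several elements before storing any of them. */
void copy_restrict(unsigned char *restrict dst,
                   const unsigned char *restrict src, size_t n)
{
    /* Debug-only check of that promise; compiled out when NDEBUG is defined. */
    uintptr_t d = (uintptr_t)dst, s = (uintptr_t)src;
    assert(d >= s ? d - s >= n : s - d >= n);

    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Without restrict, a store to dst[i] might change a later src[j], so the
   compiler has to preserve the byte-by-byte forward semantics or add its own
   runtime overlap check before vectorizing. */
void copy_plain(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

For non-overlapping buffers both functions produce the same result; any difference is only in the generated code.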

Matt Joiner 2010-09-13 14:10:56

This. Of course, for large copies, the difference is completely insignificant, but for small copies, the extra check can account for a substantial portion of the time spent in the routine.

Stephen Canon 2010-09-13 15:22:00

Precisely right, @Stephen Canon. For anything larger than a given size, the difference is next to none. For your simple zeroing of stack variables, it's probably noticeable in tight loops.

Matt Joiner 2010-09-14 02:42:05

Answer 5

+2 A:

It's certainly possible that memcpy is merely a call to memmove, in which case there'd be no benefit to using memcpy. On the other extreme, it's possible that an implementor assumed memmove would rarely be used, and implemented it with the simplest possible byte-at-a-time loops in C, in which case it could be ten times slower than an optimized memcpy. As others have said, the likeliest case is that memmove uses memcpy when it detects that a forward copy is possible, but some implementations may simply compare the source and destination addresses without looking for overlap.

With that said, I would recommend never using memmove unless you're shifting data within a single buffer. It might not be slower, but then again, it might be, so why risk it when you know there's no need for memmove?
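A typical "shifting data within a single buffer" case is deleting an element from the middle of an array, where source and destination necessarily overlap. The helper remove_at below is a hypothetical example, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: remove a[index] from an array of len ints by sliding
   the tail down one slot. Returns the new length. */
size_t remove_at(int *a, size_t len, size_t index)
{
    assert(a != NULL);
    if (index >= len)
        return len;            /* nothing to remove */

    /* Source and destination overlap whenever the tail has more than one
       element, so memcpy would be undefined behaviour here. */
    memmove(&a[index], &a[index + 1], (len - index - 1) * sizeof a[0]);
    return len - 1;
}
```

Copying between two separately allocated buffers never overlaps, which is the case where memcpy is the safe choice.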

R.. 2010-09-13 18:01:32

Answer 6

A:

Just simplify and always use memmove. A function that's right all the time is better than a function that's only right half the time.

2010-09-13 18:12:08

It's a fair call.

Matt Joiner 2010-09-20 14:57:15
