views: 638
answers: 3
Using memcpy() when the source and destination overlap results in undefined behaviour - in those cases only memmove() can be used.

But what if I know for sure that the buffers don't overlap - is there a reason to use specifically memcpy() or specifically memmove()? Which should I use, and why?

+20  A: 

memcpy() doesn't have any special handling for overlapping buffers, so it can skip the checks that memmove() needs, which makes it faster.

Also, on some architectures memcpy() can benefit from CPU instructions for moving blocks of memory - something that memmove() cannot use.
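To see where the difference comes from, here is a deliberately naive, byte-at-a-time sketch of both functions (real library versions are heavily optimized; the direction check in memmove() is the essential extra work being discussed):

```c
#include <stddef.h>

/* Naive memcpy: always copies forward; only correct for non-overlapping buffers. */
void *my_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Naive memmove: chooses a copy direction so overlapping buffers still work.
 * (Comparing unrelated pointers like this is informal; real implementations
 * do it more carefully, often in assembly.) */
void *my_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    if (d < s) {
        while (n--)          /* destination below source: copy forward */
            *d++ = *s++;
    } else {
        d += n;
        s += n;
        while (n--)          /* destination above source: copy backward */
            *--d = *--s;
    }
    return dst;
}
```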

qrdl
your edit captured what I wanted to say, +1.
roe
+1, for mentioning CPU instructions.
MAK
Even on a RISC architecture, there are often block-move operations from which memcpy() can benefit. PowerPC has VMX, for example.
Crashworks
Nah, decent code generators produce rep movs after checking for no overlap. MSVC does.
Hans Passant
@nobugz You can't always determine at compile time whether the buffers overlap. Or did you mean checking at run time?
qrdl
@Crashworks Interesting, didn't know that. Seems I had experience with more RISC-y RISCs, which have only load/store instructions to access memory.
qrdl
@qrdl: RISC doesn't have rep movs, but many RISC architectures have vector registers that are wider than the scalar core registers, and have a correspondingly wider path to/from memory.
Stephen Canon
Obviously I was wrong about RISC architectures - I removed that bit.
qrdl
A few of the newer optimizations for memcpy() on modern CPUs are cache-based: using a temporary cache area for reads, cache prefetching from the source, cache-zeroing on the destination (dcbz on PPC), and so on. Some CPUs also have "DMA-like" extensions for fully asynchronous copying. A good implementation of memmove() will use the same optimized code as memcpy(), but it does require a check for overlap first, so it will be slightly slower even if you already know no overlap exists.
Adisak
+2  A: 

If you're interested in which will perform better, you need to test it on the target platform. Nothing in the standard mandates how the functions are implemented and, while it may seem logical that a non-checking memcpy would be faster, this is by no means a certainty.

It's quite possible, though unlikely, that the person who wrote memmove for your particular compiler was a certified genius while the poor soul who got the job of writing memcpy was the village idiot :-)

Although, in reality, I find it hard to imagine that memmove could be faster than memcpy, I don't discount the possibility. Measure, don't guess.
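If you do want to measure, a rough micro-benchmark along these lines is enough for a first look (buffer size and iteration count are arbitrary, and clock() granularity is coarse - treat the numbers as indicative only):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BUF_SIZE   (16 * 1024 * 1024)   /* 16 MiB, arbitrary */
#define ITERATIONS 100

int main(void)
{
    unsigned char *src = malloc(BUF_SIZE);
    unsigned char *dst = malloc(BUF_SIZE);
    if (!src || !dst)
        return 1;
    memset(src, 0xAB, BUF_SIZE);

    clock_t start = clock();
    for (int i = 0; i < ITERATIONS; i++)
        memcpy(dst, src, BUF_SIZE);
    double t_cpy = (double)(clock() - start) / CLOCKS_PER_SEC;

    start = clock();
    for (int i = 0; i < ITERATIONS; i++)
        memmove(dst, src, BUF_SIZE);
    double t_move = (double)(clock() - start) / CLOCKS_PER_SEC;

    /* Use the destination so the compiler can't discard the copies. */
    printf("last byte: %u\n", dst[BUF_SIZE - 1]);
    printf("memcpy:  %.3f s\n", t_cpy);
    printf("memmove: %.3f s\n", t_move);

    free(src);
    free(dst);
    return 0;
}
```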

paxdiablo
`memcpy` has the restrict qualifier on its arguments, not `memmove`. (It codifies precisely the fact that the buffers don't overlap).
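For reference, the C99 declarations in `<string.h>` show exactly that difference:

```c
void *memcpy(void * restrict s1, const void * restrict s2, size_t n);
void *memmove(void *s1, const void *s2, size_t n);
```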
Stephen Canon
D'Oh! You're right, of course, @StephenC, I got them the wrong way around. Removed that twaddle from my answer :-)
paxdiablo
+8  A: 

Assuming a sane library implementor, memcpy will always be at least as fast as memmove. However, on most platforms the difference will be minimal, and on many platforms memcpy is just an alias for memmove to support legacy code that (incorrectly) calls memcpy on overlapping buffers.

Both memcpy and memmove should be written to take advantage of the fastest loads and stores available on the platform.

To answer your question: you should use the one that is semantically correct. If you can guarantee that the buffers do not overlap, you should use memcpy. If you cannot guarantee that the buffers don't overlap, you should use memmove.
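For example: copying into a separate buffer is a memcpy() job, while shifting elements within the same array is exactly the overlapping case memmove() exists for (a small illustrative snippet):

```c
#include <string.h>

void example(void)
{
    int a[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    int b[8];

    /* Distinct objects - guaranteed not to overlap, so memcpy is fine. */
    memcpy(b, a, sizeof a);

    /* "Delete" a[2] by shifting the tail left - source and destination
     * overlap, so memmove is required. */
    memmove(&a[2], &a[3], 5 * sizeof a[0]);
}
```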

Stephen Canon
+1. I especially like the "assuming sane" counterpoint to my own answer :-)
paxdiablo
Nitpick: `memcpy` and `memmove` should be written to take advantage of the fastest `unaligned` loads and stores available on the platform. If you know your buffers are aligned properly, you can often get much better performance using things like MMX, which copy much larger data units at a time.
Adam Rosenfield
@Adam: Generally speaking one can arrange to use aligned loads and stores in memcopy by first copying some smaller units to achieve appropriate alignment. If the buffers do not have similar alignment, it will be necessary to apply some shift or permute before storing, but this is faster than using unaligned memory accesses on many architectures.
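A rough sketch of that strategy - copy bytes until the destination is word-aligned, then copy word-sized chunks, then finish the tail. This is purely illustrative (the function name and structure are made up); real implementations use SIMD or hand-written assembly and handle mismatched source/destination alignment with shifts or permutes:

```c
#include <stddef.h>
#include <stdint.h>

void *copy_aligned_sketch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: copy bytes until the destination is word-aligned. */
    while (n && ((uintptr_t)d % sizeof(size_t)) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Bulk: word-at-a-time copies, only if the source is now aligned too
     * (i.e. both pointers had the same misalignment). The casts here skirt
     * strict aliasing; a real implementation does this in assembly or with
     * compiler builtins. */
    if (((uintptr_t)s % sizeof(size_t)) == 0) {
        while (n >= sizeof(size_t)) {
            *(size_t *)(void *)d = *(const size_t *)(const void *)s;
            d += sizeof(size_t);
            s += sizeof(size_t);
            n -= sizeof(size_t);
        }
    }

    /* Tail: whatever bytes remain. */
    while (n--)
        *d++ = *s++;

    return dst;
}
```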
Stephen Canon