ansaurus

Question

Answer 1

A:

Does it really copy a byte at a time on modern processors?
I would have thought there was a block memory operation in the memory controller.

Martin Beckett 2010-08-26 05:48:23

well it was a question not an answer - but still surprised

Martin Beckett 2010-08-26 16:36:24

Answer 2

A:

Depends on the compiler. Good compilers will use good optimizations dependent on the target processor instruction set and bus width.

JackN 2010-08-26 05:52:15

I believe this should go into a comment instead of a separate reply.

YeenFei 2010-08-26 06:34:36

Answer 3

+4 A:

memmove can be turned into a memcpy if the two memory regions don't overlap. Obviously memcpy is extremely optimised on most systems (one of the ones I use makes use of almost every trick in the book from unrolled loops to SSE operations where supported for maximum throughput).

If the two memory regions do overlap, for all intents and purposes the region to be copied is moved into a temporary buffer and the temporary buffer is copied (all with memcpy, most likely) back on top of the original buffer. You can't work from the start or work from the back with an overlapping region, because you'll always end up with at least some data being corrupted in the process.
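For illustration only, the temporary-buffer idea described above could be sketched roughly as follows. The name `memmove_via_temp` and the NULL-on-failure return are assumptions made for this example; they are not taken from any libc. Note that the sketch needs `n` extra bytes of scratch memory and can fail, which the real `memmove` cannot.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper illustrating the temporary-buffer idea.
 * Returns dst on success, or NULL if the scratch buffer cannot be
 * allocated (an assumption of this sketch; memmove itself never fails). */
void *memmove_via_temp(void *dst, const void *src, size_t n)
{
    if (n == 0 || dst == src)
        return dst;             /* nothing to move */

    void *tmp = malloc(n);      /* scratch space as large as the move */
    if (tmp == NULL)
        return NULL;

    memcpy(tmp, src, n);        /* src and tmp cannot overlap */
    memcpy(dst, tmp, n);        /* tmp and dst cannot overlap */
    free(tmp);
    return dst;
}
```

Every byte is copied twice, so this is the simplest approach to reason about rather than the cheapest.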

That being said, it's been a long time since I've looked at libc code, so there may be an optimisation for memmove and overlapping regions that I haven't thought of yet.

memmove doesn't depend on the way the stack grows at all - it merely copies one region of memory to another location - exactly like memcpy, except that it handles overlapping regions and memcpy doesn't.

EDIT: Actually, thinking about it some more... Working from the back can work if you go from the right "source" (so to speak), depending on the move itself (e.g., is source < dest or not?). You can read newlib's implementation here, and it's fairly well-commented too.

Matthew Iselin 2010-08-26 05:53:57

I'd like to see you copy into a temporary buffer when `n` is `SIZE_MAX>>1`...

R.. 2010-09-04 01:09:13

Answer 4

+5 A:

Mathematically, you don't have to worry about whether they overlap at all. If src is less than dst, just copy from the end. If src is greater than dst, just copy from the beginning.

If src and dst are equal, just exit straight away.

That's because your cases are one of:

1 <-----s-----> <-----d----->  start at end of s
2 <-----s--<==>--d----->       start at end of s
3 <-----sd----->               do nothing
4 <-----d--<==>--s----->       start at beginning of s
5 <-----d-----> <-----s----->  start at beginning of s

Even if there's no overlap, that will still work fine, and simplify your conditions.
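As a rough sketch of the five cases above, a byte-at-a-time version might look like the following. The name `memmove_bytes` is made up for this example, and the addresses are compared as `uintptr_t` on the assumption of a flat address space, because relational comparison of pointers into different objects is not defined by the C standard.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-wise memmove: the copy direction is chosen purely
 * from the relative order of src and dst, with no overlap test. */
void *memmove_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if ((uintptr_t)s < (uintptr_t)d) {
        /* Cases 1 and 2: src is below dst, so copy from the end. */
        while (n--)
            d[n] = s[n];
    } else if ((uintptr_t)s > (uintptr_t)d) {
        /* Cases 4 and 5: src is above dst, so copy from the beginning. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    }
    /* Case 3: src == dst, nothing to do. */
    return dst;
}
```

For example, calling `memmove_bytes(buf + 2, buf, 5)` on a buffer holding "abcdefgh" takes the copy-from-the-end branch and leaves "ababcdeh".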

If you have a more efficient way to copy forwards than backwards then, yes, you should check for overlap to ensure you're using the more efficient method if possible. In other words, change option 1 above to copy from the beginning.
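A variant that prefers the forward copy could check for the one situation where copying backwards is actually required, namely when dst starts inside the source region. This is only a sketch under the same flat-address-space assumption; `memmove_prefer_forward` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant: copy forwards whenever that is safe, and fall
 * back to a backward copy only when dst lies inside [src, src + n). */
void *memmove_prefer_forward(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    uintptr_t da = (uintptr_t)d;
    uintptr_t sa = (uintptr_t)s;

    if (sa < da && da - sa < n) {
        /* Case 2: a forward copy would overwrite source bytes before
         * they are read, so copy from the end. */
        while (n--)
            d[n] = s[n];
    } else {
        /* Cases 1, 3, 4 and 5: a forward copy is safe. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    }
    return dst;
}
```

In a real implementation the forward loop is where a call to an optimised `memcpy`-style routine would go, which is the reason for preferring it.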

However, in that case, you may also find that it's faster to copy twice, once from s to some temporary memory and then from that temporary memory to d, but that depends entirely on your respective algorithms.

paxdiablo 2010-08-26 06:25:10

Note that there is another hidden assumption here, which is that you're only copying a byte at a time.

caf 2010-08-27 03:18:21

Well, whether you copy a byte at a time, or a quadword or some massive SSE9 1024-bit hyperword value, the theory remains the same. You have to make sure you don't copy _into_ an overlap area that you haven't copied _out of_ yet. All the N-is-wider-than-char options introduce is a somewhat more complex detection of overlap (and final transfer) in the case where it's not a direct multiple of the value of N.

paxdiablo 2010-08-27 05:22:23

@caf: If `src` and `dest` have the same alignment with respect to the larger type you want to copy with, you never have to worry about clobbering an area you haven't yet copied, since the positions will always differ by at least that size. If they don't share the same alignment, you're stuck copying as bytes anyway...unless you want to make use of some nasty x86 unaligned I/O...
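A minimal sketch of that idea, for the forward direction only, might look like this. It assumes `uintptr_t` as the "larger type", the name `copy_forward_words` is made up for the example, and each word goes through `memcpy` into a local variable to stay clear of aliasing problems (compilers commonly reduce that to a single load and store).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical forward-only copy: valid when dst is below src or the
 * regions do not overlap. Copies whole words when src and dst share
 * the same alignment, and plain bytes otherwise. */
void *copy_forward_words(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    const size_t W = sizeof(uintptr_t);

    if ((uintptr_t)d % W == (uintptr_t)s % W) {
        /* Copy leading bytes until both pointers are word-aligned. */
        while (n > 0 && (uintptr_t)d % W != 0) {
            *d++ = *s++;
            n--;
        }
        /* src and dst now differ by a multiple of W, so a word store
         * never lands on source bytes that have not been read yet. */
        while (n >= W) {
            uintptr_t w;
            memcpy(&w, s, W);   /* load one word from the source */
            memcpy(d, &w, W);   /* store it to the destination */
            s += W;
            d += W;
            n -= W;
        }
    }
    /* Trailing bytes, or the whole region if alignments differ. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

A backward version would mirror this, starting from the end of both regions.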

R.. 2010-09-04 01:11:51


memmove implementation in C
