Are there faster alternatives to memcpy() in C++?
This answer to a very similar question (about memset()) applies here, too. It basically says that compilers generate very well-optimized code for memcpy()/memset(), and different code depending on the nature of the objects (size, alignment, etc.). And remember, in C++ you can only memcpy() PODs.
Unlikely. Your compiler/standard library will likely have a very efficient and tailored implementation of memcpy. And memcpy is basically the lowest-level API there is for copying one part of memory to another.
If you want further speedups, find a way to not need any memory copying.
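For instance (a minimal C++17 sketch, not from the original answer): instead of copying a slice of a large buffer into a new string, hand out a non-owning view of it, so no bytes move at all.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <string_view>

// Return a view of the middle third of a buffer; only a pointer and a length
// are created, nothing is copied.
std::string_view middle_third(std::string_view whole) {
    const std::size_t n = whole.size() / 3;
    return whole.substr(n, n);
}

int main() {
    const std::string big(3 * 1024, 'x');       // stand-in for a large buffer
    std::string_view view = middle_third(big);  // no copy happens here
    std::cout << "view covers " << view.size() << " bytes without copying\n";
}
```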
Depending on what you're trying to do... if it's a big enough memcpy and you will only be writing to the copy sparsely, an mmap with MAP_PRIVATE to create a copy-on-write mapping could conceivably be faster.
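Roughly like this (a POSIX sketch under assumed conditions: the data lives in a file, here the hypothetical big_input.bin, and only a few pages of the "copy" will ever be written):

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdio>

int main() {
    const char* path = "big_input.bin";        // hypothetical input file
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }
    const std::size_t len = static_cast<std::size_t>(st.st_size);

    // MAP_PRIVATE gives a copy-on-write "copy": reads see the file's contents,
    // and only the pages we actually modify get physically duplicated.
    void* copy = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (copy == MAP_FAILED) { perror("mmap"); return 1; }

    static_cast<unsigned char*>(copy)[0] ^= 0xFF;  // touch one page only

    munmap(copy, len);
    close(fd);
}
```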
Depending on your platform there may be faster alternatives for specific use cases, like if you know the source and destination are aligned to a cache line and the size is an integer multiple of the cache line size. In general, though, most compilers will produce fairly optimal code for memcpy.
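As a purely illustrative sketch of such a special case (the 64-byte cache line and the GCC/Clang-specific __builtin_assume_aligned hint are assumptions; measure before trusting this to beat the library memcpy):

```cpp
#include <cstddef>
#include <cstring>

// Only valid when src and dst are 64-byte aligned and n is a multiple of 64.
// The alignment hints let the compiler emit wide aligned loads/stores.
void copy_aligned64(void* dst, const void* src, std::size_t n) {
    auto* d = static_cast<unsigned char*>(__builtin_assume_aligned(dst, 64));
    const auto* s =
        static_cast<const unsigned char*>(__builtin_assume_aligned(src, 64));
    for (std::size_t i = 0; i < n; i += 64) {
        std::memcpy(d + i, s + i, 64);  // fixed-size copy, typically inlined
    }
}
```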
First, a word of advice. Assume that the people who wrote your standard library are not stupid. If there was a faster way to implement a general memcpy, they'd have done it.
Second, yes, there are better alternatives.
- In C++, use the std::copy function. It does the same thing, but it is 1) safer, and 2) faster in some cases. It is a template, meaning that it can be, and has been, specialized for specific types, making it potentially faster than the general C memcpy (a short example follows after this list).
- Or, you can use your superior knowledge of your specific situation. The implementers of memcpy had to write it so it performed well in every case. If you have specific information about the situation where you need it, you might be able to write a faster version. For example: how much memory do you need to copy? How is it aligned? That might allow you to write a more efficient memcpy for this specific case, but it won't be as good in most other cases (if it works at all).
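A small example of the std::copy route (nothing here beyond standard C++; for trivially copyable element types a typical standard library dispatches this to a memmove/memcpy internally):

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

int main() {
    std::array<std::uint32_t, 8> src{1, 2, 3, 4, 5, 6, 7, 8};
    std::vector<std::uint32_t> dst(src.size());

    // Type-safe equivalent of
    // std::memcpy(dst.data(), src.data(), src.size() * sizeof(std::uint32_t));
    std::copy(src.begin(), src.end(), dst.begin());
}
```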
I'm not sure that using the default memcpy is always the best option. Most memcpy implementations I've looked at tend to try and align the data at the start, and then do aligned copies. If the data is already aligned, or is quite small, then this is wasting time.
Sometimes it's beneficial to have specialized memcpy variants (word copy, half-word copy, byte copy), as long as they don't have too negative an effect on the caches.
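Purely as an illustration of what such variants might look like (the function names and the 32-bit word size are my assumptions, not taken from any real library):

```cpp
#include <cstddef>
#include <cstdint>

// "Word copy" for buffers known to be word-aligned with a size that is a
// whole number of 32-bit words: no alignment fix-up loop at the start.
void copy_words(std::uint32_t* dst, const std::uint32_t* src,
                std::size_t word_count) {
    for (std::size_t i = 0; i < word_count; ++i) {
        dst[i] = src[i];  // one aligned word per iteration
    }
}

// Byte-copy fallback for the general, unaligned case.
void copy_bytes(unsigned char* dst, const unsigned char* src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
}
```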
Also, you may want finer control over the actual allocation algorithm. In the games industry it's exceptionally common for people to write their own memory allocation routines, irrespective of how much effort the toolchain developers put into theirs. The games I've seen almost always tend to use Doug Lea's malloc.
Generally speaking though, you'd be wasting time trying to optimize memcpy as there'll no doubt be lots of easier bits of code in your application to speed up.
Optimization expert Agner Fog has published optimized memory functions: http://agner.org/optimize/#asmlib. It's under GPL though.
Some time ago Agner said that these functions should replace GCC builtins because they're a lot faster. I don't know if it's been done since then.