Are there faster alternatives to memcpy() in C++?
This answer to a very similar question (about memset()) applies here, too. It basically says that compilers generate very well-optimized code for memcpy()/memset(), and different code depending on the nature of the objects (size, alignment, etc.). And remember, in C++ you can only memcpy() PODs.
Unlikely. Your compiler/standard library will likely have a very efficient and tailored implementation of memcpy. And memcpy is basically the lowest-level API there is for copying one part of memory to another.
If you want further speedups, find a way to not need any memory copying.
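For instance (a minimal C++17 sketch, not from the original answer): instead of copying a slice of a large buffer into a new string, hand out a non-owning view of it, so no bytes move at all.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <string_view>

// Return a view of the middle third of a buffer; only a pointer and a length
// are created, nothing is copied.
std::string_view middle_third(std::string_view whole) {
    const std::size_t n = whole.size() / 3;
    return whole.substr(n, n);
}

int main() {
    const std::string big(3 * 1024, 'x');       // stand-in for a large buffer
    std::string_view view = middle_third(big);  // no copy happens here
    std::cout << "view covers " << view.size() << " bytes without copying\n";
}
```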
Depending on what you're trying to do... if it's a big enough memcpy and you will only be writing to the copy sparsely, an mmap with MAP_PRIVATE to create a copy-on-write mapping could conceivably be faster.
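Roughly like this (a POSIX sketch under assumed conditions: the data lives in a file, here the hypothetical big_input.bin, and only a few pages of the "copy" will ever be written):

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdio>

int main() {
    const char* path = "big_input.bin";        // hypothetical input file
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }
    const std::size_t len = static_cast<std::size_t>(st.st_size);

    // MAP_PRIVATE gives a copy-on-write "copy": reads see the file's contents,
    // and only the pages we actually modify get physically duplicated.
    void* copy = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (copy == MAP_FAILED) { perror("mmap"); return 1; }

    static_cast<unsigned char*>(copy)[0] ^= 0xFF;  // touch one page only

    munmap(copy, len);
    close(fd);
}
```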
Depending on your platform there may be faster alternatives for specific use cases, like if you know the source and destination are aligned to a cache line and the size is an integer multiple of the cache line size. In general, though, most compilers will produce fairly optimal code for memcpy.
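As a purely illustrative sketch of such a special case (the 64-byte cache line and the GCC/Clang-specific __builtin_assume_aligned hint are assumptions; measure before trusting this to beat the library memcpy):

```cpp
#include <cstddef>
#include <cstring>

// Only valid when src and dst are 64-byte aligned and n is a multiple of 64.
// The alignment hints let the compiler emit wide aligned loads/stores.
void copy_aligned64(void* dst, const void* src, std::size_t n) {
    auto* d = static_cast<unsigned char*>(__builtin_assume_aligned(dst, 64));
    const auto* s =
        static_cast<const unsigned char*>(__builtin_assume_aligned(src, 64));
    for (std::size_t i = 0; i < n; i += 64) {
        std::memcpy(d + i, s + i, 64);  // fixed-size copy, typically inlined
    }
}
```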
First, a word of advice. Assume that the people who wrote your standard library are not stupid. If there was a faster way to implement a general memcpy, they'd have done it.
Second, yes, there are better alternatives.
- In C++, use the std::copy function. It does the same thing, but it is 1) safer, and 2) faster in some cases. It is a template, meaning that it can be, and has been, specialized for specific types, making it potentially faster than the general C memcpy (a short example follows after this list).
- Or, you can use your superior knowledge of your specific situation. The implementers of memcpy had to write it so it performed well in every case. If you have specific information about the situation where you need it, you might be able to write a faster version. For example: how much memory do you need to copy? How is it aligned? That might allow you to write a more efficient memcpy for this specific case, but it won't be as good in most other cases (if it works at all).
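A small example of the std::copy route (nothing here beyond standard C++; for trivially copyable element types a typical standard library dispatches this to a memmove/memcpy internally):

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

int main() {
    std::array<std::uint32_t, 8> src{1, 2, 3, 4, 5, 6, 7, 8};
    std::vector<std::uint32_t> dst(src.size());

    // Type-safe equivalent of
    // std::memcpy(dst.data(), src.data(), src.size() * sizeof(std::uint32_t));
    std::copy(src.begin(), src.end(), dst.begin());
}
```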
I'm not sure that using the default memcpy is always the best option. Most memcpy implementations I've looked at tend to try and align the data at the start, and then do aligned copies. If the data is already aligned, or is quite small, then this is wasting time.
Sometimes it's beneficial to have specialized memcpy variants (word copy, half-word copy, byte copy), as long as they don't have too negative an effect on the caches.
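Purely as an illustration of what such variants might look like (the function names and the 32-bit word size are my assumptions, not taken from any real library):

```cpp
#include <cstddef>
#include <cstdint>

// "Word copy" for buffers known to be word-aligned with a size that is a
// whole number of 32-bit words: no alignment fix-up loop at the start.
void copy_words(std::uint32_t* dst, const std::uint32_t* src,
                std::size_t word_count) {
    for (std::size_t i = 0; i < word_count; ++i) {
        dst[i] = src[i];  // one aligned word per iteration
    }
}

// Byte-copy fallback for the general, unaligned case.
void copy_bytes(unsigned char* dst, const unsigned char* src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
}
```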
Also, you may want finer control over the actual allocation algorithm. In the games industry it's exceptionally common for people to write their own memory allocation routines, irrespective of how much effort the toolchain developers put into theirs. The games I've seen almost always tend to use Doug Lea's malloc.
Generally speaking though, you'd be wasting time trying to optimize memcpy as there'll no doubt be lots of easier bits of code in your application to speed up.
Optimization expert Agner Fog has published optimized memory functions: http://agner.org/optimize/#asmlib. It's under GPL though.
Some time ago Agner said that these functions should replace GCC builtins because they're a lot faster. I don't know if it's been done since then.