I have a function that does a memcpy, but it's consuming an enormous number of cycles. Is there a faster alternative/approach than using memcpy to move a piece of memory?
Usually the standard library shipped with the compiler will already implement memcpy() in the fastest way possible for the target platform.
memcpy is likely to be the fastest way you can copy bytes around in memory. If you need something faster, try figuring out a way of not copying things around at all, e.g. swap pointers only, not the data itself.
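For example, if the copy only exists so a reader can see a stable snapshot while a writer fills a second buffer, exchanging two pointers costs the same no matter how large the buffers are. A minimal sketch (the double-buffer structure here is hypothetical):

```c
#include <stddef.h>

/* Hypothetical double buffer: instead of memcpy'ing the freshly written
 * buffer over the one being read, just exchange the two pointers. */
typedef struct {
    unsigned char *front;   /* buffer consumers read from   */
    unsigned char *back;    /* buffer the producer writes to */
    size_t         size;
} double_buffer;

static void swap_buffers(double_buffer *db)
{
    unsigned char *tmp = db->front;
    db->front = db->back;
    db->back  = tmp;        /* O(1): no bytes are moved */
}
```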
It's generally faster not to make a copy at all. Whether you can adapt your function to avoid copying I don't know, but it's worth looking into.
Sometimes functions like memcpy, memset, ... are implemented in two different ways:
- once as a real function
- once as some assembly that's immediately inlined
Not all compilers use the inlined-assembly version by default; your compiler may use the function variant, causing some overhead because of the function call. Check your compiler documentation to see how to select the intrinsic variant of the function (command-line option, pragmas, ...).
Edit: See http://msdn.microsoft.com/en-us/library/tzkfha43%28VS.80%29.aspx for an explanation of intrinsics on the Microsoft C compiler.
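For example, a minimal sketch for the Microsoft compiler: the #pragma intrinsic directive asks the compiler to expand memcpy inline at the call site, and the /Oi command-line option enables intrinsics globally.

```c
#include <string.h>

/* Ask MSVC to expand memcpy inline at the call site instead of emitting
 * a call into the C runtime (/Oi turns on intrinsics globally). */
#pragma intrinsic(memcpy)

void copy_block(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);    /* expanded inline, no call overhead */
}
```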
I assume you must have huge areas of memory that you want to copy around, if the performance of memcpy has become an issue for you?
In this case, I'd agree with nos's suggestion to figure out some way NOT to copy stuff.
Instead of having one huge blob of memory to be copied around whenever you need to change it, you should probably try some alternative data structures instead.
Without really knowing anything about your problem area, I would suggest taking a good look at persistent data structures and either implementing one of your own or reusing an existing implementation.
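As a deliberately tiny illustration of the structural sharing that persistent data structures rely on (not tied to your actual data layout): a persistent singly-linked list never copies its tail when an element is prepended, so "new versions" share almost all of their memory with the old ones.

```c
#include <stdlib.h>

/* Persistent (immutable) singly-linked list: prepending allocates one new
 * node and shares the entire existing tail, so nothing is copied. */
typedef struct node {
    int                value;
    const struct node *next;
} node;

static const node *cons(int value, const node *tail)
{
    node *n = malloc(sizeof *n);
    if (n) {
        n->value = value;
        n->next  = tail;    /* the old list is reused untouched */
    }
    return n;
}
```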
Check your compiler/platform manual. For some microprocessors and DSP kits, using memcpy is much slower than intrinsic functions or DMA operations.
If your platform supports it, look into whether you can use the mmap() system call to leave your data in the file... generally the OS can manage that better. And, as everyone has been saying, avoid copying if at all possible; pointers are your friend in cases like this.
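A minimal POSIX sketch of that idea, assuming your data already lives in a file (error handling kept short):

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a file into memory and work on it in place: the kernel pages the
 * data in on demand instead of you copying it into a private buffer. */
static const void *map_file(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);              /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return p;               /* release later with munmap() */
}
```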
Please offer us more details. On the i386 architecture it is very possible that memcpy is the fastest way of copying. But on a different architecture for which the compiler doesn't have an optimized version, it is best to rewrite your memcpy function. I did this on a custom ARM architecture using assembly language. If you transfer BIG chunks of memory then DMA is probably the answer you are looking for.
Please offer more details - architecture, operating system (if relevant).
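If you do end up writing your own routine for such a platform, the usual starting point is to move data in the widest aligned units the CPU handles rather than byte by byte; a hand-tuned assembly or DMA version follows the same shape. A rough, portable C sketch (alignment of both pointers is simply assumed here):

```c
#include <stddef.h>
#include <stdint.h>

/* Naive word-at-a-time copy. Assumes dst and src are both aligned to
 * sizeof(uintptr_t) and the regions do not overlap; a real routine would
 * also handle misaligned heads/tails and unroll the loop. */
static void copy_words(void *dst, const void *src, size_t n)
{
    uintptr_t       *d = dst;
    const uintptr_t *s = src;

    for (size_t words = n / sizeof *d; words != 0; words--)
        *d++ = *s++;

    unsigned char       *db = (unsigned char *)d;
    const unsigned char *sb = (const unsigned char *)s;
    for (n %= sizeof(uintptr_t); n != 0; n--)   /* leftover tail bytes */
        *db++ = *sb++;
}
```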
You may want to have a look at this:
http://www.danielvik.com/2010/02/fast-memcpy-in-c.html
Another idea I would try is to use COW techniques to duplicate the memory block and let the OS handle the copying on demand as soon as a page is written to. There are some hints on doing this with mmap() here: http://stackoverflow.com/questions/1565177/can-i-do-a-copy-on-write-memcpy-in-linux
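The gist of that approach, as a minimal sketch: a MAP_PRIVATE mapping of a file behaves like a private writable copy, but the kernel only actually copies a page the first time it is written.

```c
#include <stddef.h>
#include <sys/mman.h>

/* "Copy" a file-backed block without moving a byte up front: each
 * MAP_PRIVATE mapping acts as its own writable copy, and the kernel
 * duplicates an individual page only the first time it is written. */
static void *cow_copy_of_file(int fd, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Two such mappings of the same descriptor start out backed by the same physical pages, so "duplicating" a large block costs almost nothing until one of the copies is modified.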
nos is right, you're calling it too much.
To see where you are calling it from and why, just pause it a few times under the debugger and look at the stack.
Agner Fog has a fast memcpy implementation: http://www.agner.org/optimize/#asmlib
Memory-to-memory copying is usually supported in the CPU's instruction set, and memcpy will usually use that. This is usually the fastest way.
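On x86, for instance, that is the REP MOVSB string-move instruction; a minimal GCC/Clang-style inline-assembly sketch for x86-64 (optimized memcpy implementations may emit this, or wide SIMD loads/stores, depending on size and CPU):

```c
#include <stddef.h>

/* x86-64, GCC/Clang only: copy n bytes with the CPU's REP MOVSB
 * string-move instruction, which is roughly what an optimized memcpy
 * can emit for suitable sizes on recent processors. */
static void copy_rep_movsb(void *dst, const void *src, size_t n)
{
    __asm__ volatile ("rep movsb"
                      : "+D" (dst), "+S" (src), "+c" (n)
                      :
                      : "memory");
}
```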
You should check what exactly your CPU is doing. On Linux, watch swap-in and swap-out and virtual memory effectiveness with sar -B 1 or vmstat 1, or by looking in /proc/meminfo. You may see that your copy has to push out a lot of pages to free space, or read them in, etc.
That would mean your problem isn't in what you use for the copy, but in how your system uses memory. You may need to decrease the file cache, start writing out earlier, or lock the pages in memory, etc.