Reading your comments, it sounds like you want parallelism. There are SIMD instructions for that, but they do their work in registers: the data has to be loaded from memory into a register first, and the result stored back afterwards.
That comes from the way the architecture is designed (I'm assuming x86).
Every access also has to go through the same limited path to RAM (the core's load/store units, the caches, and the memory bus). Issuing more accesses at once does not make that path any wider, so a big copy ends up limited by memory bandwidth rather than by how many instructions you throw at it.
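If you can get the data you need into registers, you can use instructions such as MMX or SSE to perform parallel calculations on it. For plain copying they buy you little: an SSE load or store does move 16 bytes at once, but optimized memcpy implementations typically use that kind of instruction already, so a hand-rolled copy is unlikely to beat them.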
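
To make the register-based approach concrete, here is a rough sketch of an SSE calculation using compiler intrinsics. It assumes an x86 target with SSE enabled (the default on x86-64) and a compiler that ships `<xmmintrin.h>`. The function name `add_floats_sse` is made up for illustration; it is not from any library.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

/* Hypothetical helper: out[i] = a[i] + b[i] for n floats, four at a time.
   Assumes an x86 target with SSE available. */
void add_floats_sse(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        __m128 va  = _mm_loadu_ps(a + i);  /* load 4 floats into a register (unaligned OK) */
        __m128 vb  = _mm_loadu_ps(b + i);
        __m128 sum = _mm_add_ps(va, vb);   /* 4 additions in one instruction */
        _mm_storeu_ps(out + i, sum);       /* store the 4 results back to memory */
    }

    /* Scalar tail for the last n % 4 elements. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Note the pattern: load into registers, compute in registers, store back. The parallelism is in the arithmetic, not in the memory traffic.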
As others have said, use memcpy. It's reliable, debugged, and fast.
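
For completeness, a minimal sketch of the memcpy route. The wrapper name `copy_ints` is only an illustration; the one real call here is the standard `memcpy` from `<string.h>`.

```c
#include <stddef.h>
#include <string.h>  /* memcpy */

/* Hypothetical helper: copy n ints from src to dst.
   The two buffers must not overlap; use memmove if they might. */
void copy_ints(int *dst, const int *src, size_t n)
{
    memcpy(dst, src, n * sizeof *src);  /* size is in bytes, not elements */
}
```

The usual mistake is passing an element count where memcpy expects a byte count, which is why the size is written as `n * sizeof *src`.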