This question has been bothering me for some time: what is the fastest way to copy an array? The possibilities I am considering are:

  1. memcpy
  2. std::copy
  3. cblas_dcopy

Does anyone know what the pros and cons of these three are? Other suggestions are also welcome.
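For concreteness, here is roughly what the three candidates look like when copying an array of `double`. This is a sketch assuming non-overlapping arrays; the wrapper names `copy_with_memcpy` and `copy_with_std_copy` are made up for illustration, and the `cblas_dcopy` call is shown only as a comment because it needs a CBLAS library to link against.

```cpp
#include <algorithm>  // std::copy
#include <cassert>
#include <cstddef>    // std::size_t
#include <cstring>    // std::memcpy

// Option 1: raw byte copy. Only valid for trivially copyable element types
// and non-overlapping ranges; note the size is given in bytes, not elements.
void copy_with_memcpy(const double* src, double* dst, std::size_t n) {
    std::memcpy(dst, src, n * sizeof(double));
}

// Option 2: typed copy. Works for any copyable element type.
void copy_with_std_copy(const double* src, double* dst, std::size_t n) {
    std::copy(src, src + n, dst);
}

// Option 3 (requires a CBLAS implementation, not included here):
//   cblas_dcopy(static_cast<int>(n), src, 1, dst, 1);
// copies n doubles from src to dst with a stride of 1 in both arrays.
```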

A: 

In most cases memcpy will be the fastest, as it is the lowest level and may be implemented in machine code on a given platform. (However, if your array contains non-trivial objects, memcpy may not do the correct thing, so it may be safer to stick with std::copy.)
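One way to keep memcpy from being applied to the wrong kind of element is a thin wrapper that refuses to compile for types that are not trivially copyable. A minimal sketch assuming C++17; `checked_memcpy` is a hypothetical name, not a standard function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical helper: copies n elements with memcpy, but only compiles
// when T is trivially copyable (i.e. when a byte copy is a valid copy).
template <typename T>
void checked_memcpy(T* dst, const T* src, std::size_t n) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "memcpy is not a valid copy for this type; use std::copy");
    std::memcpy(dst, src, n * sizeof(T));
}

// double is trivially copyable; std::string is not, because it manages
// its own storage and a byte copy would duplicate internal pointers.
static_assert(std::is_trivially_copyable_v<double>);
static_assert(!std::is_trivially_copyable_v<std::string>);
```

Calling `checked_memcpy` on an array of `std::string` would then fail at compile time instead of misbehaving at run time.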

However, it all depends on how well the standard library is implemented on the given platform. As the standard does not say how fast operations must be, there is no way to know in a portable sense what will be fastest.

Profiling your application will show which is fastest on a given platform, but it will only tell you about the platform you tested on.
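If you do want numbers, a rough micro-benchmark can compare the candidates on one machine. This is only a sketch: `time_copy` and `compare_copies` are hypothetical helpers, it tests a single array size with no warm-up control, and an optimizing compiler may hoist or drop repeated identical copies, so treat the timings with suspicion. As noted above, they describe only the machine they were measured on.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: runs copy() reps times (reps > 0) and returns the
// average duration of one call in nanoseconds.
template <typename CopyFn>
double time_copy(CopyFn copy, int reps) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) copy();
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> total = stop - start;
    return total.count() / reps;
}

struct Timings {
    double memcpy_ns;
    double std_copy_ns;
    bool same_result;  // both destinations ended up equal to the source
};

// Times memcpy and std::copy on one array of n doubles.
Timings compare_copies(std::size_t n, int reps) {
    std::vector<double> src(n, 1.5), dst1(n), dst2(n);
    Timings t{};
    t.memcpy_ns = time_copy(
        [&] { std::memcpy(dst1.data(), src.data(), n * sizeof(double)); }, reps);
    t.std_copy_ns = time_copy(
        [&] { std::copy(src.begin(), src.end(), dst2.begin()); }, reps);
    t.same_result = (dst1 == src) && (dst2 == src);
    return t;
}
```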

However, when you profile your application, you will most likely find that the issues are in your design rather than your choice of array copy method. (E.g. why do you need to copy large arrays so much?)

Ian Ringrose
A: 

Just profile your application. You will likely find that copying is not the slowest part of it.

BatchyX
A: 

Use memcpy; however, if your array contains non-trivial objects, stick with std::copy.

Viktor Sehr
A good implementation of `std::copy` could be faster even for basic objects; `memcpy` has to deal with arbitrary address alignments, but `std::copy` knows the alignment at compile time.
Mike Seymour
An awful lot of C++ performance tips seem to include qualifiers like "a good implementation ... could be faster". How many of these hypothetical optimisations have actually been implemented, anywhere, ever?
Porculus
@Mike Seymour: You realize that we're talking about copying an array, i.e. a contiguous block of memory containing objects?
Viktor Sehr
@Porculus: Tony has posted an example. Still, for each optimization, the same could be said again: with more optimizations, this could be faster.
peterchen
@Viktor: Yes. On my version of GCC, `std::copy` on an array of POD data generates a call to `memmove`. Any decent compiler will do likewise (or, for bonus points, call a byte-copy function specialised for the datatype's alignment), so there's no reason to sacrifice type safety by calling `memcpy` in the belief that it might be faster.
Mike Seymour
@Viktor: But an array of what? `memcpy` takes void pointers, so it can make no assumptions about alignment. `std::copy` knows the type it is working on, so it knows how much alignment it can rely on.
jalf
@Mike Seymour: I agree that std::copy is the #1 choice, and I think I understand that std::copy could be faster if the objects were copied one by one because of alignment. But for a *contiguous block* of memory containing several objects, I don't see how std::copy could ever benefit.
Viktor Sehr
@Viktor: `memcpy` has to handle arbitrary memory alignment; it will have to do a runtime check before running the byte-copy loop optimised for the object's alignment. `std::copy` could potentially skip that step, since the alignment is known at compile time, if the implementor felt that optimisation was worthwhile. Anyway, the point I was trying to make is that `std::copy` will be *no slower* (and, just possibly, slightly faster), as well as giving correct behaviour for any object type.
Mike Seymour
@Mike Seymour: Your point is that it will be *no slower*, yet you refer to your compiler using memmove, which *is slower* than memcpy (if only marginally)? I don't understand what you mean by utilizing alignment; what exactly would it do to optimize copying a large block of memory?
Viktor Sehr
@Viktor: OK, *not significantly slower* (and potentially insignificantly faster); sorry for not being quite as precise as I could be.
Mike Seymour
@Viktor: As for using alignment: if the addresses are known to be well enough aligned, then the compiler should use an optimised implementation to copy the data in words or larger chunks. If the addresses aren't aligned, then (on most platforms) this won't be possible, and it will have to fall back to a slower implementation. If the addresses are known to be aligned at compile time, you can jump straight to the optimised version; otherwise, you need to check the alignment at runtime, incurring a small cost.
Mike Seymour
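The alignment argument can be illustrated with a small sketch: a routine that takes `void*` (like memcpy) only learns the alignment by inspecting the address at run time, whereas a typed routine gets `alignof(T)` as a compile-time constant. `is_aligned` and `known_alignment` are hypothetical helpers; real library copy routines are far more elaborate than this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Run-time check: is the address a multiple of `alignment`?
// This is the kind of test an untyped routine such as memcpy has to make
// before choosing a word-at-a-time copy loop.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Compile-time knowledge: any valid T* is a multiple of alignof(T),
// so a typed copy routine can skip the run-time test entirely.
template <typename T>
constexpr std::size_t known_alignment(const T*) {
    return alignof(T);
}
```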
A: 

In C++ you should use std::copy by default unless you have good reasons to do otherwise. The reason is that C++ classes define their own copy semantics via the copy constructor and copy assignment operator, and of the operations listed, only std::copy respects those conventions.
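To see the difference in practice, here is a type whose copy assignment operator does some bookkeeping. `std::copy` runs that operator once per element, whereas a byte-wise memcpy would bypass it entirely (and is not valid for such a type in the first place). `Tracked` and `copy_and_count` are made-up names for this sketch, which assumes C++17 for the `inline static` member.

```cpp
#include <algorithm>
#include <cassert>

// A made-up type whose copy assignment counts how often it runs.
struct Tracked {
    inline static int assignments = 0;
    int value = 0;

    Tracked() = default;
    Tracked(const Tracked&) = default;
    Tracked& operator=(const Tracked& other) {
        value = other.value;
        ++assignments;  // side effect that a byte copy would silently skip
        return *this;
    }
};

// std::copy assigns element by element, so the operator above runs n times.
int copy_and_count(const Tracked* src, Tracked* dst, int n) {
    Tracked::assignments = 0;
    std::copy(src, src + n, dst);
    return Tracked::assignments;
}
```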

memcpy() uses raw, byte-wise copy of data (though likely heavily optimized for cache line size, etc.), and ignores C++ copy semantics (it's a C function, after all...).

cblas_dcopy() is a specialized function for use in linear algebra routines using double precision floating point values. It likely excels at that, but shouldn't be considered general purpose.
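For reference, the BLAS `dcopy` routine copies `n` doubles from `x` to `y` with independent strides (`incx`, `incy`), which is why it suits linear-algebra code, such as extracting a column of a row-major matrix, more than general array copying. The loop below is a plain C++ sketch of those semantics for positive strides only; `naive_dcopy` is a hypothetical name and not a substitute for an optimized BLAS.

```cpp
#include <cassert>
#include <cstddef>

// Sketch of dcopy semantics for positive strides:
//   y[i * incy] = x[i * incx]  for i = 0 .. n-1
// (Real BLAS also accepts negative strides, which this sketch does not handle.)
void naive_dcopy(std::size_t n, const double* x, std::size_t incx,
                 double* y, std::size_t incy) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i * incy] = x[i * incx];
    }
}
```

For example, with a 3x3 row-major matrix `m`, `naive_dcopy(3, m + 1, 3, col, 1)` copies the second column into `col`.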

If your data is "simple" POD type struct data or raw fundamental type data, memcpy will likely be as fast as you can get. Just as likely, std::copy will be optimized to use memcpy in these situations, so you'll never know the difference.

In short, use std::copy().

Drew Hall
It seems that `std::copy` rather uses `std::memmove` because the ranges are allowed to overlap (at one end).
visitor
@visitor: Probably true. But I bet memmove() calls memcpy() if it determines the ranges do not overlap (easy pointer arithmetic).
Drew Hall
I have seen a memmove implementation that just does the copy backwards if the overlap would cause problems going forward.
doron
In addition, `std::copy` can (at least in theory) take advantage of platform-specific optimizations and/or for specific types. --- @deus-ex-machina399: that's the typical solution, but copying backwards is not cache-optimal.
peterchen
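The overlap rules discussed in these comments can be seen in a small example. `std::copy` requires that the start of the destination not lie inside the source range, so shifting elements to the right within one array calls for `std::copy_backward` (or memmove for trivially copyable data); memcpy must not be used for overlapping ranges at all. The helper names below are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Shift a[0..n-2] one slot to the right, overwriting a[n-1].
// The ranges overlap, so a front-to-back copy would overwrite elements
// before reading them; copy_backward copies from the end instead.
void shift_right_by_one(int* a, std::size_t n) {
    if (n < 2) return;
    std::copy_backward(a, a + n - 1, a + n);
}

// The same operation with memmove, which is specified to handle overlap.
void shift_right_by_one_memmove(int* a, std::size_t n) {
    if (n < 2) return;
    std::memmove(a + 1, a, (n - 1) * sizeof(int));
}
```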
A: 

I suspect the others will end up calling memcpy() anyway. Having said that, I can't believe there will be any appreciable difference.

If it really matters to you, code all three and run a profiler, but it might be better to consider things like readability, maintainability, exception safety, etc. (and write an inline assembler version while you are at it, not that you are likely to see a difference).

Is your program threaded?

And, most importantly, how are you declaring your array? (What is it an array of?) And how large is it?

LeonixSolutions
A: 

memcpy is probably the fastest way to copy a contiguous block of memory. This is because it will likely be highly optimized to your particular bit of hardware. It is often implemented as a built-in compiler function.

Having said that, a non-POD C++ object may manage resources outside its own storage (for example, heap memory it points to), so copying arrays of C++ objects using memcpy is likely to give you unexpected results. When copying arrays (or collections) of C++ objects, std::copy will use the objects' own copy semantics and is therefore suitable for use with non-POD C++ objects.

cblas_dcopy looks like a copy for use with a specific library and probably has little use when not using that library.

doron
Why do you assume `std::copy` is slower than `memcpy`?
jalf
A: 

Use std::copy unless profiling shows you a needed benefit in doing otherwise. It honours the C++ object encapsulation, invoking copy constructors and assignment operators, and the implementation could include other inline optimisations such as avoiding an out-of-line function call to memcpy() if the size is known at compile time and is too small to justify the function call overhead. (Some systems may have memcpy macros that make similar determinations, but in general the C++ compiler will have more insight into what optimisations are functionally equivalent.)

FWIW, on the old Linux box I have handy, GCC doesn't do any spectacular optimisations, but `bits/type_traits.h` does let a program easily specify whether std::copy should fall through to memcpy():

```cpp
/*
 * Copyright (c) 1997
 * Silicon Graphics Computer Systems, Inc.
 *
 * Permission to use, copy, modify, distribute and sell this software
 * and its documentation for any purpose is hereby granted without fee,
 * provided that the above copyright notice appear in all copies and
 * that both that copyright notice and this permission notice appear
 * in supporting documentation.  Silicon Graphics makes no
 * representations about the suitability of this software for any
 * purpose.  It is provided "as is" without express or implied warranty.
 ...

/*
This header file provides a framework for allowing compile time dispatch
based on type attributes. This is useful when writing template code.
For example, when making a copy of an array of an unknown type, it helps
to know if the type has a trivial copy constructor or not, to help decide
if a memcpy can be used.

The class template __type_traits provides a series of typedefs each of
which is either __true_type or __false_type. The argument to
__type_traits can be any type. The typedefs within this template will
attain their correct values by one of these means:
    1. The general instantiation contain conservative values which work
       for all types.
    2. Specializations may be declared to make distinctions between types.
    3. Some compilers (such as the Silicon Graphics N32 and N64 compilers)
       will automatically provide the appropriate specializations for all
       types.

EXAMPLE:

//Copy an array of elements which have non-trivial copy constructors
template <class _Tp> void
  copy(_Tp* __source,_Tp* __destination,int __n,__false_type);
//Copy an array of elements which have trivial copy constructors. Use memcpy.
template <class _Tp> void
  copy(_Tp* __source,_Tp* __destination,int __n,__true_type);

//Copy an array of any type by using the most efficient copy mechanism
template <class _Tp> inline void copy(_Tp* __source,_Tp* __destination,int __n) {
   copy(__source,__destination,__n,
        typename __type_traits<_Tp>::has_trivial_copy_constructor());
}
*/
```
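In current C++ the same compile-time dispatch can be written with the standard `<type_traits>` header instead of SGI's internal `__type_traits`. A minimal sketch assuming C++17 (`if constexpr`) and non-overlapping ranges; `fast_copy` is a hypothetical name, and mainstream standard libraries typically already perform a similar optimisation inside `std::copy`, so this is for illustration rather than something you would normally need to write.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical helper: byte copy for trivially copyable types,
// element-wise copy (running copy assignment) for everything else.
// Assumes the ranges do not overlap, as memcpy requires.
template <typename T>
void fast_copy(const T* src, T* dst, std::size_t n) {
    if constexpr (std::is_trivially_copyable_v<T>) {
        if (n != 0) std::memcpy(dst, src, n * sizeof(T));
    } else {
        std::copy(src, src + n, dst);
    }
}

// std::string is not trivially copyable, so it takes the std::copy branch.
static_assert(!std::is_trivially_copyable_v<std::string>);
```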
Tony