views:

1166

answers:

15
double *d;
int length=10;

memset(d, length, 0);

//or

for (int i=length; i; i--)
  d[i]=0.0;
+3  A: 

memset(d, 10, 0) is wrong as it only nulls 10 bytes. prefer std::fill as the intent is clearest.

stefaanv
No, it doesn't null any bytes. It tens 0 bytes.
Michael Krelin - hacker
another reason to use std::fill
stefaanv
I do not hold anything against `std::fill`, but how is it *another reason*?
Michael Krelin - hacker
With memset, I always forget the order of the parameters, which I took over from the question.
stefaanv
I'd go as far as guessing that you forget the order of the parameters because you don't use it often enough ;-) And now go ahead and tell me you're not using it because you keep forgetting the order of parameters ;-)
Michael Krelin - hacker
I'm now considering, on principle, never again using any function that takes two integer parameters. Possible exception if the order doesn't matter, so std::min is OK ;-)
Steve Jessop
I'm not using it anymore because it is too lowlevel to do any real initialization and it is unsafe to initialize classes with it.The only use for it is to initialize a real byte-buffer and you can do that too with std::fill
stefaanv
@stefaanv: since 99% of the time I call memset() to zero a buffer, I have a nice little inline: inline void zeromem( void* p, size_t siz) { memset( p, 0, siz); } /* or something like that */. Now I never need to worry about the order of the value and size.
Michael Burr
stefaanv, of course it would be weird to try it on classes. But dismissing the function because it's not applicable in every single case goes a bit too far :)
Michael Krelin - hacker
+21  A: 

If you really care you should try and measure. However the most portable way is using std::fill():

std::fill( array, array + numberOfElements, 0.0 );
sharptooth
`fill_n` then sounds more appropriate.
Michael Krelin - hacker
IMO the perfect answer. _If_ `std::memset()` is possible, I would expect the std lib implementation to call this, if not, it does the right thing, too.
sbi
@hacker: maybe, although I always assumed that fill_n is intended for use with iterators for which no end iterator exists, rather than in all cases where the length is known. But in this case it certainly does avoid repetition of "array".
Steve Jessop
Did you guys see this is A C question and NOT C++. AFAIK, C doesn't have fill function, much less std namespace
vehomzzz
I swear it was tagged `c++` when it was first published! (not that I would prefer `c++` way here).
Michael Krelin - hacker
@hacker: Enigma retagged it as a C question about 1 minute before complaining about this answer. Which is fine, though his comment strikes me as unjustly aggressive, when you take this into account.
Brian
Brian, agree on both accounts (retagging being fine, but not when coupled with unjust offense).
Michael Krelin - hacker
So I guess the answer to the question "did you guys see this is a C question", is "no, obviously not, because it was a C++ question". Am tempted to change the question to setting a Java array of doubles to 12.4, then complain that all the answers are irrelevant ;-)
Steve Jessop
@onebyone - don't forget to have at least 3 errors inthe Java code you post.
Michael Burr
LOL @ onebyone, Michael Burr :)
j_random_hacker
@sbi: I would expect the std::lib to call `__memset_aligned_8` or so, not `std::memset()`. No point in doing a runtime alignment check on a `char*` if you know that it's just a casted `double*`
MSalters
+6  A: 
memset(d,0,10*sizeof(*d));

is likely to be faster. Like they say you can also

std::fill_n(d,10,0.);

but it is most likely a prettier way to do the loop.

Michael Krelin - hacker
+1 for using count * sizeof, but you should drop the parenthesis too. :)
unwind
Note that, AFAIK, the standard doesn't guarantee that 0.0 is implemented as all bits set to 0. (In fact, all bits set to 0 might be an invalid floating point value that's trapped by the hardware.) So filling double with zeroed bytes is not portable and `std::memset()` shouldn't be used.
sbi
unwind, I like their roundness.
Michael Krelin - hacker
sbi, but we both now that it would work for our purpose ;-) Actually, the thing is — I doubt this is worth optimizing, but *if* for whatever reason we started to do so, then let's optimize it.
Michael Krelin - hacker
I prefer the `fill_n` way, since that one stays in the same context as the `double` variable.
xtofl
xtofl, if we're talking loop vs `fill_n`, I'd certainly vote for `fill_n`. But *if* (**if!**) for whatever reason we want to speed up zeroing of half-a-gigabyte of doubles in the loop, I'd certainly care way less about the context.
Michael Krelin - hacker
@hacker: that seems to be the answer the OP was after.
stefaanv
Although I'm sure that "memset(d, length, 0);" is still much faster by far ;)
stefaanv
stefaanv, pardon my ignorance, but who is OP?;-)
Michael Krelin - hacker
stefaanv, you bet!
Michael Krelin - hacker
@hacker - OP == "Original Poster"
Michael Burr
@unwind - what's the problem with the parens?
Michael Burr
@stefaanv: "Chen's Theorem of Ultimate Optimization": You can do nothing really fast. From http://blogs.msdn.com/oldnewthing/archive/2008/10/14/8998849.aspx#8999794
Michael Burr
Thanks, stefaanv. And with parens — I think unwind prefers it without — `sizeof *d`. A matter of taste.
Michael Krelin - hacker
@hacker: For a decade I have been writing code that had to run on dozens platforms and platform versions, compiled by almost half a dozen compilers or compiler versions being combined with several different std lib implementations. I have seen someone rake through >1.8MLoC for all C-style casts (because on one platform some of them caused a problem leading to crashes) and many other hellish tasks for porting code. One of the hardest lessons was that, whenever someone says "this piece of code will never have to be ported", never ever believe them. Such hard-learned habit sticks like tar.
sbi
@hacker: Note that I'd expect any decent implementation of `std::fill_n` to call `std::memset` if that is possible. But even if it doesn't, if you are zeroing 500MB once, I doubt that the speed difference really matters. And if you're doing this in a tight loop, there's still the code that has to repeatedly _fill_ 500MB with data it has to produce from somewhere -- and that, too, probably makes the zeroing irrelevant.
sbi
sbi, if the saving were irrelevant we wouldn't be here, if we talk about performance, then it is relevant in the scope of our talk. Other than that, what you're saying is valid, but you believe is strict rules and I don't mind breaking them. Having sticky habits should be balanced by being aware of it ;-)
Michael Krelin - hacker
I also don't quite get it why you first talking about portability across all kinds of platforms and then after barely taking a breath talk about assuming the decent implementation of `std::fill_n` ;-)
Michael Krelin - hacker
+1  A: 

Don't forget to compare a properly optimized for loop if you really care about performance.

Some variant of Duff's device if the array is sufficiently long, and prefix --i not suffix i-- (although most compilers will probably correct that automatically.).

Although I'd question if this is the most valuable thing to be optimising. Is this genuinely a bottleneck for the system?

Massif
I think for primitive types it doesn't matter whether it's prefix or suffix (as long as result isn't used), but it's nice to develop an habit of using prefix whenever you don't care about the result — next loop may employ some complicated iterator instead.
Michael Krelin - hacker
+1  A: 

I think you mean

memset(d, 0, length * sizeof(d[0]))

and

for (int i = length; --i >= 0; ) d[i] = 0;

Personally, I do either one, but I suppose std::fill() is probably better.

Mike Dunlavey
is fill better because it is faster? I need faster, not more readable :))))
vehomzzz
I find the latter very cruel and unreadable. Initializing with length and not length-1 and doing the decrement inside the condition.
codymanix
@enigma: Which is faster? I would have to time it, which you can do also. (Just wrap it in a 10^9 loop and look at your watch - seconds = nanoseconds.)
Mike Dunlavey
@codymanix: Cruel? Unreadable? (Sounds like ignorant professor-speak.) It's been done that way for ages, and that's how the processors used to do it also - very clean code.
Mike Dunlavey
@Mike: One thing to be aware of when you loop downwards like in your 2nd snippet is that the loop will run forever if `i` has unsigned type rather than signed as it does here.
j_random_hacker
@j: Absolutely right. Only intelligent coders need apply here.
Mike Dunlavey
+2  A: 

Note that for memset you have to pass the number of bytes, not the number of elements because this is an old C function:

memset(d, 0, sizeof(double)*length);

memset can be faster since it is written in assembler, whereas std::fill is a template function which simply does a loop internally.

But for type safety and more readable code I would recommend std::fill().

codymanix
Your sizeof use is wrong. It should be "sizeof (double)" or just "sizeof *d", but you can't mix the two. You need the parenthesis to get the size of a named type.
unwind
Sorry, there are plenty of implementations that specialize std::fill and std::copy for POD. And because these specializations can assume aligned pointers, they'll likely be faster than std::memset(). Memset has to deal with all edge cases.
MSalters
A: 

If you're required to not use STL...

double aValues [10];
ZeroMemory (aValues, sizeof(aValues));

ZeroMemory at least makes the intent clear.

Bob Moore
Why wouldst thy not use the stl when using C++?
xtofl
@xtofl: It's probably not the case here but for example when using an older Symbian version, there is no STL available.
mxp
I don't think ZeroMemory is available on older Symbian versions either...
Steve Jessop
@onebyone - but ZeroMemory() has to be among the easiest functions to write out there.
Michael Burr
@Michael: Yes, but is this a fast implementation then?
sbi
It's a macro that exapnds to a memset call. I just prefer it because it makes the intent clear, especially to a reader who isn't C-fluent.
Bob Moore
A: 

As an alternative to all stuff proposed, I can suggest you NOT to set array to all zeros at startup. Instead, set up value to zero only when you first access the value in a particular cell. This will stave your question off and may be faster.

Pavel Shved
How would it be faster? If the array truly needs to be zeroed out, then doing it on the fly would require a separate data structure to keep track of which entries are dirty, and lookups in that data structure would almost certainly take longer than just writing the zeros.
If you actually access first 100 cells out of 1 000 000 element array, setting it all to zeros is slower. Especially if instead of that "structure" you just keep track of the lowest uninitialized index.
Pavel Shved
How is maintaining a separate structure faster in terms of maintenance?
jmucchiello
How maintaining one integer number makes you call it a "structure"?
Pavel Shved
+1  A: 

In general the memset is going to be much faster, make sure you get your length right, obviously your example has not (m)allocated or defined the array of doubles. Now if it truly is going to end up with only a handful of doubles then the loop may turn out to be faster. But as get to the point where the fill loop shadows the handful of setup instructions memset will typically use larger and sometimes aligned chunks to maximize speed.

As usual, test and measure. (although in this case you end up in the cache and the measurement may turn out to be bogus).

dwelch
+3  A: 

Try this, if only to be cool xD

{
    double *to = d;
    int n=(length+7)/8;
    switch(length%8){
        case 0: do{ *to++ = 0.0;
        case 7:  *to++ = 0.0;
        case 6:  *to++ = 0.0;
        case 5:  *to++ = 0.0;
        case 4:  *to++ = 0.0;
        case 3:  *to++ = 0.0;
        case 2:  *to++ = 0.0;
        case 1:  *to++ = 0.0;
        }while(--n>0);
    }
}
fortran
Can you do that? Put cases inside a "do"?
Mike Dunlavey
Yes, it's called a Duff's Device: http://en.wikipedia.org/wiki/Duff%27s_device
fbrereto
@fbrereton: I'm snowed. I never realized the "case" could go down inside a subordinate block underneath the "switch".
Mike Dunlavey
yes, it's duff's device, I think other answer mentioned it... about the cases, they're just jump target labels FSMS knows what kind of unstructured tricks could be done with them! :-)
fortran
This probably made you cool when Tom Duff came out with it. Nowadays it probably makes the other programmers hate you. And (both) rightly so.
sbi
@sbi your inability to detect irony is fascinating
fortran
nobody should use this old piece of code. It confues most readers.
frast
@frast it is good to filter out good programmers from bad ones xD
fortran
... at the same time, the general idea can be accomplished without abusing the language so much. (Listen to me - I'm starting to sound judgemental!)
Mike Dunlavey
+4  A: 

In addition to the several bugs and omissions in your code, using memset is not portable. You can't assume that a double with all zero bits is equal to 0.0. First make your code correct, then worry about optimizing.

You can make that assumption if you are using IEEE-754, and there are very few reasons to justify trying to make your floating-point code tolerant of implementations that wildly differ from the IEEE standard. In fact, the only reason I can imagine tolerating less than full compliance is if you are writing gpgpu code targeting a platform that doesn't implement all the rounding modes and NaN handling according to spec.
@unknown: Well, it's usually IEEE-754, but not always. So make sure by including `BOOST_STATIC_ASSERT(numeric_limits<double>::is_iec559);` somewhere in your code. Then things will fail fast if the assumption is wrong.
j_random_hacker
+2  A: 
calloc(length, sizeof(double))

According to IEEE-754, the bit representation of a positive zero is all zero bits, and there's nothing wrong with requiring IEEE-754 compliance. (If you need to zero out the array to reuse it, then pick one of the above solutions).

To make sure your assumption is correct, include `BOOST_STATIC_ASSERT(numeric_limits<double>::is_iec559);` somewhere in your code.
j_random_hacker
+2  A: 

According to this Wikipedia article on IEEE 754-1975 64-bit floating point a bit pattern of all 0s will indeed properly initialize a double to 0.0. Unfortunately your memset code doesn't do that.

Here is the code you ought to be using:

memset(d, 0, length * sizeof(double));

As part of a more complete package...

{
    double *d;
    int length = 10;
    d = malloc(sizeof(d[0]) * length);
    memset(d, 0, length * sizeof(d[0]));
}

Of course, that's dropping the error checking you should be doing on the return value of malloc. sizeof(d[0]) is slightly better than sizeof(double) because it's robust against changes in the type of d.

Also, if you use calloc(length, sizeof(d[0])) it will clear the memory for you and the subsequent memset will no longer be necessary. I didn't use it in the example because then it seems like your question wouldn't be answered.

Omnifarious
Doesn't the 0 need to be the 2nd argument to memset? Doesn't calloc clear what it allocates to zero?
Mike Dunlavey
Oops. I should've been more careful with memset. I'll fix that. You're also right about calloc... That's an interesting observation. I'll change the code to use malloc, then mention calloc as an option.
Omnifarious
However, the more important point is that C/C++ *do not* mandate IEEE 754 floating point! Although it is the most common representation, it is perfectly OK for a conforming compiler to use a different representation where all-bits-off in fact represents -42.69, so this code is not portable.
j_random_hacker
So don't drop the error checking. How hard is it to add "if (d) " in front of the memset. It's just 7 chars. It took you far more than 7 chars to note the lack of error checking. (And don't mention dangling elses, the block ends after the memset.)
jmucchiello
Adding error checking increases the complexity of the example. Clarity is more important than having the code be exactly correct when you're trying to show something.
Omnifarious
+2  A: 

The example will not work because you have to allocate memory for your array. You can do this on the stack or on the heap.

This is an example to do it on the stack:

double d[50] = {0.0};

No memset is needed after that.

frast
Spelling it out: if you supply an initialiser for an aggregate (struct or array), C and C++ will zero-initialise any remaining members.
j_random_hacker
+2  A: 

Assuming the loop length is an integral constant expression, the most probable outcome it that a good optimizer will recognize both the for-loop and the memset(0). The result would be that the assembly generated is essentially equal. Perhaps the choice of registers could differ, or the setup. But the marginal costs per double should really be the same.

MSalters
I tested this with visual c++ and found that "std::fill_n(d, 1000000, 0.0f)" indeed compiles to xor edx, edx mov r8d, 8000000 ; 007a1200H mov rcx, rax call memsetSo for efficiency there really is no difference at all
John Burton
+1, thanks for actually trying this.
MSalters