ansaurus

Question

Which is faster/preferred: memset or for loop to zero out an array of doubles ?

Answer 1

+3 A:

memset(d, 10, 0) is wrong as it only nulls 10 bytes. prefer std::fill as the intent is clearest.

stefaanv 2009-09-03 13:21:12

No, it doesn't null any bytes. It tens 0 bytes.

Michael Krelin - hacker 2009-09-03 13:23:23

another reason to use std::fill

stefaanv 2009-09-03 13:25:15

I do not hold anything against `std::fill`, but how is it *another reason*?

Michael Krelin - hacker 2009-09-03 13:27:06

With memset, I always forget the order of the parameters, which I took over from the question.

stefaanv 2009-09-03 13:36:10

I'd go as far as guessing that you forget the order of the parameters because you don't use it often enough ;-) And now go ahead and tell me you're not using it because you keep forgetting the order of parameters ;-)

Michael Krelin - hacker 2009-09-03 14:05:00

I'm now considering, on principle, never again using any function that takes two integer parameters. Possible exception if the order doesn't matter, so std::min is OK ;-)

Steve Jessop 2009-09-03 14:16:40

I'm not using it anymore because it is too lowlevel to do any real initialization and it is unsafe to initialize classes with it.The only use for it is to initialize a real byte-buffer and you can do that too with std::fill

stefaanv 2009-09-03 14:24:57

@stefaanv: since 99% of the time I call memset() to zero a buffer, I have a nice little inline: inline void zeromem( void* p, size_t siz) { memset( p, 0, siz); } /* or something like that */. Now I never need to worry about the order of the value and size.

Michael Burr 2009-09-03 17:59:14

stefaanv, of course it would be weird to try it on classes. But dismissing the function because it's not applicable in every single case goes a bit too far :)

Michael Krelin - hacker 2009-09-03 19:43:34

Answer 2

+21 A:

If you really care you should try and measure. However the most portable way is using std::fill():

std::fill( array, array + numberOfElements, 0.0 );

sharptooth 2009-09-03 13:21:37

`fill_n` then sounds more appropriate.

Michael Krelin - hacker 2009-09-03 13:25:42

IMO the perfect answer. _If_ `std::memset()` is possible, I would expect the std lib implementation to call this, if not, it does the right thing, too.

sbi 2009-09-03 13:26:40

@hacker: maybe, although I always assumed that fill_n is intended for use with iterators for which no end iterator exists, rather than in all cases where the length is known. But in this case it certainly does avoid repetition of "array".

Steve Jessop 2009-09-03 14:14:03

Did you guys see this is A C question and NOT C++. AFAIK, C doesn't have fill function, much less std namespace

vehomzzz 2009-09-03 14:23:17

I swear it was tagged `c++` when it was first published! (not that I would prefer `c++` way here).

Michael Krelin - hacker 2009-09-03 15:04:08

@hacker: Enigma retagged it as a C question about 1 minute before complaining about this answer. Which is fine, though his comment strikes me as unjustly aggressive, when you take this into account.

Brian 2009-09-03 15:08:57

Brian, agree on both accounts (retagging being fine, but not when coupled with unjust offense).

Michael Krelin - hacker 2009-09-03 15:19:57

So I guess the answer to the question "did you guys see this is a C question", is "no, obviously not, because it was a C++ question". Am tempted to change the question to setting a Java array of doubles to 12.4, then complain that all the answers are irrelevant ;-)

Steve Jessop 2009-09-03 16:13:00

@onebyone - don't forget to have at least 3 errors inthe Java code you post.

Michael Burr 2009-09-03 17:47:08

LOL @ onebyone, Michael Burr :)

j_random_hacker 2009-09-04 03:15:25

@sbi: I would expect the std::lib to call `__memset_aligned_8` or so, not `std::memset()`. No point in doing a runtime alignment check on a `char*` if you know that it's just a casted `double*`

MSalters 2009-09-04 10:55:13

Answer 3

+6 A:

memset(d,0,10*sizeof(*d));

is likely to be faster. Like they say you can also

std::fill_n(d,10,0.);

but it is most likely a prettier way to do the loop.

Michael Krelin - hacker 2009-09-03 13:23:00

+1 for using count * sizeof, but you should drop the parenthesis too. :)

unwind 2009-09-03 13:26:13

Note that, AFAIK, the standard doesn't guarantee that 0.0 is implemented as all bits set to 0. (In fact, all bits set to 0 might be an invalid floating point value that's trapped by the hardware.) So filling double with zeroed bytes is not portable and `std::memset()` shouldn't be used.

sbi 2009-09-03 13:27:49

unwind, I like their roundness.

Michael Krelin - hacker 2009-09-03 13:30:30

sbi, but we both now that it would work for our purpose ;-) Actually, the thing is — I doubt this is worth optimizing, but *if* for whatever reason we started to do so, then let's optimize it.

Michael Krelin - hacker 2009-09-03 13:33:16

I prefer the `fill_n` way, since that one stays in the same context as the `double` variable.

xtofl 2009-09-03 13:34:37

xtofl, if we're talking loop vs `fill_n`, I'd certainly vote for `fill_n`. But *if* (**if!**) for whatever reason we want to speed up zeroing of half-a-gigabyte of doubles in the loop, I'd certainly care way less about the context.

Michael Krelin - hacker 2009-09-03 13:38:06

@hacker: that seems to be the answer the OP was after.

stefaanv 2009-09-03 15:42:30

Although I'm sure that "memset(d, length, 0);" is still much faster by far ;)

stefaanv 2009-09-03 15:43:54

stefaanv, pardon my ignorance, but who is OP?;-)

Michael Krelin - hacker 2009-09-03 15:44:38

stefaanv, you bet!

Michael Krelin - hacker 2009-09-03 16:03:54

@hacker - OP == "Original Poster"

Michael Burr 2009-09-03 17:48:58

@unwind - what's the problem with the parens?

Michael Burr 2009-09-03 17:50:05

@stefaanv: "Chen's Theorem of Ultimate Optimization": You can do nothing really fast. From http://blogs.msdn.com/oldnewthing/archive/2008/10/14/8998849.aspx#8999794

Michael Burr 2009-09-03 17:55:00

Thanks, stefaanv. And with parens — I think unwind prefers it without — `sizeof *d`. A matter of taste.

Michael Krelin - hacker 2009-09-03 19:38:58

@hacker: For a decade I have been writing code that had to run on dozens platforms and platform versions, compiled by almost half a dozen compilers or compiler versions being combined with several different std lib implementations. I have seen someone rake through >1.8MLoC for all C-style casts (because on one platform some of them caused a problem leading to crashes) and many other hellish tasks for porting code. One of the hardest lessons was that, whenever someone says "this piece of code will never have to be ported", never ever believe them. Such hard-learned habit sticks like tar.

sbi 2009-09-04 09:53:34

@hacker: Note that I'd expect any decent implementation of `std::fill_n` to call `std::memset` if that is possible. But even if it doesn't, if you are zeroing 500MB once, I doubt that the speed difference really matters. And if you're doing this in a tight loop, there's still the code that has to repeatedly _fill_ 500MB with data it has to produce from somewhere -- and that, too, probably makes the zeroing irrelevant.

sbi 2009-09-04 09:58:19

sbi, if the saving were irrelevant we wouldn't be here, if we talk about performance, then it is relevant in the scope of our talk. Other than that, what you're saying is valid, but you believe is strict rules and I don't mind breaking them. Having sticky habits should be balanced by being aware of it ;-)

Michael Krelin - hacker 2009-09-04 10:38:04

I also don't quite get it why you first talking about portability across all kinds of platforms and then after barely taking a breath talk about assuming the decent implementation of `std::fill_n` ;-)

Michael Krelin - hacker 2009-09-04 10:40:09

Answer 4

+1 A:

Don't forget to compare a properly optimized for loop if you really care about performance.

Some variant of Duff's device if the array is sufficiently long, and prefix --i not suffix i-- (although most compilers will probably correct that automatically.).

Although I'd question if this is the most valuable thing to be optimising. Is this genuinely a bottleneck for the system?

Massif 2009-09-03 13:24:57

I think for primitive types it doesn't matter whether it's prefix or suffix (as long as result isn't used), but it's nice to develop an habit of using prefix whenever you don't care about the result — next loop may employ some complicated iterator instead.

Michael Krelin - hacker 2009-09-03 13:29:12

Answer 5

+1 A:

I think you mean

memset(d, 0, length * sizeof(d[0]))

and

for (int i = length; --i >= 0; ) d[i] = 0;

Personally, I do either one, but I suppose std::fill() is probably better.

Mike Dunlavey 2009-09-03 13:26:19

is fill better because it is faster? I need faster, not more readable :))))

vehomzzz 2009-09-03 13:39:02

I find the latter very cruel and unreadable. Initializing with length and not length-1 and doing the decrement inside the condition.

codymanix 2009-09-03 15:01:57

@enigma: Which is faster? I would have to time it, which you can do also. (Just wrap it in a 10^9 loop and look at your watch - seconds = nanoseconds.)

Mike Dunlavey 2009-09-03 16:24:11

@codymanix: Cruel? Unreadable? (Sounds like ignorant professor-speak.) It's been done that way for ages, and that's how the processors used to do it also - very clean code.

Mike Dunlavey 2009-09-03 16:38:13

@Mike: One thing to be aware of when you loop downwards like in your 2nd snippet is that the loop will run forever if `i` has unsigned type rather than signed as it does here.

j_random_hacker 2009-09-05 12:38:40

@j: Absolutely right. Only intelligent coders need apply here.

Mike Dunlavey 2009-09-05 18:07:59

Answer 6

+2 A:

Note that for memset you have to pass the number of bytes, not the number of elements because this is an old C function:

memset(d, 0, sizeof(double)*length);

memset can be faster since it is written in assembler, whereas std::fill is a template function which simply does a loop internally.

But for type safety and more readable code I would recommend std::fill().

codymanix 2009-09-03 13:29:08

Your sizeof use is wrong. It should be "sizeof (double)" or just "sizeof *d", but you can't mix the two. You need the parenthesis to get the size of a named type.

unwind 2009-09-03 13:42:44

Sorry, there are plenty of implementations that specialize std::fill and std::copy for POD. And because these specializations can assume aligned pointers, they'll likely be faster than std::memset(). Memset has to deal with all edge cases.

MSalters 2009-09-04 10:52:05

Answer 7

A:

If you're required to not use STL...

double aValues [10];
ZeroMemory (aValues, sizeof(aValues));

ZeroMemory at least makes the intent clear.

Bob Moore 2009-09-03 13:31:37

Why wouldst thy not use the stl when using C++?

xtofl 2009-09-03 13:35:40

@xtofl: It's probably not the case here but for example when using an older Symbian version, there is no STL available.

mxp 2009-09-03 13:49:10

I don't think ZeroMemory is available on older Symbian versions either...

Steve Jessop 2009-09-03 14:17:26

@onebyone - but ZeroMemory() has to be among the easiest functions to write out there.

Michael Burr 2009-09-04 00:21:26

@Michael: Yes, but is this a fast implementation then?

sbi 2009-09-04 10:05:36

It's a macro that exapnds to a memset call. I just prefer it because it makes the intent clear, especially to a reader who isn't C-fluent.

Bob Moore 2009-09-04 11:28:59

Answer 8

A:

As an alternative to all stuff proposed, I can suggest you NOT to set array to all zeros at startup. Instead, set up value to zero only when you first access the value in a particular cell. This will stave your question off and may be faster.

Pavel Shved 2009-09-03 13:39:50

How would it be faster? If the array truly needs to be zeroed out, then doing it on the fly would require a separate data structure to keep track of which entries are dirty, and lookups in that data structure would almost certainly take longer than just writing the zeros.

2009-09-03 23:39:56

If you actually access first 100 cells out of 1 000 000 element array, setting it all to zeros is slower. Especially if instead of that "structure" you just keep track of the lowest uninitialized index.

Pavel Shved 2009-09-04 02:18:06

How is maintaining a separate structure faster in terms of maintenance?

jmucchiello 2009-09-04 05:17:37

How maintaining one integer number makes you call it a "structure"?

Pavel Shved 2009-09-04 15:11:06

Answer 9

+1 A:

In general the memset is going to be much faster, make sure you get your length right, obviously your example has not (m)allocated or defined the array of doubles. Now if it truly is going to end up with only a handful of doubles then the loop may turn out to be faster. But as get to the point where the fill loop shadows the handful of setup instructions memset will typically use larger and sometimes aligned chunks to maximize speed.

As usual, test and measure. (although in this case you end up in the cache and the measurement may turn out to be bogus).

dwelch 2009-09-03 13:42:14

Answer 10

+3 A:

Try this, if only to be cool xD

{
    double *to = d;
    int n=(length+7)/8;
    switch(length%8){
        case 0: do{ *to++ = 0.0;
        case 7:  *to++ = 0.0;
        case 6:  *to++ = 0.0;
        case 5:  *to++ = 0.0;
        case 4:  *to++ = 0.0;
        case 3:  *to++ = 0.0;
        case 2:  *to++ = 0.0;
        case 1:  *to++ = 0.0;
        }while(--n>0);
    }
}

fortran 2009-09-03 14:40:04

Can you do that? Put cases inside a "do"?

Mike Dunlavey 2009-09-03 16:49:29

Yes, it's called a Duff's Device: http://en.wikipedia.org/wiki/Duff%27s_device

fbrereto 2009-09-03 17:37:26

@fbrereton: I'm snowed. I never realized the "case" could go down inside a subordinate block underneath the "switch".

Mike Dunlavey 2009-09-03 19:14:10

yes, it's duff's device, I think other answer mentioned it... about the cases, they're just jump target labels FSMS knows what kind of unstructured tricks could be done with them! :-)

fortran 2009-09-04 08:08:40

This probably made you cool when Tom Duff came out with it. Nowadays it probably makes the other programmers hate you. And (both) rightly so.

sbi 2009-09-04 09:59:43

@sbi your inability to detect irony is fascinating

fortran 2009-09-04 10:22:54

nobody should use this old piece of code. It confues most readers.

frast 2009-09-04 10:43:40

@frast it is good to filter out good programmers from bad ones xD

fortran 2009-09-04 10:47:10

... at the same time, the general idea can be accomplished without abusing the language so much. (Listen to me - I'm starting to sound judgemental!)

Mike Dunlavey 2009-09-04 18:27:41

Answer 11

+4 A:

In addition to the several bugs and omissions in your code, using memset is not portable. You can't assume that a double with all zero bits is equal to 0.0. First make your code correct, then worry about optimizing.

2009-09-03 14:56:27

You can make that assumption if you are using IEEE-754, and there are very few reasons to justify trying to make your floating-point code tolerant of implementations that wildly differ from the IEEE standard. In fact, the only reason I can imagine tolerating less than full compliance is if you are writing gpgpu code targeting a platform that doesn't implement all the rounding modes and NaN handling according to spec.

2009-09-03 15:04:50

@unknown: Well, it's usually IEEE-754, but not always. So make sure by including `BOOST_STATIC_ASSERT(numeric_limits<double>::is_iec559);` somewhere in your code. Then things will fail fast if the assumption is wrong.

j_random_hacker 2009-09-04 03:34:37

Answer 12

+2 A:

calloc(length, sizeof(double))

According to IEEE-754, the bit representation of a positive zero is all zero bits, and there's nothing wrong with requiring IEEE-754 compliance. (If you need to zero out the array to reuse it, then pick one of the above solutions).

2009-09-03 15:00:35

To make sure your assumption is correct, include `BOOST_STATIC_ASSERT(numeric_limits<double>::is_iec559);` somewhere in your code.

j_random_hacker 2009-09-04 03:35:41

Answer 13

+2 A:

According to this Wikipedia article on IEEE 754-1975 64-bit floating point a bit pattern of all 0s will indeed properly initialize a double to 0.0. Unfortunately your memset code doesn't do that.

Here is the code you ought to be using:

memset(d, 0, length * sizeof(double));

As part of a more complete package...

{
    double *d;
    int length = 10;
    d = malloc(sizeof(d[0]) * length);
    memset(d, 0, length * sizeof(d[0]));
}

Of course, that's dropping the error checking you should be doing on the return value of malloc. sizeof(d[0]) is slightly better than sizeof(double) because it's robust against changes in the type of d.

Also, if you use calloc(length, sizeof(d[0])) it will clear the memory for you and the subsequent memset will no longer be necessary. I didn't use it in the example because then it seems like your question wouldn't be answered.

Omnifarious 2009-09-03 15:40:34

Doesn't the 0 need to be the 2nd argument to memset? Doesn't calloc clear what it allocates to zero?

Mike Dunlavey 2009-09-03 16:45:19

Oops. I should've been more careful with memset. I'll fix that. You're also right about calloc... That's an interesting observation. I'll change the code to use malloc, then mention calloc as an option.

Omnifarious 2009-09-03 17:30:25

However, the more important point is that C/C++ *do not* mandate IEEE 754 floating point! Although it is the most common representation, it is perfectly OK for a conforming compiler to use a different representation where all-bits-off in fact represents -42.69, so this code is not portable.

j_random_hacker 2009-09-04 03:27:28

So don't drop the error checking. How hard is it to add "if (d) " in front of the memset. It's just 7 chars. It took you far more than 7 chars to note the lack of error checking. (And don't mention dangling elses, the block ends after the memset.)

jmucchiello 2009-09-04 05:15:15

Adding error checking increases the complexity of the example. Clarity is more important than having the code be exactly correct when you're trying to show something.

Omnifarious 2009-09-04 16:31:24

Answer 14

+2 A:

The example will not work because you have to allocate memory for your array. You can do this on the stack or on the heap.

This is an example to do it on the stack:

double d[50] = {0.0};

No memset is needed after that.

frast 2009-09-04 10:40:29

Spelling it out: if you supply an initialiser for an aggregate (struct or array), C and C++ will zero-initialise any remaining members.

j_random_hacker 2009-09-05 12:44:39

Answer 15

+2 A:

Assuming the loop length is an integral constant expression, the most probable outcome it that a good optimizer will recognize both the for-loop and the memset(0). The result would be that the assembly generated is essentially equal. Perhaps the choice of registers could differ, or the setup. But the marginal costs per double should really be the same.

MSalters 2009-09-04 11:03:37

I tested this with visual c++ and found that "std::fill_n(d, 1000000, 0.0f)" indeed compiles to xor edx, edx mov r8d, 8000000 ; 007a1200H mov rcx, rax call memsetSo for efficiency there really is no difference at all

John Burton 2010-01-04 11:54:03

+1, thanks for actually trying this.

MSalters 2010-01-04 13:56:19

ansaurus

tags:

views:

answers:

Which is faster/preferred: memset or for loop to zero out an array of doubles ?

related questions