views: 476

answers: 7
Hi,

I have a program that needs to check whether a chunk of a file is zeroed or contains data. The algorithm runs over the whole file, for files up to a couple of gigabytes in size, and takes a while to run. Is there a faster way to check whether a chunk is zeroed?

Platform: Linux and Windows

bool WGTController::isBlockCompleted(wgBlock* block)
{
    if (!block)
        return false;

    uint32 bufSize = (uint32)block->size;
    uint64 fileSize = UTIL::FS::UTIL_getFileSize(m_szFile);

    // The block must lie entirely within the file.
    if (fileSize < (block->size + block->fileOffset))
        return false;

    char* buffer = new char[bufSize];
    FHANDLE fh = NULL;

    try
    {
        fh = UTIL::FS::UTIL_openFile(m_szFile, UTIL::FS::FILE_READ);
        UTIL::FS::UTIL_seekFile(fh, block->fileOffset);
        UTIL::FS::UTIL_readFile(fh, buffer, bufSize);
        UTIL::FS::UTIL_closeFile(fh);
    }
    catch (gcException &)
    {
        SAFE_DELETEA(buffer);
        UTIL::FS::UTIL_closeFile(fh);
        return false;
    }

    // The block counts as completed if any byte in it is non-zero.
    bool res = false;

    for (uint32 x = 0; x < bufSize; x++)
    {
        if (buffer[x] != 0)
        {
            res = true;
            break;
        }
    }

    SAFE_DELETEA(buffer);
    return res;
}
+6  A: 

How long is 'a while'? ... I'd say attempting to compare as many values in parallel as possible will help, maybe use some SIMD instructions to compare more than 4 bytes at a time?
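
For illustration (my sketch, not part of the original answer): a minimal version of the 16-bytes-at-a-time comparison using SSE2 intrinsics, which both GCC and MSVC support. The helper name is made up, and it assumes the buffer length is a multiple of 16:

#include <emmintrin.h> // SSE2 intrinsics
#include <stdint.h>

// Hypothetical helper: returns true if every byte in the buffer is zero.
// Assumes size is a multiple of 16; handle any tail bytes separately.
bool isAllZeroSSE2(const char* buffer, uint32_t size)
{
    const __m128i zero = _mm_setzero_si128();

    for (uint32_t i = 0; i < size; i += 16)
    {
        // Unaligned load, so the buffer needs no special alignment.
        __m128i chunk = _mm_loadu_si128((const __m128i*)(buffer + i));

        // Byte-wise compare against zero; the mask is 0xFFFF only
        // when all 16 bytes were zero.
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero)) != 0xFFFF)
            return false;
    }

    return true;
}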

Do keep in mind though, that no matter how fast you make the comparison, ultimately the data still needs to be read from the file. If the file is not already in a cache somewhere in memory, then you may be limited to in the order of 100-150 MB/s at a maximum before the bandwidth of the disk is saturated. If you have already hit this point, then you may first need to look at an approach that avoids having to load the file, or just accept the fact that it's not going to be faster than that.

jerryjvl
Hmmm, I might try changing the array from char to int and then using the xor method suggested below. But yeah, the file read time is a limiting factor (this is recovery code, i.e. run after a crash), and the file has to be read.
Lodle
In that case, I'd say that once you get close to 8 seconds per gig or so, further improvements are going to be drowned out by the disk bandwidth.
jerryjvl
The time taken to loop through memory checking bytes for zero values is going to be insignificant compared to the time taken to do the I/O. The time the CPU takes to read the bytes is nothing in the end.
mP
+1  A: 

Are there places in the file/chunk where non-zero values are more likely? You only have to find one non-zero value (your break condition), so look first in the places where you are most likely to find one - which doesn't have to be the beginning of the file/chunk. It might make sense to start at the end, or to check the middle third, depending on the actual application; a sketch of this idea follows below.

However, I would not recommend jumping randomly to different positions; random reads from disk might become incredibly slow ;) ..
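
To illustrate (a hypothetical probe order, not the answerer's code): check the end and the midpoint of the buffer before falling back to a linear scan. The helper name and the choice of probe points are made up:

// Hypothetical scan order: probe the last byte and the midpoint first,
// since partially written blocks often have data there, then fall
// back to a full scan. Returns true if any byte is non-zero.
bool hasData(const char* buffer, uint32 size)
{
    if (size == 0)
        return false;

    if (buffer[size - 1] != 0 || buffer[size / 2] != 0)
        return true;

    for (uint32 x = 0; x < size; x++)
    {
        if (buffer[x] != 0)
            return true;
    }

    return false;
}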

beef2k
A: 

I'd like to see the assembly output for this function. Something that would speed it up by a lot is to use SSE instructions. With these instructions, you can load 16 bytes at a time, check them all for zero, and continue. And you could unroll that for loop a few times too.

toto
A: 

Your algorithm seems OK, but you can heuristically optimise the starting place if you know in advance what type of file you will be getting... then again, if it is a specific file format, most likely that info will be in the header (the first few bytes).

Also make sure that block->size is not 1 from whoever calls the method :)

Also check out Boost's memory-mapped file facilities; they might help, depending on how you calculate the optimal block->size. A sketch follows below.
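
For illustration (my sketch, not the answerer's code): scanning a region through Boost.Iostreams' mapped_file_source, so the OS pages the file in instead of an explicit read into a heap buffer. The function name is a placeholder and error/bounds handling is omitted:

#include <boost/iostreams/device/mapped_file.hpp>
#include <string>

// Hypothetical helper: map the file and scan [offset, offset + length)
// for non-zero bytes.
bool regionHasData(const std::string& path, size_t offset, size_t length)
{
    boost::iostreams::mapped_file_source file(path);

    const char* data = file.data() + offset;
    for (size_t i = 0; i < length; ++i)
    {
        if (data[i] != 0)
            return true;
    }

    return false;
}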

Futurist
A: 

I have an "out of the box" answer for you, but I am not sure how feasable it is to implement in your situation.

If you don't control the dumping process: Since it is a large recovery (dump?) file produced on excpetional case, why not scan the file (for 0 bytes) on low priority right after it is dumped then and mark it somehow for faster later identification? (or you could zip it and parse/scan the zip file later)

Or if you control the dumping process: (a slow process you have to do anyways) why not indicate at the end of the dump file (or go back and write at the beginning of it), if the dump file is filled with 0 or has some valid data (since you wrote it and you know what is in it)? like that you don't have to pay for I/O overhead twice.

The goal here being to make the reading much faster by deffering the procss to another eralier time, since when the dump happens, there is unlikely to be an operator waiting for it to load.

Futurist
The data in this file is downloaded from the web in chunks, and not in order. So if a crash happens (or it's stopped and resumed later), I need to make sure the data in the whole chunk is valid, or re-download it.
Lodle
But the only reason you're downloading the whole crash file (in chunks) is to check if it is zero-filled? If so, can't you run a monitoring process on that machine (hosting the server) and later just do a query to see if the dump file contained data?
Futurist
It's not a crash file, it's a data file, and the reason for the chunks is that not all of the file is needed, so only the chunks I need are downloaded.
Lodle
All right, then check out the Boost memory-mapped facilities to load the data chunks, and maybe get the sending app to send you the CRC of each chunk as well, if you can, to make sure the data is not corrupt. Btw, using SSE will not help you much, as your bottleneck is really the I/O. Scanning several MB of memory, even one byte at a time, should be fairly fast.
Futurist
A: 

First, don't allocate a new buffer each time. Allocate one (per thread) and reuse it. Use a nice big chunk, and do multiple read/check passes.

Second, don't compare each character. Do the comparisons on a larger integral type. Most likely you will want a 32-bit int, but depending on your OS/compiler it might be faster to use a 64-bit or even 128-bit int. With a 32-bit int you reduce the number of comparisons by 4x. You will, of course, have to worry about the end conditions. This is easy: if the buffer you are comparing isn't an even multiple of your int size, just set the last X bytes to 0 before you do the compare.

Third, it might help your compiler a bit to unroll the loop. Do 4 or 8 comparisons in the body of the loop. This should help the compiler optimize as well as reduce the number of comparisons needed to exit the loop. Make sure your buffer is a multiple of your comparison type times the number of comparisons in the loop.

Fourth, it may be faster to use (*pBuffer++) instead of buffer[i], especially if the buffer is big. A sketch combining the second and third points follows below.

For any of these, you will, of course, want to gather some metrics and see what actually helps.
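
A minimal sketch of the wider-word comparison with a 4x unroll (my illustration, not the answerer's code). It assumes the buffer length is a multiple of 32 bytes, per the padding advice above:

#include <stdint.h>
#include <string.h>

// Hypothetical helper: scan 64-bit words, four per loop iteration.
// Assumes size is a multiple of 32; pad the tail with zeros first.
bool isAllZero64(const char* buffer, uint32_t size)
{
    uint64_t w[4];

    for (uint32_t i = 0; i < size; i += 32)
    {
        // memcpy sidesteps alignment and aliasing issues; compilers
        // lower it to plain 64-bit loads.
        memcpy(w, buffer + i, sizeof(w));

        if (w[0] | w[1] | w[2] | w[3])
            return false;  // a non-zero byte somewhere in these 32
    }

    return true;
}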

Dolphin
A: 

I will tell you a dirty, non-portable and difficult way, but one that can be more efficient... If you are dealing with sparse files, and you are really bored and want to mess with the internals of the filesystems you're using, you can try adding a new function that returns a bitmap indicating which blocks are mapped and which aren't (the unmapped ones are zeroed; the rest you will still have to check manually).

Yeah, I know it's crazy and nobody would ever want to do something like this xD
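
For reference (my addition, not part of the original suggestion): on Linux, lseek with SEEK_DATA/SEEK_HOLE exposes similar mapping information for sparse files without patching the filesystem, on filesystems that support it:

#define _GNU_SOURCE  // for SEEK_DATA/SEEK_HOLE on glibc
#include <unistd.h>
#include <errno.h>

// Returns true if [offset, offset + length) lies entirely in a hole,
// i.e. the range is guaranteed zero without reading it.
bool rangeIsHole(int fd, off_t offset, off_t length)
{
    off_t data = lseek(fd, offset, SEEK_DATA);

    if (data == (off_t)-1)
        return errno == ENXIO;  // no data at or after offset

    return data >= offset + length;  // first data starts past our range
}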

fortran