ansaurus

Question

What's the most efficient way to compare two blocks of memory?

Answer 1

A:

You could try:

check if uint is the largest type that fits in your target CPU (ulongs might fit a native register better)
use 2 pointers of that type
use 2 local variables using *p++ (dont dereference the pointers 2 times for 1 value)
devide the counter of the 1st loop up front (use while (counter--))
unroll the 2nd loop by replacing it with a switch (if sizeof(type that fits in a register) is known and will be always the same.)

Edit: if the first loop is the bottleneck, unrolling might be the answer. Combined with halving the number of conditions in case of equal values, for unrolling 4 times I get something like:

uint* lp = (uint*)lhs;
uint* rp = (uint*)rhs;
uint  l;
uint  r;
int count = (n / uint.sizeof) / 4;

while (count--) {
    if( (l = *lp++) != (r = *rp++) {
        return (l < r) ? -1 : 1;
    }
    if( (l = *lp++) != (r = *rp++) {
        return (l < r) ? -1 : 1;
    }
    if( (l = *lp++) != (r = *rp++) {
        return (l < r) ? -1 : 1;
    }
    if( (l = *lp++) != (r = *rp++) {
        return (l < r) ? -1 : 1;
    }
}

Of course that leaves (n / uint.sizeof) % 4 iterations left to do, which you can mix into this loop by interleaving a switch, I left that as exercise for the reader evil grin.

rsp 2009-11-06 10:25:20

1. I tried using ulongs, it was slower.2. I tried taking out the second dereference. It wasn't any faster, propbably b/c the compiler optimizes to the same thing anyhow, so I put the 2nd deref back because it makes the code more readable.3. I think most of the bottleneck is the first loop.

dsimcha 2009-11-06 14:32:02

Can the person who voted -1 on all answers add a note as to why?

rsp 2009-11-07 08:29:36

rsp I assume its just a troll. I dont think any of these answers deserve a downvote.

acidzombie24 2009-11-08 18:09:39

Answer 2

A:

I think memcmp is specified to do a byte-by-byte comparison, regardless of data type. Are you sure your compiler's implementation preserves string semantics? It shouldn't.

Crashworks 2009-11-06 22:20:01

Answer 3

A:

I dont know much about it but there are vector instructions which can apply instructions to many bytes at once. You can use those results to do a blazing fast memcmp. I dont know which instructions you would need but if you look up Larrabee New Instructions or check out this article you may find what you are looking for http://www.ddj.com/architect/216402188

NOTE: This CPU isnt out ATM AFAIK

-Edit- Right now i am positive there is an instruction sets (try looking at SSE or SSE2) which may compare 16 bytes at once if they are aligned.

You can try this pure c++ code.

template<class T>
int memoryCompare(const T* lhs, const T * rhs, size_t n) {
    const T* endLHS = lhs + n/sizeof(T);
    while(lhs<endLHS) {
        int i = *lhs - *rhs;
        if(i != 0) return i > 0 ? 1 : -1;
        lhs++;
        rhs++;
    }
    //more code for the remaining bytes or call memoryCompare<char>(lhs, rhs, n%sizeof(T));
    return 0;
}

The advantage here is your increasing the pointer so you can dereference it and not apply an index (its ptr_offset[index] vs ptr_offset). The above uses a template so you can use 64 bits on 64 bit machines. and CMPs in assembly are really just subtracts checking the N and Z flags. Instead of comparing N and decreasing N i just compare in my version.

acidzombie24 2009-11-06 22:34:47

It might help if you dereference lhs and rhs :-)

rsp 2009-11-07 17:21:41

oops, heh XD. Fixed.

acidzombie24 2009-11-08 18:06:49

Answer 4

A:

Well a lot depends upon your system and data. There is only so many assumptions we can make. What processor are you using? Does it have to be straight C code? How wide are the CPU registers? What is the cache structure of the CPU? etc etc

It also depends on how different your data is. If it is very unlikely that the first byte from each buffer is the same then the speed of the function is pretty meaningless as, logically, it won't get to the rest of the function. If it is likely that the first n-1 bytes are usually the sme then it becomes more important.

All in you are unlikely to see much change regardless of how you do the test.

Anyway this is a little implementation of my own, it may, or may not, be faster than your own (or, given i just made it up as i went along, it may, or may not, work ;)):

int memoryCompare(const void* lhs, const void* rhs, size_t n) 
{
    uint_64 diff  = 0

    // Test the first few bytes until we are 32-bit aligned.
    while( (n & 0x3) != 0 && diff != 0 )
    {
     diff = (uint_8*)lhs - (uint_8*)rhs;
     n--;
     ((uint_8*)lhs)++;
     ((uint_8*)rhs)++;              
        }

    // Test the next set of 32-bit integers using comparisons with
    // aligned data.
    while( n > sizeof( uint_32 ) && diff != 0 )
    {
     diff = (uint_32*)lhs - (uint_32*)rhs;
     n  -= sizeof( uint_32 );
     ((uint_32*)lhs)++;
     ((uint_32*)rhs)++;
    }

    // now do final bytes.
    while( n > 0 && diff != 0 ) 
        {
     diff = (uint_8*)lhs - (uint_8*)rhs;
     n--;
     ((uint_8*)lhs)++;
     ((uint_8*)rhs)++;  
        }

    return (int)*diff / abs( diff ));
}

Goz 2009-11-06 23:10:00

Answer 5

+1 A:

Does the answer to this question help you at all?

If the compiler has support for implementing memcmp() as an intrinsic/builtin function, it seems that you would have a hard time besting it.

I have to admit to knowing little to nothing about D, so I have no idea whether the D compiler has support for intrinsics.

sdtom 2009-11-06 23:27:03

The OP already determined his alternative is faster then memcmp and asked for further improvements. If his compiler would use an intrinsic this probably would not be the case.

rsp 2009-11-07 08:33:03

ansaurus

tags:

views:

answers:

What's the most efficient way to compare two blocks of memory?

related questions