views:

1708

answers:

11

Is it possible to compare whole memory regions in a single processor cycle? More precisely is it possible to compare two strings in one processor cycle using some sort of MMX assembler instruction? Or is strcmp-implementation already based on that optimization?

EDIT: Or is it possible to instruct c++ compiler to remove string duplicates, so that strings can be compared simply by their memory location? Instead of memcmp(a,b) compared by a==b (assuming that a and b are both native const char* strings).

+3  A: 

It depends on how much preprocessing you do. C# and Java both have a process called interning strings which makes every string map to the same address if they have the same contents. Assuming a process like that, you could do a string equality comparison with one compare instruction.

Ordering is a bit harder.

EDIT: Obviously this answer is sidestepping the actual issue of attempting to do a string comparison within a single cycle. But it's the only way to do it unless you happen to have a sequence of instructions that can look at an unbounded amount of memory in constant time to determine the equivalent of a strcmp. Which is improbable, because if you had such an architecture the person who sold it to you would say "Hey, here's this awesome instruction that can do a string compare in a single cycle! How awesome is that?" and you wouldn't need to post a question on stackoverflow.

But that's just my reasoned opinion.

MSN
+1  A: 

Even if both strings were cached, it wouldn't be possible to compare (arbitrarily long) strings in a single processor cycle. The implementation of strcmp in a modern compiler environment should be pretty much optimized, so you shouldn't bother to optimize too much.

EDIT (in reply to your EDIT):

  1. You can't instruct the compiler to unify ALL duplicate strings - most compilers can do something like this, but it's best-effort only (and I don't know any compiler where it works across compilation units).

  2. You might get better performance by adding the strings to a map and comparing iterators after that... the comparison itself might be one cycle (or not much more) then

  3. If the set of strings to use is fixed, use enumerations - that's what they're there for.

hjhill
Ok, but do you think that the following function: __inline bool test(string a){return a=="x";} test("x");will compile into "true"? Even if I use string-class instead of const char*? I should probably test it myself...
AareP
I don't think so - even the most optimizing compiler wouldn't look THAT far into the implementation of the string class. And with const char* == isn't guaranteed to give the correct result in this case...
hjhill
+1  A: 

Assuming you mean x86 ... Here is the Intel documentation.

But off the top of my head, no, I don't think you can compare more than the size of a register at a time.

Out of curiosity, why do you ask? I'm the last to invoke Knuth prematurely, but ... strcmp usually does a pretty good job.

Edit: Link now points to the modern documentation.

Adrien
I'm too lazy to create enumerations and want to use strings in low-level functions that should be pretty fast. It hurts me to see code like: if(input=="someString1")... if(input=="someString2")... if(input=="someString3")... that is executed every frame :)
AareP
If you're really bent on using assembly code, check out the CMPS instruction in the manual. But I'd encourage you explore the mapping and hashing methods recommended in some of the other answers first.
Steve K
+4  A: 

Or is it possible to instruct c++ compiler to remove string duplicates, so that strings can be compared simply by their memory location?

No. The compiler may remove duplicates internally, but I know of no compiler that guarantees or provides facilities for accessing such an optimisation (except possibly to turn it off). Certainly the C++ standard has nothing to say in this area.

anon
+2  A: 

It's not possible to perform general-purpose string operations in one cycle, but there are many optimizations you can apply with extra information.

  • If your problem domain allows the use of an aligned, fixed-size buffer for strings that fits in a machine register, you can perform single-cycle comparisons (not counting the load instructions).
  • If you always keep track of the lengths of your strings, you can compare lengths and use memcmp, which is faster than strcmp. If your application is multi-cultural, keep in mind that this only works for ordinal string comparison.
  • It appears you are using C++. If you only need equality comparisons with immutable strings, you can use a string interning solution (copy/paste link since I'm a new user) to guarantee that equal strings are stored at the same memory location, at which point you can simply compare pointers. en.wikipedia.org/wiki/String_interning
  • Also, take a look at the Intel Optimization Reference Manual, Chapter 10 for details on the SSE 4.2's instructions for text processing. www.intel.com/products/processor/manuals/

Edit: If your problem domain allows the use of an enumeration, that is your single-cycle comparison solution. Don't fight it.

280Z28
+2  A: 

If you're optimizing for string comparisons, you may want to employ a string table (then you only need to compare the indexes of the two strings, which can be done in a single machine instruction).

If that's not feasible, you can also create a hashed string object that contains the string and a hash. Then most of the time you only have to compare the hashes if the strings aren't equal. If the hashes do match you'll have to do a full comparison though to make sure it wasn't a false positive.

patros
+1  A: 

You can certainly compare more than one byte in a cycle. If we take the example of x86-64, you can compare up to 64-bits (8 bytes) in a single instruction (cmps), this isn't necessarily one cycle but will normally be in the low single digits (the exact speed depends on the specific processor version).

However, this doesn't mean you'll be able to all the work of comparing two arrays in memory much faster than strcmp :-

  1. There's more than just the compare - you need to compare the two values, check if they are the same, and if so move to next chunk.
  2. Most strcmp implementations will already be highly optimised, including checking if a and b point to the same address, and any suitable instruction-level optimisations.

Unless you're seeing alot of time spent in strcmp, I wouldn't worry about it - have you got a specific problem / use case you are trying to improve?

Dave Rigby
Then it would make sense to use pragma pack 8 and compare strings 8-bytes at a time, since it will quickly skip strings that are not equal. At least in my case it's quite rare for strings to have same 8 characters in the beginning.
AareP
@AareP: Well you would need to make sure that there are 8 bytes of data available to be loaded - the string is an unknown, arbitrary-length, and so you just blindly loaded 8 bytes you might end up reading off the end of the string and segfaulting.
Dave Rigby
+3  A: 

Not really. Your typical 1-byte compare instruction takes 1 cycle. Your best bet would be to use the MMX 64-bit compare instructions( see this page for an example). However, those operate on registers, which have to be loaded from memory. The memory loads will significantly damage your time, because you'll be going out to L1 cache at best, which adds some 10x time slowdown*. If you are doing some heavy string processing, you can probably get some nifty speedup there, but again, it's going to hurt.

Other people suggest pre-computing strings. Maybe that'll work for your particular app, maybe it won't. Do you have to compare strings? Can you compare numbers?

Your edit suggests comparing pointers. That's a dangerous situation unless you can specifically guarantee that you won't be doing substring compares(ie, you are comparing some two byte strings: [0x40, 0x50] with [0x40, 0x42]. Those are not "equal", but a pointer compare would say they are).

Have you looked at the gcc strcmp() source? I would suggest that doing that would be the ideal starting place.

* Loosely speaking, if a cycle takes 1 unit, a L1 hit takes 10 units, an L2 hit takes 100 units, and an actual RAM hit takes really long.

Paul Nathan
Yes, in my case replacing strings with enumerations would fix the problem. But then I would have to modify my strings by adding prefixes (or postfixes) since those string-names are already reserved by some other code.
AareP
@aarep: Can't you put the enums into their own class/namespace? Are the IDs used by other code?
Paul Nathan
Yeah, will probably have to use namespaces. Just trying to keep code syntax as simple as possible.
AareP
@Paul: The GLIBC strcmp() implementation is the naive one: read successive bytes until they differ. Compiled with gcc-4 -O2, it uses the pcmpeqb and pmovmskb instructions. I don't see how it could be any more efficient.
greyfade
@grey: I'd suspect that more efficiency would come from threading, where one thread "runs ahead" and precomputes the result. Which starts moving towards a kind of "Tomasulo" approach....not sure that this particular problem needs that kind of heavyweight assault.
Paul Nathan
See my remark above, a 16 byte readahead is not possible because the memory after #0 might not be accessable. You would need the length first, but C strings don't work that way (or require an extra pass).It could be that it is an attempt to reduce the number of compare from 2 (one for \0 one for equality) to 1 1/16 (one for \0 and 16 at once) or so. But I doubt it delivers much.
Marco van de Voort
As per my reply comment above, the memory after \0 can only have different protection if it is in a different page, and since the read-aheads are always 16-byte aligned they will never read past the end of the page that \0 lives in.
caf
+5  A: 

You can use the Boost Flyweight library to intern your immutable strings. String equality/inequality tests then become very fast since all it has to do at that point is compare pointers (pun not intended).

Ferruccio
Argh. I say the same thing I get an idiot downvote? Same minus link to boost, so I guess that's why :)
MSN
@MSN 4k points are not enough? Or is it because upvoted user in this occasion had 13k? :)
AareP
+13  A: 

Just use the standard C strcmp() or C++ std::string::operator==() for your string comparisons.

The implementations of them are reasonably good and are probably compiled to a very highly optimized assembly that even talented assembly programmers would find challenging to match.

For example, GNU GLIBC's implementation of strcmp():

int
strcmp (p1, p2)
     const char *p1;
     const char *p2;
{
  register const unsigned char *s1 = (const unsigned char *) p1;
  register const unsigned char *s2 = (const unsigned char *) p2;
  unsigned reg_char c1, c2;

  do
    {
      c1 = (unsigned char) *s1++;
      c2 = (unsigned char) *s2++;
      if (c1 == '\0')
        return c1 - c2;
    }
  while (c1 == c2);

  return c1 - c2;
}

... Compiled with GCC 4, with -O2 optimization on x86_64:

0000000000000000 <strlen>:
 0:   66 0f ef d2             pxor   %xmm2,%xmm2
 4:   48 89 f9                mov    %rdi,%rcx
 7:   49 89 f8                mov    %rdi,%r8
 a:   48 83 e7 f0             and    $0xfffffffffffffff0,%rdi
 e:   66 0f 6f ca             movdqa %xmm2,%xmm1
12:   66 0f 74 17             pcmpeqb (%rdi),%xmm2
16:   83 ce ff                or     $0xffffffffffffffff,%esi
19:   48 29 f9                sub    %rdi,%rcx
1c:   d3 e6                   shl    %cl,%esi
1e:   66 0f d7 d2             pmovmskb %xmm2,%edx
22:   21 f2                   and    %esi,%edx
24:   75 15                   jne    3b <strlen+0x3b>
26:   66 0f 6f 47 10          movdqa 0x10(%rdi),%xmm0
2b:   48 8d 7f 10             lea    0x10(%rdi),%rdi
2f:   66 0f 74 c1             pcmpeqb %xmm1,%xmm0
33:   66 0f d7 d0             pmovmskb %xmm0,%edx
37:   85 d2                   test   %edx,%edx
39:   74 eb                   je     26 <strlen+0x26>
3b:   4c 29 c7                sub    %r8,%rdi
3e:   0f bc c2                bsf    %edx,%eax
41:   48 01 f8                add    %rdi,%rax
44:   c3                      retq

... Which reads each string 16 bytes at a time (movdqa) on an aligned boundary (line a), comparing (pcmpeqb), and stopping at any inequality or the NUL byte (33-39); which is about the most efficient this operation can be.

So don't sweat the small stuff. I'd suggest looking at optimizing other parts of your code.

greyfade
Awesome! I always thought about string compare using SSE, but never actually see it implemented!
LiraNuna
Iwonder how this works safely? It can read 15 bytes beyond \0, which might not be mapped?
Marco van de Voort
The `shl %cl,%esi ... and %esi,%edx` adjusts the result bit mask based on the bytes preceding the `\0`, which is discovered by `pcmpeqb (%rdi),%xmm2`. ... I think.
greyfade
Marco: The 16 byte alignment is the key there - since the reads are always aligned to 16 byte boundaries they can therefore never cross a page boundary.
caf
+1 for *just use standard implementations* and *don't sweat the small stuff*
Steve Schnepp
A: 

Here's one solution that uses enum-like values instead of strings. It supports enum-value-inheritance and thus supports comparison similar to substring comparison. It also uses special character "¤" for naming, to avoid name collisions. You can take any class, function, or variable name and make it into enum-value (SomeClassA will become ¤SomeClassA).

struct MultiEnum
{
    vector<MultiEnum*> enumList;
    MultiEnum()
    {
        enumList.push_back(this);
    }
    MultiEnum(MultiEnum& base)
    {
        enumList.assign(base.enumList.begin(),base.enumList.end());
        enumList.push_back(this);
    }
    MultiEnum(const MultiEnum* base1,const MultiEnum* base2)
    {
        enumList.assign(base1->enumList.begin(),base1->enumList.end());
        enumList.assign(base2->enumList.begin(),base2->enumList.end());
    }
    bool operator !=(const MultiEnum& other)
    {
        return find(enumList.begin(),enumList.end(),&other)==enumList.end();
    }
    bool operator ==(const MultiEnum& other)
    {
        return find(enumList.begin(),enumList.end(),&other)!=enumList.end();
    }
    bool operator &(const MultiEnum& other)
    {
        return find(enumList.begin(),enumList.end(),&other)!=enumList.end();
    }
    MultiEnum operator|(const MultiEnum& other)
    {
        return MultiEnum(this,&other);
    }
    MultiEnum operator+(const MultiEnum& other)
    {
        return MultiEnum(this,&other);
    }
};

MultiEnum 
    ¤someString,
    ¤someString1(¤someString),  // link to "someString" because it is a substring of "someString1"
    ¤someString2(¤someString);


void Test()
{
    MultiEnum a = ¤someString1|¤someString2;
    MultiEnum b = ¤someString1;

    if(a!=¤someString2){}
    if(b==¤someString2){}
    if(b&¤someString2){}
    if(b&¤someString){}  // will result in true, because someString is substring of someString1
}

PS. I had definitely too much free time on my hands this morning, but reinventing the wheel is just too much fun sometimes... :)

AareP
Paul Nathan
This whole solution was proof of concept, written just for fun of it. Haven't looked at it since that morning I wrote it. :)
AareP