ansaurus

Question

Why does strcmp() return 0 when its inputs are equal?

Answer 1

+10 A:

strcmp returns a lexical difference (or should i call it "short-circuit serial byte comparator" ? :-) ) of the two strings you have given as parameters. 0 means that both strings are equal

A positive value means that s1 would be after s2 in a dictionary.

A negative value means that s1 would be before s2 in a dictionary.

Hence your non-zero value when comparing "time" and "money" which are obviously different, even though one would say that time is money ! :-)

Benoît 2009-02-27 16:25:40

strcmp() does not do a lexical comparison, it merely compares the value of each character until there is a difference or both strings terminate.

Ferruccio 2009-02-27 16:35:04

I could swear I've read docs that said strcmp() does a lexical comparison, but some quick googling suggests Ferruccio is correct... could it have changed at some point?

rmeador 2009-02-27 16:38:22

Nope. Always that way :-)

Ferruccio 2009-02-27 16:42:48

Hmmm I'm pretty sure it's a lexical comparison !http://www.cplusplus.com/reference/clibrary/cstring/strcmp.html

Benoît 2009-02-27 16:52:00

I think it changes with the given definition of "lexical". I too have seen strcmp() referred to as being a 'lexical' comparator, though obviously it does not take different localizations, collations, etc. into account. Maybe it is better to call it a "short-circuit serial byte comparator."

aib 2009-02-27 16:52:22

It simply uses each character's ASCII value for comparison which are ordered like: ABCDE...abcde. A lexical comparison would order them like: AaBbCcDdEe.

Ferruccio 2009-02-27 17:00:03

@Ferruccio I updated my answer. But you'll admit that the idea behind the comparison is lexical ! :-)

Benoît 2009-02-27 17:02:20

there is a strcoll. i think that one takes the current locale into account

Johannes Schaub - litb 2009-02-27 17:11:50

@litb, right it does, but for some strange reasons it falls back to strcmp if the locale is set to 'C' or 'POSIX' and not to strcasecmp, which would be more appropriate.

quinmars 2009-02-28 00:16:11

Answer 2

+2 A:

You seem to want strcmp to work like a (hypothetical)

int isEqual(const char *, const char *)

To be sure that would be true to the "zero is false" interpretation of integer results, but it would complicate the logic of sorting because, having established that the two strings were not the same, you would still need to learn which came "earlier".

Moreover, I suspect that a common implementation looks like

int strcmp(const char *s1, const char *s2){
   const unsigned char *q1=s1, *q2=s2;
   while ((*q1 == *q2) && *q1){ 
      ++q1; ++q2; 
   };
   return (*q1 - *q2);
}

which is [edit: kinda] elegant in a K&R kind of way. The important point here (which is increasingly obscured by getting the code right (evidently I should have left well enough alone)) is the way the return statement:

   return (*q1 - *q2);

which gives the results of the comparison naturally in terms of the character values.

dmckee 2009-02-27 16:53:25

Wouldn't that "common implementation" walk off the buffer (past the null termination?) I don't see how it would ever return 0...

Daniel LeCheminant 2009-02-27 17:05:38

i think u gotta have to terminate the loop if you come to \0 :) u also have to watch about underflows in *s1 - *s2 so s1=-127, s2=2 => oops :)

Johannes Schaub - litb 2009-02-27 17:06:03

@Daniel: Uh. Yes. Moment while I fix this...

dmckee 2009-02-27 17:07:31

oh wait. no won't cause an underflow because both are first converted to an int.

Johannes Schaub - litb 2009-02-27 17:09:11

Daniel LeCheminant 2009-02-27 17:10:17

@Daniel: 'cause I was trying to be quick...

dmckee 2009-02-27 17:12:06

i've looked up in c99 it says both are first converted to unsigned char before being subtracted.so there can't even be a case where s1=-127 :) but it changes some: *a - *b could become -2 with char (*a=-1, *b=1) and +254 with unsigned char.so i believe a cast should be added there. kind of tricky :)

Johannes Schaub - litb 2009-02-27 17:16:29

@libt: The Mac OS X manpage agrees, but putting in the casts is going to play merry havoc with my allegidly "elegant" implementation. ::sigh::

dmckee 2009-02-27 17:21:42

indeed :) i like how you added "kinda". so one still can see the a - b pattern in it without scary casts in between :) all fine i would say

Johannes Schaub - litb 2009-02-27 17:27:58

I would think return (int)*q1 - (int)*q2; would solve the sign issue. What makes the casts scary?

jmucchiello 2009-02-27 17:38:27

@joe_mucchiello: They would clutter up the one line I really wanted to emphasize in this instance...

dmckee 2009-02-27 17:40:37

joe_micchiello, they would not solve it. they're cast (promoted) to int anyway automatically. you will have to do (unsigned char)*q1 - (unsigned char)*q2; i think that really clutters the line. now anyway that they could just look into the comments and see how they could write it in non-teaching way

Johannes Schaub - litb 2009-02-27 18:02:41

Answer 3

+2 A:

It's common to functions to return zero for the common - or one-of-a-kind - case and non-zero for special cases. Take the main function, which conventionally returns zero on success and some nonzero value for failure. The precise non-zero value indicates what went wrong. For example: out of memory, no access rights or something else.

In your case, if the string is equal, then there is no reason why it is equal other than that the strings contain the same characters. But if they are non-equal then either the first can be smaller, or the second can be smaller. Having it return 1 for equal, 0 for smaller and 2 for greater would be somehow strange i think.

You can also think about it in terms of subtraction:

return = s1 - s2

If s1 is "lexicographically" less, then it will give is a negative value.

Johannes Schaub - litb 2009-02-27 16:55:11

I'm pretty sure you need to dereference your 's's...

dmckee 2009-02-27 17:00:33

i meant it symbolical was pseudo code.

Johannes Schaub - litb 2009-02-27 17:01:13

Ah. I withdraw the suggestion.

dmckee 2009-02-27 17:01:56

Answer 4

A:

I guess it is simply for symmetry: -1 if less, 0 if equal, 1 if more.

PhiLho 2009-02-27 16:57:22

Answer 5

+3 A:

The nice thing about an implementation like this is you can say

if(strcmp(<stringA>, <stringB>) > 0)   // Implies stringA > stringB
if(strcmp(<stringA>, <stringB>) == 0)  // Implies stringA == stringB
if(strcmp(<stringA>, <stringB>) < 0)   // Implies stringA < stringB
if(strcmp(<stringA>, <stringB>) >= 0)  // Implies stringA >= stringB
if(strcmp(<stringA>, <stringB>) <= 0)  // Implies stringA <= stringB
if(strcmp(<stringA>, <stringB>) != 0)  // Implies stringA != stringB

Note how the comparison with 0 exactly matches the comparison in the implication.

Daniel LeCheminant 2009-02-27 17:09:29

There's a wonderful/horrible macro from the comp.lang.c FAQ that implements almost exactly this string behavior. `#define StrTest(str1, op, str2) (strcmp(str1, str2) op 0)` With it, you would write `if(StrTest(stringA, ==, stringB))` and friends. I'm on the fence as to whether it's a horrible idea or a wonderful idea.

Chris Lutz 2009-09-02 04:45:42

Answer 6

+2 A:

There's three possible results: string 1 comes before string 2, string 1 comes after string 2, string 1 is the same as string 2. It is important to keep these three results separate; one use of strcmp() is to sort strings. The question is how you want to assign values to these three outcomes, and how to keep things more or less consistent. You might also look at the parameters for qsort() and bsearch(), which require compare functions much like strcmp().

If you wanted a string equality function, it would return nonzero for equal strings and zero for non-equal strings, to go along with C's rules on true and false. This means that there would be no way of distinguishing whether string 1 came before or after string 2. There are multiple true values for an int, or any other C data type you care to name, but only one false.

Therefore, having a useful strcmp() that returned true for string equality would require a lot of changes to the rest of the language, which simply aren't going to happen.

David Thornley 2009-02-27 17:25:53

Answer 7

+1 A:

Another reason strcmp() returns the codes it does is so that it can be used directly in the standard library function qsort(), allowing you to sort an array of strings:

#include <string.h> // for strcmp()
#include <stdlib.h> // for qsort()
#include <stdio.h>

int sort_func(const void *a, const void *b)
{
    const char **s1 = (const char **)a;
    const char **s2 = (const char **)b;
    return strcmp(*s1, *s2);
}

int main(int argc, char **argv)
{
    int i;
    printf("Pre-sort:\n");
    for(i = 1; i < argc; i++)
        printf("Argument %i is %s\n", i, argv[i]);
    qsort((void *)(argv + 1), argc - 1, sizeof(char *), sort_func);
    printf("Post-sort:\n");
    for(i = 1; i < argc; i++)
        printf("Argument %i is %s\n", i, argv[i]);
    return 0;
}

This little sample program sorts its arguments ASCIIbetically (what some would call lexically). Lookie:

$ gcc -o sort sort.c
$ ./sort hi there little fella
Pre-sort:
Argument 1 is hi
Argument 2 is there
Argument 3 is little
Argument 4 is fella
Post-sort:
Argument 1 is fella
Argument 2 is hi
Argument 3 is little
Argument 4 is there

If strcmp() returned 1 (true) for equal strings and 0 (false) for inequal ones, it would be impossible to use it to obtain the degree or direction of inequality (i.e. how different, and which one is bigger) between the two strings, thus making it impossible to use it as a sorting function.

I don't know how familiar you are with C. The above code uses some of C's most confusing concepts - pointer arithmetic, pointer recasting, and function pointers - so if you don't understand some of that code, don't worry, you'll get there in time. Until then, you'll have plenty of fun questions to ask on StackOverflow. ;)

Chris Lutz 2009-09-02 04:58:48

ansaurus

tags:

views:

answers:

Why does strcmp() return 0 when its inputs are equal?

related questions