I am confused by strcmp(), or rather, how it is defined by the standard. Consider comparing two strings where one contains characters outside the ASCII-7 range (0-127).
The C standard defines:
int strcmp(const char *s1, const char *s2);
The strcmp function compares the string pointed to by s1 to the string pointed to by s2.
The strcmp function returns an integer greater than, equal to, or less than zero, accordingly as the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2.
The parameters are char *
. Not unsigned char *
. There is no notion that "comparison should be done as unsigned
".
But all the standard libraries I checked consider the "high" character to be just that, higher in value than the ASCII-7 characters.
I understand this is useful and the expected behaviour. I don't want to say the existing implementations are wrong or something. I just want to know, which part in the standard specs have I missed?
int strcmp_default( const char * s1, const char * s2 )
{
while ( ( *s1 ) && ( *s1 == *s2 ) )
{
++s1;
++s2;
}
return ( *s1 - *s2 );
}
int strcmp_unsigned( const char * s1, const char *s2 )
{
unsigned char * p1 = (unsigned char *)s1;
unsigned char * p2 = (unsigned char *)s2;
while ( ( *p1 ) && ( *p1 == *p2 ) )
{
++p1;
++p2;
}
return ( *p1 - *p2 );
}
#include <stdio.h>
#include <string.h>
int main()
{
char x1[] = "abc";
char x2[] = "abü";
printf( "%d\n", strcmp_default( x1, x2 ) );
printf( "%d\n", strcmp_unsigned( x1, x2 ) );
printf( "%d\n", strcmp( x1, x2 ) );
return 0;
}
Output is:
103
-153
-153