tags:

views:

125

answers:

6

Hello.

I'm learning C.

I find I learn programming well when I try things and received feedback from established programmers in the language.

I decided to write my own strcmp() function, just because I thought I could :)

int strcompare(char *a, char *b) {
    while (*a == *b && *a != '\0') {
        a++;
        b++;
    }
    return *a - *b;
}

I was trying to get it to work by incrementing the pointer in the condition of the while but couldn't figure out how to do the return. I was going for the C style code, of doing as much as possible on one line :)

Can I please get some feedback from established C programmers? Can this code be improved? Do I have any bad habits?

Thanks.

+1  A: 

I don't know since when putting as much as possible is considered as C-style... I rather associate (obfuscated) Perl with that..

Please DO NOT do this. The best thing to do is one command per line. You will understand why when you try to debug your code :)

To your implementation: Seems quite fine to me, but I would put in the condition that *b is not '\0' either, because you can't know that a is always bigger than b... Otherwise you risk reading in unallocated memory...

apirogov
It seemed to still work when `b` had a greater length than `a` (in my tests). Am I relying on undefined behaviour?
alex
As far as I know, yes... After all you can't know how big the buffers are, unless you pass their length in as well and the only way left is to look for \0 in all of the strings, otherwise the results might be garbage...
apirogov
@alex: Yes, you are.
DeadMG
You only need to test for zero in the one string as, if the zero is encountered in the other string it will be a mismatch and stop iterating.
Chris Becke
+2  A: 
  1. The function doesn't change the content of a and b. it should probably announce that by taking pointers to const strings.
  2. Most C styles are much terser than many other languages' styles, but don't try to be too clever. (In your code, with several conditions ANDed in the loop conditions, I don't think there's way to put incrementing in there, so this isn't even a question of style, but of correctness.)
sbi
`strcmp` only has to return greater than, equal to, or less than zero.
Oli Charlesworth
@Oli: Ah, Ok, so I was wrong on that. (I'm a C++ developer by heart.) Fixed.
sbi
Thanks for this - just what I was looking for. Can you please elaborate on 1. I tried when `b` started with `a` (what I think you meant) but it still worked. 3. is a very good point, but I merely forgot to do it :)
alex
Oh and +1 because it is useful information.
alex
@alex: Two of my original four points were moot (one I found out by myself, one was found by Oli), so I deleted them. That must have been moments before you asked that question in your comment. I suppose your reference to #3 refers to what's now #1, the `const` issue. Yes, that's often forgotten, and that's sad. Using `const` strictly allows the compiler to find errors you make, and that's infinitely to be preferred over errors happening at run-time, because whatever error causes compilation to fail, has no chance on ending up on your customers' machines.
sbi
+1  A: 

You may find this interesting, from eglibc-2.11.1. It's not far different to your own implementation.

/* Compare S1 and S2, returning less than, equal to or
   greater than zero if S1 is lexicographically less than,
   equal to or greater than S2.  */
int
strcmp (p1, p2)
     const char *p1;
     const char *p2;
{
  register const unsigned char *s1 = (const unsigned char *) p1;
  register const unsigned char *s2 = (const unsigned char *) p2;
  unsigned reg_char c1, c2;

  do
    {
      c1 = (unsigned char) *s1++;
      c2 = (unsigned char) *s2++;
      if (c1 == '\0')
    return c1 - c2;
    }
  while (c1 == c2);

  return c1 - c2;
}
Matt Joiner
Wow, how ridiculously overcomplicated. This looks like it was written for a compiler from the 80s that had no idea how to optimize. If you're going to do something fancy, at least make it useful (for instance comparing whole words at a time).
R..
+3  A: 

If you want to do everything in the while statement, you could write

while (*a != '\0' && *a++ == *b++) {}

I'm not personally a huge fan of this style of programming - readers need to mentally "unpack" the order of operations anyway, when trying to understand it (and work out if the code is buggy or not). Memory bugs are particularly insidious in C, where overwriting memory one byte beyond or before where you should can cause all sorts of inexplicable crashes or bugs much later on, away from the original cause.

Modern styles of C programming emphasize correctness, consistency, and discipline more than terseness. The terse expression features, like pre- and post-increment operations, were originally a way of getting the compiler to generate better machine code, but optimizers can easily do that themselves these days.

As @sbi writes, I'd prefer const char * arguments instead of plain char * arguments.

Doug
Thanks Doug! I thought about doing that actually. I definitely understand what you mean by having to *unpack* the order of operations. +1
alex
while(*a
Chris Becke
I'm surprised you came up with a version of that loop condition that would work, but even after looking at it for a while I wouldn't bet anything substantial on it working under all conditions. So I wholeheartedly agree with your excellent analysis, `+1` from me.
sbi
A: 

A very subtle bug: strcmp compares bytes interpreted as unsigned char, but your function is interpreting them as char (which is signed on most implementations). This will cause non-ascii characters to sort before ascii instead of after.

R..
A: 

This function will fail if limits of (insigned) char is equal to or greater than limits of int because of integer overflow.

For example if you compile it on DSP which have 16 bit char with limits 0...65536 and 16 bit int with limits of -32768...32767, then if you try to compare strings like "/uA640" and "A" the result will be negative, which is not true.

This is exotic and weird problem but it appear when you write universal implementation.

Vovanium