I have developed a reverse-string program. I am wondering if there is a better way to do this, and if my code has any potential problems. I am looking to practice some advanced features of C.

char* reverse_string(char *str)
{
    char temp;
    size_t len = strlen(str) - 1;
    size_t i;
    size_t k = len;

    for(i = 0; i < len; i++)
    {
        temp = str[k];
        str[k] = str[i];
        str[i] = temp;
        k--;

        /* As 2 characters are changing place for each cycle of the loop
           only traverse half the array of characters */
        if(k == (len / 2))
        {
            break;
        }
    }
}
+7  A: 

Try this:

reverse_string(NULL);
reverse_string("");
RossFabricant
All of the standard string functions (strlen, strcat, etc.) have undefined behavior for NULL. It should be the responsibility of the caller to check for NULL, not the function. Garbage in, garbage out. Or in this case, garbage in, then segfault.
dreamlax
But he does have a point with "".
Michael Burr
"" is [const char*], not [char*] => you cannot reverse constants in place.
avp
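As a rough illustration of the empty-string case raised above: strlen("") is 0, and since size_t is unsigned, strlen(str) - 1 wraps around to SIZE_MAX instead of becoming -1, so the question's loop would index far outside the array. The helper names below are made up for this sketch and are not part of the original code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: mirrors the question's `strlen(str) - 1` computation. */
size_t last_index_unchecked(const char *str)
{
    return strlen(str) - 1;     /* wraps to SIZE_MAX when str is "" */
}

/* Hypothetical guarded variant: returns 1 and stores the last index in *out,
   or returns 0 when there is no last character (NULL or empty string). */
int last_index_checked(const char *str, size_t *out)
{
    size_t n;

    if (str == NULL)
        return 0;
    n = strlen(str);
    if (n == 0)
        return 0;               /* nothing to reverse */
    *out = n - 1;
    return 1;
}
```

A reverse function could call something like last_index_checked first and return early when it reports 0.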
+1  A: 

Since you say you want to get fancy, perhaps you'll want to exchange your characters using an XOR swap.

chaos
Terrible idea :\
GMan
You guys are no fun. :)
chaos
Does the xor trick even work for signed chars on every system? 32-bit chars?
Arafangion
It works as long as you are not swapping with the same location. But it's much slower than just using a temp variable on most platforms anyway.
GMan
@Arafangion - XOR swap uses bitwise operations. The end result of an XOR swap is that the two variables swapped will end up with the same bit values they started with, but swapped. Same bit values = same values, signed or unsigned, 8-, 16- or 32-bit. Maybe if you swap a signed and an unsigned, or different sized characters, it won't work - but what will in that case?
Chris Lutz
+7  A: 

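You can put your halfway test in the for loop. Since len is strlen(str) - 1, the bound needs to be (len + 1) / 2 rather than len / 2, otherwise even-length strings miss their innermost swap:

    for(i = 0; i < (len + 1) / 2; i++)
    {
        temp = str[k];
        str[k] = str[i];
        str[i] = temp;
        k--;
    }
}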
Michael Burr
Nice one +1. Could also write a SWAP Macro?
Tom
+4  A: 

I don't see a return statement, and you are changing the input string, which may be a problem for the programmer. You may want the input string to be immutable.

Also, this may be picky, but len/2 should be calculated only one time, IMO.

Other than that, it will work, as long as you take care of the problem cases mentioned by rossfabricant.

James Black
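If the input really should stay untouched, as suggested above, one option is to build the reversed string in a fresh buffer. This is only a sketch under that assumption; the name reversed_copy and the choice of malloc are illustrative, not something from the original post.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated reversed copy of src,
   or NULL if src is NULL or the allocation fails.
   The caller is responsible for calling free() on the result. */
char *reversed_copy(const char *src)
{
    size_t n, i;
    char *dst;

    if (src == NULL)
        return NULL;

    n = strlen(src);
    dst = malloc(n + 1);        /* +1 for the terminating '\0' */
    if (dst == NULL)
        return NULL;

    for (i = 0; i < n; i++)
        dst[i] = src[n - 1 - i];
    dst[n] = '\0';

    return dst;
}
```

Because the parameter is const char *, this version can also be handed a string literal safely, at the cost of an allocation per call.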
+3  A: 

You could change your for loop declaration to make the code shorter:

char* reverse_string(char *str)
{
    char temp;
    size_t len = strlen(str) - 1;   /* index of the last character */
    size_t stop = (len + 1) / 2;    /* number of swaps, i.e. strlen(str) / 2 */
    size_t i,k;

    for(i = 0, k = len; i < stop; i++, k--)
    {
        temp = str[k];
        str[k] = str[i];
        str[i] = temp;
    }
    return str;
}
Ben Straub
@mmw - why do you think it will crash?
Jonathan Leffler
@Ben: why not: for (i = 0, k = len; i < k; i++, k--) ...?
Jonathan Leffler
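For reference, the i < k form suggested in the comment above might look roughly like this. The NULL/empty guard is an addition of this sketch (without it, k would wrap around for an empty string), and the function name is hypothetical.

```c
#include <string.h>

/* Hypothetical variant: the two indices walk towards each other and the
   loop stops as soon as they meet or cross. */
char *reverse_string_ik(char *str)
{
    size_t i, k;
    char temp;

    /* Guard added for this sketch: strlen(str) - 1 would wrap for "". */
    if (str == NULL || *str == '\0')
        return str;

    for (i = 0, k = strlen(str) - 1; i < k; i++, k--)
    {
        temp = str[k];
        str[k] = str[i];
        str[i] = temp;
    }
    return str;
}
```

With this condition there is no separate halfway bound to compute, so odd and even lengths are handled by the same test.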
+1  A: 

Rather than breaking half-way through, you should simply shorten your loop.

size_t length = strlen(str);
size_t i;

for (i = 0; i < (length / 2); i++)
{
    char temp = str[length - i - 1];
    str[length - i - 1] = str[i];
    str[i] = temp;
}
dreamlax
+10  A: 

Just a rearrangement, and a safety check. I also removed your unused return type. I think this is as safe and clean as it gets:

#include <stdio.h>
#include <string.h>

void reverse_string(char *str)
{
    /* skip null */
    if (str == 0)
    {
        return;
    }

    /* skip empty string */
    if (*str == 0)
    {
        return;
    }

    /* get range */
    char *start = str;
    char *end = start + strlen(str) - 1; /* -1 for \0 */
    char temp;

    /* reverse */
    while (end > start)
    {
        /* swap */
        temp = *start;
        *start = *end;
        *end = temp;

        /* move */
        ++start;
        --end;
    }
}


int main(void)
{
    char s1[] = "Reverse me!";
    char s2[] = "abc";
    char s3[] = "ab";
    char s4[] = "a";
    char s5[] = "";

    reverse_string(0);

    reverse_string(s1);
    reverse_string(s2);
    reverse_string(s3);
    reverse_string(s4);
    reverse_string(s5);

    printf("%s\n", s1);
    printf("%s\n", s2);
    printf("%s\n", s3);
    printf("%s\n", s4);
    printf("%s\n", s5);

    return 0;
}

Edited so that end will not point to a possibly bad memory location when strlen is 0.

GMan
I'm going to be a pedantic jerk here. There's something unsafe in this function: the line "char* end = start + strlen(str) - 1;" results in undefined behavior when strlen(str)==0.
Michael Burr
Heh, I thought of that after I posted but I was just gonna hope nobody saw :> Though this isn't undefined, is it? It will point to a bad memory location, which is bad, but it's still well defined, yes? In any case I've updated my post for more safety. Keep telling me if there's more :) I strive for correctness.
GMan
While this is unlikely to be an actual problem in most environments, it's undefined behavior by the standard. The standard allows pointer arithmetic to result in pointers within the array or to one past the last element of the array (in this last case the pointer is not allowed to be dereferenced, of course).
Michael Burr
Ah, ok, good to know, thanks. 'Cept now you made my code uglier :P
GMan
Sorry about uglifying the code... since the strrev() routine is so often used in interviews, this little bit is something I always look for. I don't hold it against anyone (if that's the worst problem the function had, I don't think there's much to worry about), but pointing it out can spur further conversation, which gives additional insight.
Michael Burr
Oh it's fine. :) Look, I found a cleaner way :D :P
GMan
I doubted you @Michael Burr - so I looked it up and you're right - section 6.5.6: "If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined." OTOH, I've never been on a compiler where arbitrary subtraction was an issue.
rampion
Like I said, it's unlikely to be a real problem - the one that I can think of off the top of my head is the good old, obsolete, segmented MS-DOS architecture. If your pointer happened to be at offset 0 of the segment, then this would cause a problem.
Michael Burr
There are plenty of small embedded platforms where you can't count on a nice linear memory model. On a PC this isn't going to be an issue (unless you're running DOS), but your code won't be portable. It's generally best not to do pointer arithmetic outside the bounds of your array.
jalf
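One way to sidestep the str - 1 question discussed in these comments is to start from the one-past-the-end pointer, which the standard does allow, and only step back when at least two characters remain between the pointers. This is a sketch of that idea, not GMan's actual edit; the function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variant: never forms a pointer before the start of the array. */
void reverse_no_underflow(char *str)
{
    char *left, *right, tmp;

    if (str == NULL)
        return;

    left = str;
    right = str + strlen(str);      /* one past the last char: always valid */

    /* At least two characters remain in [left, right). */
    while (right - left > 1)
    {
        --right;                    /* now points at the last unswapped char */
        tmp = *left;
        *left = *right;
        *right = tmp;
        ++left;
    }
}
```

For an empty string, left and right are equal, so the loop body never runs and no out-of-bounds pointer is ever computed.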
+5  A: 

If you want to practice advanced features of C, how about pointers? We can toss in macros and xor-swap for fun too!

#include <string.h> // for strlen()

// reverse the given null-terminated string in place
void inplace_reverse(char * str)
{
  if (str)
  {
    char * end = str + strlen(str) - 1;

    // swap the values in the two given variables
    // XXX: fails when a and b refer to same memory location
#   define XOR_SWAP(a,b) do\
    {\
      a ^= b;\
      b ^= a;\
      a ^= b;\
    } while (0)

    // walk inwards from both ends of the string, 
    // swapping until we get to the middle
    while (str < end)
    {
      XOR_SWAP(*str, *end);
      str++;
      end--;
    }
#   undef XOR_SWAP
  }
}

A pointer (e.g. char *, read from right-to-left as a pointer to a char) is a data type in C that is used to refer to the location in memory of another value. In this case, the location where a char is stored. We can dereference pointers by prefixing them with an *, which gives us the value stored at that location. So the value stored at str is *str.

We can do simple arithmetic with pointers. When we increment (or decrement) a pointer, we simply move it to refer to the next (or previous) memory location for that type of value. Incrementing pointers of different types may move the pointer by a different number of bytes because different values have different byte sizes in C.
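To make the point about step sizes concrete, here is a small sketch that measures how many bytes a single increment covers for a char pointer versus an int pointer. The helper names are invented for the example.

```c
#include <stddef.h>

/* Hypothetical helper: bytes covered by incrementing a char pointer once. */
size_t step_bytes_char(void)
{
    char buf[2];
    char *p = buf;
    char *q = p + 1;                        /* next char */
    return (size_t)(q - p);                 /* char pointers count in bytes */
}

/* Hypothetical helper: bytes covered by incrementing an int pointer once. */
size_t step_bytes_int(void)
{
    int buf[2];
    int *p = buf;
    int *q = p + 1;                         /* next int, not next byte */
    return (size_t)((char *)q - (char *)p); /* measure the gap in bytes */
}
```

The first always reports 1, while the second reports sizeof(int), whatever that happens to be on the platform.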

Here, we use one pointer to refer to the first unprocessed char of the string (str) and another to refer to the last (end). We swap their values (*str and *end), and move the pointers inwards to the middle of the string. Once str >= end, either they both point to the same char, which means our original string had an odd length (and the middle char doesn't need to be reversed), or we've processed everything.

To do the swapping, I've defined a macro. Macros are text substitution done by the C preprocessor. They are very different from functions, and it's important to know the difference. When you call a function, the function operates on a copy of the values you give it. When you call a macro, it simply does a textual substitution - so the arguments you give it are used directly.

Since I only used the XOR_SWAP macro once, it was probably overkill to define it, but it made more clear what I was doing. After the C preprocessor expands the macro, the while loop looks like this:

    while (str < end)
    {
      do { *str ^= *end; *end ^= *str; *str ^= *end; } while (0);
      str++;
      end--;
    }

Note that the macro arguments show up once for each time they're used in the macro definition. This can be very useful - but can also break your code if used incorrectly. For example, if I had compressed the increment/decrement instructions and the macro call into a single line, like

      XOR_SWAP(*str++, *end--);

Then this would expand to

      do { *str++ ^= *end--; *end-- ^= *str++; *str++ ^= *end--; } while (0);

Which has triple the increment/decrement operations, and doesn't actually do the swap it's supposed to do.
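If you do want the increment and decrement on the same line as the swap, a plain function avoids the repeated-evaluation problem, because each argument expression is evaluated exactly once before the call. A sketch under that assumption (swap_chars and inplace_reverse_fn are made-up names, and the empty-string guard is an addition of this sketch):

```c
#include <string.h>

/* Hypothetical helper: swaps the two chars pointed to by a and b. */
static void swap_chars(char *a, char *b)
{
    char tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Hypothetical variant of the reversal using the function instead of the macro. */
void inplace_reverse_fn(char *str)
{
    char *end;

    if (str == NULL || *str == '\0')
        return;

    end = str + strlen(str) - 1;
    while (str < end)
        swap_chars(str++, end--);   /* each side effect happens once per call */
}
```

The trade-off is a function call per swap, which a compiler will often inline anyway.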

While we're on the subject, you should know what xor (^) means. It's a basic arithmetic operation - like addition, subtraction, multiplication, division, except it's not usually taught in elementary school. It combines two integers bit by bit - like addition, but we don't care about the carry-overs. 1^1 = 0, 1^0 = 1, 0^1 = 1, 0^0 = 0.

A well known trick is to use xor to swap two values. This works because of four basic properties of xor: x ^ 0 = x, x ^ x = 0, x ^ y = y ^ x and (x ^ y) ^ z = x ^ (y ^ z) for all values x, y and z. So say we have two variables a and b that are initially storing two values va and vb.

  // initially:
  // a == va
  // b == vb
  a ^= b;
  // now: a == va ^ vb
  b ^= a;
  // now: b == vb ^ (va ^ vb)
  //        == va ^ (vb ^ vb)
  //        == va ^ 0
  //        == va
  a ^= b;
  // now: a == (va ^ vb) ^ va
  //        == (va ^ va) ^ vb
  //        == 0 ^ vb
  //        == vb

So the values are swapped. This does have one bug - when a and b are the same variable:

  // initially:
  // a == va
  a ^= a;
  // now: a == va ^ va
  //        == 0
  a ^= a;
  // now: a == 0 ^ 0
  //        == 0
  a ^= a;
  // now: a == 0 ^ 0
  //        == 0

Since we only swap while str < end, this never happens in the above code, so we're okay.
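Outside of a loop that already guarantees two distinct locations, the aliasing bug can be avoided with an explicit address check. The following is only a sketch; both function names are hypothetical, and unsigned char is used here as an assumption to keep the bitwise operations on unsigned values.

```c
/* Hypothetical helper: xor swap without any protection against aliasing. */
void xor_swap_unguarded(unsigned char *a, unsigned char *b)
{
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}

/* Hypothetical helper: same swap, but a no-op when both pointers
   refer to the same location. */
void xor_swap_guarded(unsigned char *a, unsigned char *b)
{
    if (a == b)
        return;             /* swapping a value with itself changes nothing */
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}
```

Passing the same address twice to the unguarded version zeroes the value, exactly as in the trace above, while the guarded version leaves it alone.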

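While we're concerned about correctness we should check our edge cases. The if (str) line makes sure we weren't given a NULL pointer for the string. What about the empty string ""? Well, strlen("") == 0, so we'll initialize end as str - 1, which means that the while (str < end) condition is never true, so in practice we don't do anything. Strictly speaking, though, computing str - 1 is undefined behavior under the standard (see the discussion of section 6.5.6 in the comments on GMan's answer), so a fully portable version should also return early when *str is 0.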

There's a bunch of C to explore. Have fun with it!

Update: mmw brings up a good point, which is you do have to be slightly careful how you invoke this, as it does operate in-place.

 char stack_string[] = "This string is copied onto the stack.";
 inplace_reverse(stack_string);

This works fine, since stack_string is an array, whose contents are initialized to the given string constant. However

 char * string_literal = "This string is part of the executable.";
 inplace_reverse(string_literal);

Will cause your code to flame and die at runtime. That's because string_literal merely points to the string that is stored as part of your executable - which is normally memory that the OS does not allow you to modify. In a happier world, your compiler would know this, and cough up an error when you tried to compile, telling you that string_literal needs to be of type char const * since you can't modify the contents. However, this is not the world my compiler lives in.

There are some hacks you could try to make sure that some memory is on the stack or in the heap (and is therefore editable), but they're not necessarily portable, and it could be pretty ugly. However, I'm more than happy to throw responsibility for this to the function invoker. I've told them that this function does in place memory manipulation, it's their responsibility to give me an argument that allows that.

rampion
Well, I check for that with the if(str) line - and since I'm reversing in place, I only have to worry about the input.
rampion
Ah. I see what you mean now. You mean I need to make sure that the argument is not for a `char const *` that's part of the binary. This is true of every inplace reversal and should be noted. If you had said `char s[] = "Dante Alighieri est né";`, then that would be fine, since it'd be on the stack. Unfortunately, I don't think there is a portable check for this. In an ideal world, the compiler would have required your example to be `char const * s = "..."` since the RHS is non-modifiable.
rampion
Perhaps you'd care to enlighten us as to why xor swap is actually a good idea? It's generally slower than the "plain" version, it's harder to read and it fails spectacularly if you try to swap an address with itself.
jalf
Both accurate points; however, my goal here was not to write the most efficient code, but just to have an excuse to write about various C features and C-isms. Xor swap is good to know about, if only to recognize it.
rampion
+3  A: 

This complete program shows how I would do it. Keep in mind I was writing C when most of you whippersnappers were a glint in your mothers' eyes, so it's old-school, do-the-job, long-var-names-are-for-wimps. Fix that if you wish; I'm more interested in the correctness of the code.

It handles NULLs, empty strings and all string sizes. I haven't tested it with strings of maximum size (max(size_t)) but it should work, and if you're handling strings that big, you're insane anyway :-)

#include <stdio.h>
#include <string.h>

char *revStr (char *str) {
    char tmp, *src, *dst;
    size_t len;
    if (str != NULL)
    {
        len = strlen (str);
        if (len > 1) {
            src = str;
            dst = src + len - 1;
            while (src < dst) {
                tmp = *src;
                *src++ = *dst;
                *dst-- = tmp;
            }
        }
    }
    return str;
}

char *str[] = {"", "a", "ab", "abc", "abcd", "abcde"};

int main(int argc, char *argv[]) {
    int i;
    char s[10000];
    for (i=0; i < sizeof(str)/sizeof(str[0]); i++) {
        strcpy (s, str[i]);
        printf ("'%s' -> '%s'\n", str[i], revStr(s));
    }
    return 0;
}
paxdiablo
You may want to refer to the little conversation in GMan's answer: http://stackoverflow.com/questions/784417/c-reverse-a-string/784455#784455
Michael Burr
Thanks, @MB, although that will never occur in practice, I guess we have to follow the standard. Updated to handle empty strings, and optimized to handle 1-character strings at the same time.
paxdiablo
+1  A: 

Does nobody use pointers anymore?

void inplace_rev( char * s ) {
  char t, *e = s + strlen(s);
  while ( --e > s ) { t = *s;*s++=*e;*e=t; }
}

EDIT: Sorry, just noticed the above XOR example...

Sanjaya R