tags:

views:

154

answers:

4

I'm trying to learn C programming and spent some time practicing with pointers this morning, by writing a little function to replace the lowercase characters in a string to their uppercase counterparts. This is what I got:

#include <stdio.h>
#include <string.h>

char *to_upper(char *src);

int main(void) {
    char *a = "hello world";
    printf("String at %p is \"%s\"\n", a, a);
    printf("Uppercase becomes \"%s\"\n", to_upper(a));
    printf("Uppercase becomes \"%s\"\n", to_upper(a));
    return 0;
}

char *to_upper(char *src) {
    char *dest;
    int i;
    for (i=0;i<strlen(src);i++) {
        if ( 71 < *(src + i) && 123 > *(src + i)){
            *(dest+i) = *(src + i) ^ 32;
        } else {
            *(dest+i) = *(src + i);
        }
    }
    return dest;
}

This runs fine and prints exactly what it should (including the repetition of the "HELLO WORLD" line), but afterwards ends in a Segmentation fault. What I can't understand is that the function is clearly compiling, executing and returning successfully, and the flow in main continues. So is the Segmentation fault happening at return 0?

+17  A: 

dest is uninitialised in your to_upper() function. So, you're overwriting some random part of memory when you do that, and evidently that causes your program to crash as you try to return from main().

If you want to modify the value in place, initialise dest:

char *dest = src;

If you want to make a copy of the value, try:

char *dest = strdup(src);

If you do this, you will need to make sure somebody calls free() on the pointer returned by to_upper() (unless you don't care about memory leaks).

Greg Hewgill
Moreover, char *a = "hello world";is deprecated, it should be const char *a = "hello world";; however, doing this will disable the in-place edit, so you'll have to do char a[] = "hello world";.
Matteo Italia
That's true, but that's not what would cause the reported behaviour.
Greg Hewgill
Note that `strdup()` is not part of the standard C library; it's a POSIX extension, and as such is not universally supported. The same functionality may be replicated by the following statements: `char *dest = malloc(strlen(src)+1); if (dest) strcpy(dest, src);`
John Bode
+1  A: 

As noted by others, your problem is that char *dest is uninitialized. You can modify src's memory in place, as Greg Hewgill suggests, or you can use malloc to reserve some:

char *dest = (char *)malloc(strlen(src) + 1);

Note that the use of strdup suggested by Greg performs this call to malloc under the covers. The '+ 1' is to reserve space for the null terminator, '\0', which you should also be copying from src to dest. (Your current example only goes up to strlen, which does not include the null terminator.) Can I suggest that you add a line like this after your loop?

*(dest + i) = 0;

This will correctly terminate the string. Note that this only applies if you choose to go the malloc route. Modifying the memory in place or using strdup will take care of this problem for you. I'm just pointing it out because you mentioned you were trying to learn.

Hope this helps.

Sean Devlin
+2  A: 

Like everyone else has pointed out, the problem is that dest hasn't been initialized and is pointing to a random location that contains something important. You have several choices of how to deal with this:

  1. Allocate the dest buffer dynamically and return that pointer value, which the caller is responsible for freeing;
  2. Assign dest to point to src and modify the value in place (in which case you'll have to change the declaration of a in main() from char *a = "hello world"; to char a[] = "hello world";, otherwise you'll be trying to modify the contents of a string literal, which is undefined);
  3. Pass the destination buffer as a separate argument.

Option 1 -- allocate the target buffer dynamically:

char *to_upper(char *src)
{
  char *dest = malloc(strlen(src) + 1);
  ...
}

Option 2 -- have dest point to src and modify the string in place:

int main(void)
{
  char a[] = "hello world";
  ...
}
char *to_upper(char *src)
{
  char *dest = src;
  ...
}

Option 3 -- have main() pass the target buffer as an argument:

int main(void)
{
  char *a = "hello world";
  char *b = malloc(strlen(a) + 1); // or char b[12];
  ...
  printf("Uppercase becomes %s\n", to_upper(a,b));
  ...
  free(b); // omit if b is statically allocated
  return 0;
}
char *to_upper(char *src, char *dest)
{
  ...
  return dest;
}

Of the three, I prefer the third option; you're not modifying the input (so it doesn't matter whether a is an array of char or a pointer to a string literal) and you're not splitting memory management responsibilities between functions (i.e., main() is solely responsible for allocating and freeing the destination buffer).

I realize you're trying to familiarize yourself with how pointers work and some other low-level details, but bear in mind that a[i] is easier to read and follow than *(a+i). Also, there are number of functions in the standard library such as islower() and toupper() that don't rely on specific encodings (such as ASCII):

#include <ctype.h>
...
if (islower(src[i])
  dest[i] = toupper(src[i]);
John Bode
+2  A: 

As others have said, your problem is not allocating enough space for dest. There is another, more subtle problem with your code.

To convert to uppercase, you are testing a given char to see if it lies between 71 ans 123, and if it does, you xor the value with 32. This assumes ASCII encoding of characters. ASCII is the most widely used encoding, but it is not the only one.

It is better to write code that works for every type of encoding. If we were sure that 'a', 'b', ..., 'z', and 'A', 'B', ..., 'Z', are contiguous, then we could calculate the offset from the lowercase letters to the uppercase ones and use that to change case:

/* WARNING: WRONG CODE */
if (c >= 'a' && c <= 'z') c = c + 'A' - 'a';

But unfortunately, there is no such guarantee given by the C standard. In fact EBCDIC encoding is an example.

So, to convert to uppercase, you can either do it the easy way:

#include <ctype.h>
int d = toupper(c);

or, roll your own:

/* Untested, modifies it in-place */
char *to_upper(char *src)
{
    static const char *lower = "abcdefghijklmnopqrstuvwxyz";
    static const char *upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static size_t n = strlen(lower);
    size_t i;
    size_t m = strlen(src);

    for (i=0; i < m; ++i) {
        char *tmp;
        while ((tmp = strchr(lower, src[i])) != NULL) {
            src[i] = upper[tmp-lower];
        }
    }
}

The advantage of toupper() is that it checks the current locale to convert characters to upper case. This may make æ to Æ for example, which is usually the correct thing to do. Note: I use only English and Hindi characters myself, so I could be wrong about my particular example!

Alok
Congratulations for the detailed explanation of an often-missed point!
bortzmeyer