
Hi again. I've come to bother you all with another probably really simple C question.

Using the following code:

#include <stdio.h>
#include <string.h>

int get_len(char *string){

    printf("len: %zu\n", strlen(string));

    return 0;
}

int main(){

    char *x = "test";
    char y[4] = {'t','e','s','t'};

    get_len(x); // len: 4
    get_len(y); // len: 6

    return 0;
}

Two questions: why are they different, and why is y's length 6? Thanks, guys.

EDIT: Sorry, I know what would fix it; I just wanted to understand what was going on. So does strlen just keep advancing the pointer until it happens to find a \0? Also, when I called strlen in the main function instead of in get_len, both were 4. Was that just a coincidence?

+2  A: 

You need to null-terminate y.

#include <stdio.h>
#include <string.h>

int get_len(char *string){

    printf("len: %zu\n", strlen(string));

    return 0;
}

int main(){

    char *x = "test";
    char y[5] = {'t','e','s','t','\0'};

    get_len(x); // len: 4
    get_len(y); // len: 4

    return 0;
}

strlen() takes the pointer you give it and counts the bytes until the next null character ('\0') in memory. It just so happened that there was a zero byte two bytes past the end of your array.

Cory Walker
So is strlen just forwarding the pointer until it finds some \0 anywhere it can?
LearningC
Two problems with this. First, NULL is intended as a null pointer, not a null character. Second, you've still got `char y[4]`, so you've got an extra initializer. You need five positions to have a "test" string.
David Thornley
@LearningC: In a word, yes. That's a good description.
David Thornley
Sorry, forgot to increment the index, but I did not know about the NULL thing. Learn something new every day.
Cory Walker
@Cory Walker: That's why we're here, to learn from each other.
David Thornley
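Since nothing tells strlen() where y ends, it has no choice but to keep going. When you do know the buffer size, a bounded count can stop there instead. A minimal sketch, modelled on POSIX strnlen(); the name bounded_len is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count characters in s, examining at most max bytes.
   Returns max if no '\0' occurs among the first max bytes, so it never
   reads past a buffer whose size the caller supplies. */
size_t bounded_len(const char *s, size_t max)
{
    size_t n = 0;
    assert(s != NULL);
    while (n < max && s[n] != '\0')
        ++n;
    return n;
}
```

For the question's char y[4] = {'t','e','s','t'};, bounded_len(y, sizeof y) returns 4 without reading past the array. Note that sizeof y gives the array size only where y is declared; inside get_len the parameter is just a char *.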
+10  A: 

y is not null-terminated. strlen() counts characters until it hits a null character. Yours happened to find one after 6, but it could be any number. Try this:

char y[] = {'t','e','s','t', '\0'};

Here's what an implementation of strlen() might look like (off the top of my head; I don't have my K&R book handy, but I believe it gives an implementation):

#include <stddef.h>  /* size_t */

size_t strlen(const char* s)
{
    size_t result = 0;
    while (*s++) ++result;  /* stop at the first '\0', which is not counted */
    return result;
}
Fred Larson
So is strlen just forwarding the pointer until it finds some \0 anywhere it can?
LearningC
@LearningC: Exactly. It keeps incrementing the pointer and looking at what's there. When it finds a zero, it stops and returns how many characters it looked at (excluding the zero).
Fred Larson
@LearningC: Yes. Or until it crashes with a segfault instead. Or until it formats your hard drive. The behavior is *undefined* if your input is not zero-terminated. *Anything* can happen.
AndreyT
AndreyT is correct. In practice, it'll often find a null before anything bad happens. But you can never be sure. I doubt it would ever actually format your hard drive, but you could sure go off into forbidden memory and have a segmentation fault.
Fred Larson
@Fred: Actually, something bad happens the very moment it steps past the byte where the null should have been. Since (assuming 8-bit bytes) the chance of a null byte being there by accident is not much better than 1/256, in practice something bad happens quite often.
sbi
@Fred: Before the days of protected memory, this could actually damage your hardware. Some devices were mapped into memory, and reading a value from certain addresses altered the device's state, possibly putting it into an invalid state.
Earlz
You have been of huge help guys, thank you all :)
LearningC
@sbi: If you mean getting the wrong result is "something bad", you're right. Good point. I meant something even worse than that (a crash, etc.). @Earlz: Sure, but I think that's rather uncommon these days.
Fred Larson
Question: if you were to write a better string copy, how could you be sure the string passed in had a \0? Since you couldn't find the length of that "string", you couldn't check whether string[lastchar] == '\0'.
LearningC
@LearningC: it would be a requirement on the caller of your function that the string be terminated, just as it is a requirement on the caller of `strlen` or `strcpy` that the input is a nul-terminated string.
Steve Jessop
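To make the caller's obligation concrete, here is a minimal sketch of such a copy function. The name copy_string is hypothetical, and the interface is similar in spirit to BSD's strlcpy(). It can check the pointers and respect the destination size it is given, but the source's termination remains an unchecked precondition, just as with strcpy:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical copy function.
   Precondition (cannot be checked at run time): src is nul-terminated.
   dst has room for dst_size bytes. Copies as much of src as fits, always
   nul-terminates dst when dst_size > 0, and returns strlen(src) so the
   caller can detect truncation (return value >= dst_size). */
size_t copy_string(char *dst, size_t dst_size, const char *src)
{
    size_t src_len;
    assert(dst != NULL && src != NULL);  /* the checkable preconditions */
    src_len = strlen(src);               /* relies on the unchecked one */
    if (dst_size > 0) {
        size_t n = src_len < dst_size - 1 ? src_len : dst_size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

Passing an unterminated array such as the question's y as src is still undefined behavior here; the function only guarantees it won't overrun dst.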
+4  A: 

This

char y[4] = {'t','e','s','t'};

is not a proper zero-terminated string. It's an array of four characters, without the terminating '\0'. strlen() simply counts the characters until it hits a zero. With y it runs past the end of the array until it happens to find a zero byte.
By doing this you invoke undefined behavior. The code might just as well format your hard drive.

You can avoid this by using the special syntax for character array initialization:

char y[] = "test";

This initializes y with five characters, since it automatically appends a '\0'.
Note that I also left the array's size unspecified. The compiler figures this out itself, and it automatically re-figures if I change the string's length.
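A sketch of the initializer forms discussed here, with a small function that checks the sizes claimed in the comments (the array names and check_initializers are made up for illustration). The last form is legal C, though not C++: when the array has no room for the terminator, the literal's '\0' is simply dropped, so d is not a string:

```c
#include <assert.h>
#include <string.h>

static const char a[] = "test";                 /* size deduced: 5 bytes incl. '\0' */
static const char b[5] = "test";                /* same bytes, size written out */
static const char c[] = {'t','e','s','t','\0'}; /* same bytes, spelled out */
static const char d[4] = "test";                /* C only: no room, so no '\0' */

/* Check the sizes and contents claimed above. Never pass d to strlen(). */
void check_initializers(void)
{
    assert(sizeof a == 5 && strlen(a) == 4);
    assert(sizeof b == 5 && memcmp(a, b, sizeof a) == 0);
    assert(sizeof c == 5 && memcmp(a, c, sizeof a) == 0);
    assert(sizeof d == 4 && memcmp(a, d, sizeof d) == 0);
}
```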

BTW, here's a simple strlen() implementation:

#include <stddef.h>  /* size_t */

size_t strlen(const char* p)
{
    size_t result = 0;
    while(*p++) ++result;  /* stop at the first '\0', which is not counted */
    return result;
}

Modern implementations will likely not fetch individual bytes; they may read a whole word at a time or even use CPU intrinsics, but this is the basic algorithm.
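The word-at-a-time idea rests on one trick: testing whether any byte of a word is zero without looping over the bytes. A minimal sketch of that test for 32-bit words, using a bit trick commonly listed in collections of bit-twiddling hacks. The function names are hypothetical, and a real library strlen() also has to deal with alignment and with the bytes around the string, which this leaves out:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Nonzero exactly when at least one byte of v is 0x00.
   v - 0x01010101: a zero byte wraps to 0xFF, so its high bit becomes set.
   & ~v:           discards bytes whose high bit was already set in v.
   & 0x80808080:   keeps only the high bit of each byte.
   A borrow can only start at a zero byte, so any extra bits it produces
   appear only when a zero byte is present anyway. */
uint32_t has_zero_byte(uint32_t v)
{
    return (v - UINT32_C(0x01010101)) & ~v & UINT32_C(0x80808080);
}

/* Check four bytes of a char buffer at once. memcpy avoids the alignment
   and aliasing problems of casting buf to uint32_t *. Byte order does not
   matter for an "is any byte zero?" question. */
int four_bytes_have_nul(const char *buf)
{
    uint32_t w;
    assert(buf != NULL);
    memcpy(&w, buf, sizeof w);
    return has_zero_byte(w) != 0;
}
```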

sbi
Wow, I didn't see your `strlen()` before I added mine. They're identical! Great minds... 8v)
Fred Larson
The GNU libc source for the string functions is quite enlightening to poke around in, and surprisingly complex.
James Morris
+3  A: 

The following is not a null terminated array of characters:

 char y[4] = {'t','e','s','t'};

Part of strlen()'s contract is that it be provided with a pointer to a null terminated string. Since that doesn't happen with strlen(y), you get undefined behavior. In your particular case, you get 6 returned, but anything could happen, including a program crash.

From C99's 7.1.1 "Definition of terms":

A string is a contiguous sequence of characters terminated by and including the first null character.
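Going by that definition, you can only ask whether a fixed-size buffer holds a string if you also know its size. A minimal sketch using the standard memchr(), which never examines more than the n bytes it is given (the name holds_string is hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: nonzero if the n-byte buffer buf contains a null
   character, i.e. if it holds a string in the C99 sense. memchr() looks at
   no more than n bytes, so this is safe for unterminated arrays. */
int holds_string(const char *buf, size_t n)
{
    assert(buf != NULL);
    return memchr(buf, '\0', n) != NULL;
}
```

With the question's arrays, holds_string(y, sizeof y) is false for char y[4] = {'t','e','s','t'} and true for the five-element version ending in '\0'.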

Michael Burr
A: 
char y[5] = {'t','e','s','t','\0'};

would be the same as

char *x = "test"; 
stacker
For the purposes of `strlen()`, yes. However, `y` is an array of five `char`, and its contents can be modified at will. `x` is a pointer to `char`, and in this case is pointing to a string that cannot reliably be modified. On the other hand, it's possible to reassign a value to `x` but not to `y`.
David Thornley
@David Thornley: The answer was given in a beginner context, just to illustrate what went wrong for the OP. Better answers have been given since. No need to downvote.
stacker
@stacker: IME, subtle differences exactly like this one (`x[0]='x';` is fine when you said `char x[]="X";`, but fatal when you say `char* x="X";`) can never be stressed enough, especially for beginners.
sbi
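A short sketch of the array/pointer differences raised in this thread. Declaring x as pointing to const char is a choice made here, not something the question did; it turns the fatal write sbi mentions into a compile error. The names x and y follow the question, and array_vs_pointer is hypothetical:

```c
#include <assert.h>
#include <string.h>

static char y[] = "test";       /* array: its own 5 bytes, copied from the literal */
static const char *x = "test";  /* pointer: refers to the literal itself */

void array_vs_pointer(void)
{
    y[0] = 'b';   /* fine: y owns modifiable storage */
    /* x[0] = 'b';   rejected because of const; without const it would compile,
                     but writing to a string literal is undefined behavior */
    x = "rest";   /* fine: the pointer itself can be made to point elsewhere */
    /* y = "rest";   rejected: arrays cannot be assigned to */

    assert(sizeof y == 5);                     /* array size includes the '\0' */
    assert(sizeof x == sizeof(const char *));  /* only the size of a pointer */
}
```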
+3  A: 

strlen works with strings. A string is defined as a sequence (array) of characters terminated by a \0 character.

Your x points to a string. So, strlen works fine with x as an argument.

Your y is not a string. For this reason, passing y to strlen results in undefined behavior. The result is meaningless and unpredictable.

AndreyT
+1  A: 

A real C string takes one more byte than the number of its characters, since it needs a terminating null character.

Therefore, char y[4] = {'t','e','s','t'}; doesn't form a string, since it's four characters. char y[] = "test"; or char y[5] = "test"; would form a string, since they'd have a character array of five characters ending with the null-byte terminator.

David Thornley
A: 

As others have said, you just need to make sure to end a string with the 0 or '\0' character. As a side note, you may want to check out http://bstring.sourceforge.net/ . It has an O(1) string-length function, unlike the C/C++ strlen, which is error-prone and slow at O(N), where N is the number of non-null characters. I don't remember the last time I used strlen and its friends. Go for safe & fast functions/classes!
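A minimal sketch of the idea behind length-prefixed strings such as bstring: store the length next to the characters so asking for it is O(1). This is not bstring's API; struct lstring and its functions are hypothetical, and the data is still '\0'-terminated so it can be passed to ordinary C functions:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-prefixed string (NOT bstring's actual API). */
struct lstring {
    size_t len;   /* number of characters, not counting the trailing '\0' */
    char *data;   /* len characters plus a '\0' for interop with C APIs */
};

/* Build from a nul-terminated C string; the only strlen() call happens here.
   Returns 0 on success, -1 if allocation fails. */
int lstring_init(struct lstring *s, const char *cstr)
{
    assert(s != NULL && cstr != NULL);
    s->len = strlen(cstr);
    s->data = malloc(s->len + 1);
    if (s->data == NULL)
        return -1;
    memcpy(s->data, cstr, s->len + 1);  /* copy the '\0' too */
    return 0;
}

/* O(1): the stored length is returned, not recounted. */
size_t lstring_len(const struct lstring *s)
{
    return s->len;
}

void lstring_free(struct lstring *s)
{
    free(s->data);
    s->data = NULL;
    s->len = 0;
}
```

The trade-off is that every function that changes the characters must also keep len up to date.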

Viet