Hi

In the following code, I copy a string into a char *str, which is 10 characters long, using strncpy().

Now according to the strncpy() manual: "Warning: If there is no null byte among the first n bytes of src, the string placed in dest will not be null terminated." That is exactly what happens here.

The source string is 26 characters long and I have copied 10 characters, hence no null character is placed at the end of str.

But when I print the contents of str, starting from 0 until I get '\0', it behaves normally.

Why? When there is no '\0' placed at the end then why does the loop stop at the correct place?

What I understand is that it should give "Segmentation fault" or at least it shouldn't stop there and keep printing some garbage values.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define SIZE 10

int main()
{
    char *str ;
    str = malloc( sizeof( char ) * SIZE );
    if( str == NULL ) 
     exit( 1 );
    memset( str, 0, sizeof( char ) * SIZE );

    strncpy( str, "abcdefghijklmnopqrstuvwxyz", sizeof( char ) * SIZE );

    unsigned int index;
    for( index = 0; str[ index ] != '\0' ; index++ ) {
     printf( "str[ %u ] has got : %c \n ", index, str[ index ] );
    }

    return 0;
}

Here is the output :

 str[ 0 ] has got : a
 str[ 1 ] has got : b
 str[ 2 ] has got : c
 str[ 3 ] has got : d
 str[ 4 ] has got : e
 str[ 5 ] has got : f
 str[ 6 ] has got : g
 str[ 7 ] has got : h
 str[ 8 ] has got : i
 str[ 9 ] has got : j

Any help will be appreciated.

EDIT

Is there a proper way to check whether a string ends at '\0' or not? I always thought the above loop to be the ultimate test, but now it seems it isn't.

Let's say we get a string from some function developed by another programmer. How will we know that it ends at the correct place with '\0'? Maybe it doesn't, and then a scan will go beyond the actual size until it finds some '\0'. We can never know the actual size of the string.

So how do we tackle such situation?

Any suggestion?

+12  A: 

It just happens that there's a null byte right beyond the end of allocated block.

Most likely malloc() allocates more memory and puts so-called guard values that happen to contain null bytes or it puts some metadata to be used by free() later and this metadata happens to contain a null byte right at that position.

Anyway you should not rely on this behaviour. You have to request (malloc()) one more byte for the null character so that the null character location is also legally allocated to you.
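
A minimal sketch of that advice, assuming a hypothetical helper named copy_truncated (not part of any library): it requests one byte more than the number of characters to keep and writes the terminator itself, so the result is a valid C string even when strncpy() truncates.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (name chosen for illustration only).
   Returns a freshly allocated, always-terminated copy holding at most
   max_chars characters of src, or NULL if malloc() fails.
   The caller must free() the result. */
char *copy_truncated(const char *src, size_t max_chars)
{
    assert(src != NULL);

    char *dst = malloc(max_chars + 1);   /* +1 byte reserved for '\0' */
    if (dst == NULL)
        return NULL;

    strncpy(dst, src, max_chars);        /* may leave dst unterminated ... */
    dst[max_chars] = '\0';               /* ... so terminate explicitly */
    return dst;
}
```

With the string from the question, copy_truncated("abcdefghijklmnopqrstuvwxyz", 10) keeps "abcdefghij" and puts the terminator in byte 10, which is legally part of the allocation.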

There's no portable way to test if a string is null-terminated properly. It can happen that once you're past the end of allocated block your program will just crash. Or it can happen that there is a null character somewhere beyond the end of block and you overwrite memory beyond the end of block later when manipulating the misinterpreted string.

Ideally you need some function that would check if a given address is allocated to you and belongs to the same allocation as another given address (perhaps start of the block). This would be slow and not worth it and there's no standard way for doing this.

In other words, if you encounter a string which is meant to be null-terminated but really isn't you're screwed big time - your program will run into undefined behaviour.

sharptooth
No, there is not.
Remy Lebeau - TeamB
Yes, there happens to be a null byte at the end of the string. If you try different sizes you'll get *bad* output.
Nick D
So there is no standard way to check whether a string is null-terminated or not. That is bad news. I think all programmers working on the application must then agree on some standard, like the first three characters of a buffer holding its size and the actual string starting from the fourth.
Andrew-Dufresne
@Andrew: So what happens if those three bytes at the start of the string are wrong? Then too, your program will crash and burn. The point here is that if the data structures are inconsistent, you will have problems, and it is very hard (read "very hard" as "logically impossible") to do something about that.
Thomas Padron-McCarthy
@Andrew: There is an agreement. It states that the string contains an extra byte that holds a null terminator.
sharptooth
+4  A: 

Why does it work?

The memory you allocate happens to have a '\0' byte at the right place. (For example, if you're using Visual C++ in Debug mode, the heap manager zeros allocated memory before it hands it out to your program. But it could just as well be pure luck.)

Is there a proper way to check whether a string ends at '\0' or not?

No. You need your strings to be either zero-terminated (which is what C std lib string handling functions expect) or you need to carry around their length in an extra variable. If you have neither of the two, you have a bug.
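
A rough sketch of the second option (carrying the length alongside the data), assuming a hypothetical struct counted_str and helper names that are made up for illustration. Nothing here ever searches for a '\0':

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type: a pointer plus an explicit length.
   The bytes do not need to be zero-terminated. */
struct counted_str {
    const char *data;
    size_t len;
};

/* Wraps an existing buffer; the caller vouches that len bytes are readable. */
struct counted_str counted_from(const char *buf, size_t len)
{
    assert(buf != NULL);
    struct counted_str s = { buf, len };
    return s;
}

/* Compares by length and content, never by looking for a terminator. */
int counted_equals(struct counted_str a, struct counted_str b)
{
    return a.len == b.len && memcmp(a.data, b.data, a.len) == 0;
}

/* Writes exactly s.len bytes to out and returns the number written. */
size_t counted_print(struct counted_str s, FILE *out)
{
    return fwrite(s.data, 1, s.len, out);
}
```

The catch is the same as with the terminator: if the stored length is wrong, the data structure is inconsistent and you still have a bug.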

Now how will we know that some string from some function developed by some other programmer ends at the correct place with '\0'? Maybe it doesn't, and then a scan will go beyond the actual size until it finds some '\0'. We can never know the actual size of the string.

So how do we tackle such situation?

You can't. If the other function screws it that bad, you're screwed that bad.

sbi
About the heap manager zeroing memory: the Microsoft compiler doesn't zero memory (in debug or release builds). When using the debug heap the MSVC runtime will fill the allocated memory with 0xCD bytes, not zero. Filling with 'garbage' rather than clearing the memory is usually more effective at finding problems. Also, some portion of memory before and after the allocation will be filled with 0xFD values. See http://stackoverflow.com/questions/370195/when-and-why-will-an-os-initialise-memory-to-0xcd-0xdd-etc-on-malloc-free-new/370362#370362
Michael Burr
@Michael: For all I know you might be right. But still, ISTR having read again and again that variables not being zeroed is a typical cause for release versions crashing while debug versions work in VC. `<scratches head>`
sbi
A: 

Sharptooth has explained the probable cause of the behaviour, so I'm not gonna repeat that.

When allocating buffers, I always over-allocate by a byte, like this:

#define SIZE 10
char* buf = malloc(sizeof(char)*(SIZE+1));
/* error-check the malloc call here */
buf[SIZE] = '\0';
gnud
Eh, "sizeof(char)-(SIZE+1)"? Minus?
Thomas Padron-McCarthy
We can also do this: memset( dest, 0, SIZE ); strncpy( dest, source, SIZE - 1 ); This way the last byte will be zero.
Andrew-Dufresne
That should be * - times. New keyboard :)
gnud
Yes, I often use memset. But I find the direct assignment clearer. And if you use strncpy(), it pads any unused space with 0s anyway.
gnud
While this isn't a bad idea per se, a much simpler solution is to NEVER use the unbounded versions of string functions. Instead of manually assuring null termination, only use functions which take an upper bound and guarantee the return of a valid C string. (It's kind of a shame strncpy doesn't offer this guarantee.) There are usually more secure versions of strncpy, such as strlcpy on the BSDs and strncpy_s on Windows.
Falaina
Combined with over-allocating one char and setting it to '\0', strncpy does guarantee this. Which is why I do it.
gnud
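
For the bounded-copy idea raised in the comments, here is a portable sketch that uses standard snprintf() instead of strlcpy or strncpy_s, which are not available everywhere. The helper name copy_bounded is an assumption made for illustration; snprintf() itself always terminates the destination when the capacity is greater than zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not a library function).
   Copies src into dst, which has room for cap bytes (cap > 0), and always
   leaves dst zero-terminated. Returns 1 if the whole string fit, 0 if it
   had to be truncated. */
int copy_bounded(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL && cap > 0);

    /* snprintf() writes at most cap - 1 characters plus the terminator and
       returns the length the full string would have needed. */
    int needed = snprintf(dst, cap, "%s", src);
    return needed >= 0 && (size_t)needed < cap;
}
```

With a 10-byte destination and the alphabet string from the question, this keeps "abcdefghi" (9 characters plus the terminator) and reports truncation through the return value.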
+4  A: 

As for your edit, I think being pedantic will help elucidate some issues.

In C there is no such thing as a string. There is the concept of a "C string", which is what the C standard library works with and which is defined as nothing more than a NUL-terminated sequence of characters, so there really isn't such a thing as a "non-null-terminated string" in C. So your question is better phrased as "How can I determine if an arbitrary character buffer is a valid C string?" or "How can I determine if the string I found is the intended string?"

The answer to the first question, unfortunately, is just to linearly scan the buffer until you encounter a NUL byte, as you are doing. This will give you the length of the C string.

The second question has no easy answer. Because C doesn't have an actual string type with length metadata (or the ability to carry the size of arrays across function calls), there's no real way to determine whether the string length we determined above is the length of the intended string. It might be obvious if we start seeing segfaults in the program or "garbage" in the output, but in general we're stuck doing string operations by scanning until the first NUL byte (usually with an upper bound on string length so as to avoid messy buffer overrun errors).
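
The bounded scan mentioned above could look roughly like this, assuming the caller knows the capacity of the buffer. Both helper names are hypothetical. memchr() never reads more than cap bytes, so the scan cannot run off the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a '\0' occurs within the first cap
   bytes of buf, 0 otherwise. Never reads past buf[cap - 1]. */
int is_terminated_within(const char *buf, size_t cap)
{
    assert(buf != NULL);
    return memchr(buf, '\0', cap) != NULL;
}

/* Hypothetical helper: length of the C string in buf, or cap if no
   terminator was found within the first cap bytes. */
size_t bounded_length(const char *buf, size_t cap)
{
    assert(buf != NULL);
    const char *end = memchr(buf, '\0', cap);
    return end != NULL ? (size_t)(end - buf) : cap;
}
```

This only answers "is there a terminator inside the memory I know I own?". It cannot tell you whether the terminator sits where the author of the string intended, and it is no help when the capacity is unknown.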

Falaina
A: 

You're lucky to have zero beyond allocated region of space.

Try this code on other platforms and you'll see it might not behave the same way.

HMage
A: 

I think sharptooth's answer is right. There is more space allocated. I modified the program as follows:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define SIZE 10

int main()
{
    char *str ;
    int *p;
    int actual_length;
    str = malloc( sizeof( char ) * SIZE );
    if( str == NULL ) 
        exit( 1 );

    actual_length = (int)*(str - 4) - 1 - 4;
    printf("actual length of str is %d\n", actual_length);
    p = (int*) malloc(sizeof(int));
    if (p == NULL) exit(1);
    *p = -1;
    char* pc = (char*)(p - 1);
    pc [0] = 'z';
    pc [1] = 'z';
    pc [2] = 'z';
    pc [3] = 'z';

    memset( str, 0, sizeof( char ) * SIZE );

    memcpy( str, "abcdefghijklmnopqrstuvwxyz", sizeof( char ) * SIZE );

    int i;
    for (i = SIZE; i < actual_length; i++)
     str[i] = 'y';

    unsigned int index;
    for( index = 0; str[ index ] != '\0' ; index++ ) {
        printf( "str[ %u ] has got : %c \n ", index, str[ index ] );
    }

    return 0;
}

The output is

actual length of str is 12
str[ 0 ] has got : a 
 str[ 1 ] has got : b 
 str[ 2 ] has got : c 
 str[ 3 ] has got : d 
 str[ 4 ] has got : e 
 str[ 5 ] has got : f 
 str[ 6 ] has got : g 
 str[ 7 ] has got : h 
 str[ 8 ] has got : i 
 str[ 9 ] has got : j 
 str[ 10 ] has got : y 
 str[ 11 ] has got : y 
 str[ 12 ] has got : z 
 str[ 13 ] has got : z 
 str[ 14 ] has got : z 
 str[ 15 ] has got : z 
 str[ 16 ] has got : \377 
 str[ 17 ] has got : \377 
 str[ 18 ] has got : \377 
 str[ 19 ] has got : \377

My OS is Debian Squeeze/sid.

Cook Schelling