ansaurus

Question

string doesn't end at NULL but still behaves normally, why?

Answer 1

+12 A:

It just happens that there's a null byte right beyond the end of the allocated block.

Most likely malloc() allocates more memory than requested and puts so-called guard values there that happen to contain null bytes, or it stores some metadata to be used later by free(), and this metadata happens to contain a null byte right at that position.

Anyway, you should not rely on this behaviour. You have to request one more byte from malloc() for the null character, so that the location of the null character is also legally allocated to you.
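A minimal sketch of that fix, assuming the buffer receives the first n characters of a longer string; copy_prefix is a hypothetical helper name. The extra byte makes the terminator part of your own allocation, so a loop that scans for '\0' stops at index n instead of wandering into whatever the allocator keeps after the block.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of the first n characters of
   src with a '\0' stored in the extra byte, or NULL if malloc fails.
   src must contain at least n characters. The caller must free() it. */
char *copy_prefix(const char *src, size_t n)
{
    char *str = malloc(n + 1);   /* n characters + 1 byte for the '\0' */
    if (str == NULL)
        return NULL;

    memcpy(str, src, n);         /* copies exactly n bytes, no terminator */
    str[n] = '\0';               /* the terminator lies inside our block */
    return str;
}
```

For example, copy_prefix("abcdefghijklmnopqrstuvwxyz", 10) yields a string for which strlen() returns 10 on every platform.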

There's no portable way to test whether a string is properly null-terminated. Once you're past the end of the allocated block, your program may simply crash. Or there may be a null character somewhere beyond the end of the block, and you later overwrite memory beyond the end of the block when manipulating the misinterpreted string.

Ideally you'd need some function that checks whether a given address is allocated to you and belongs to the same allocation as another given address (perhaps the start of the block). Such a check would be slow and not worth it, and there's no standard way of doing it anyway.

In other words, if you encounter a string which is meant to be null-terminated but really isn't, you're screwed big time: your program will run into undefined behaviour.

sharptooth 2009-09-23 07:33:34

No, there is not.

Remy Lebeau - TeamB 2009-09-23 07:46:42

Yes, there happens to be a null byte at the end of the string. If you try different sizes you'll get *bad* output.

Nick D 2009-09-23 07:51:26

So there is no standard way to check whether a string is null-terminated or not. That is bad news. I think all programmers working on the application must agree on some convention for that. For example, the first three characters of the buffer could hold its size, and the actual string would start at the fourth.

Andrew-Dufresne 2009-09-23 08:05:22

@Andrew: So what happens if those three bytes at the start of the string are wrong? Then too, your program will crash and burn. The point here is that if the data structures are inconsistent, you will have problems, and it is very hard (read "very hard" as "logically impossible") to do something about that.

Thomas Padron-McCarthy 2009-09-23 08:11:10

@Andrew: There is an agreement. It states that the string contains an extra byte that holds a null terminator.

sharptooth 2009-09-23 08:15:42

Answer 2

+4 A:

Why does it work?

The memory you allocate happens to have a '\0' byte at the right place. (For example, if you're using Visual C++ in Debug mode, the heap manager zeros allocated memory before it hands it out to your program. But it could just as well be pure luck.)

Is there a proper way to check whether a string ends at '\0' or not?

No. You need your strings to be either zero-terminated (which is what C std lib string handling functions expect) or you need to carry around their length in an extra variable. If you have neither of the two, you have a bug.
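A minimal sketch of the second option, carrying the length alongside the pointer. struct counted_str and its helpers are hypothetical names, not a standard library type. The "%.*s" conversion is standard printf: with a precision, printf writes at most that many bytes and needs no terminator as long as the precision does not exceed the array size.

```c
#include <stdio.h>

/* Hypothetical "counted string": the length travels with the pointer,
   so no terminator is needed to know where the data ends. */
struct counted_str {
    const char *data;
    size_t len;
};

/* Wraps the first n bytes of buf; buf need not contain a '\0'. */
struct counted_str counted_from(const char *buf, size_t n)
{
    struct counted_str s = { buf, n };
    return s;
}

/* Prints exactly s.len characters; the precision in "%.*s" keeps printf
   from looking for a terminator. Assumes s.len fits in an int. */
void print_counted(struct counted_str s)
{
    printf("%.*s\n", (int)s.len, s.data);
}
```

For example, print_counted(counted_from(buf, 10)) prints ten characters of buf whether or not a '\0' follows them. The catch is the same as with terminators: every function that hands you a buffer has to hand you a correct length too.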

Now, how will we know that a string from some function developed by some other programmer ends at the correct place with '\0'? Maybe it doesn't; then it will go beyond the actual size until we hit some '\0'. We can never know the actual size of the string.

So how do we tackle such situation?

You can't. If the other function screws it that bad, you're screwed that bad.

sbi 2009-09-23 08:02:12

About the heap manager zeroing memory: the Microsoft compiler doesn't zero memory (in debug or release builds). When using the debug heap, the MSVC runtime fills allocated memory with 0xCD bytes, not zero. Filling with 'garbage' rather than clearing the memory is usually more effective at finding problems. Also, some portion of memory before and after the allocation will be filled with 0xFD values. See http://stackoverflow.com/questions/370195/when-and-why-will-an-os-initialise-memory-to-0xcd-0xdd-etc-on-malloc-free-new/370362#370362

Michael Burr 2009-09-23 16:18:22

@Michael: For all I know you might be right. But still, ISTR having read again and again that variables not being zeroed is a typical cause of release versions crashing while debug versions work in VC. `<scratches head>`

sbi 2009-09-24 18:37:15
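To illustrate why filling with garbage beats zeroing, here is a minimal sketch of a debug allocation wrapper. It assumes nothing about how the MSVC runtime is actually implemented; debug_malloc is a hypothetical name, and only the 0xCD value is taken from Michael Burr's comment. Because fresh memory never contains a lucky zero byte, a missing terminator shows up as visible garbage.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical debug wrapper: fills fresh memory with 0xCD so that code
   relying on an accidental '\0' after the block misbehaves visibly
   instead of "working" by luck. */
void *debug_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        memset(p, 0xCD, n);   /* non-zero fill pattern, never '\0' */
    return p;
}
```

This only covers the requested bytes; guard bytes around the block (the 0xFD regions mentioned above) would need a wrapper that over-allocates and offsets the returned pointer.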

Answer 3

A:

Sharptooth has explained the probable cause of the behaviour, so I'm not gonna repeat that.

When allocating buffers, I always over-allocate by a byte, like this:

#define SIZE 10
char* buf = malloc(sizeof(char)*(SIZE+1));
/* error-check the malloc call here */
buf[SIZE] = '\0';

gnud 2009-09-23 08:02:34

Eh, "sizeof(char)-(SIZE+1)"? Minus?

Thomas Padron-McCarthy 2009-09-23 08:07:12

We can also do this: memset( dest, 0, SIZE ); strncpy( dest, source, SIZE - 1 ); This way the last byte will be zero.

Andrew-Dufresne 2009-09-23 08:11:21

That should be * - times. New keyboard :)

gnud 2009-09-23 08:11:23

Yes, I often use memset. But I find the direct assignment clearer. And if you use strncpy(), it pads any unused space with 0s anyway.

gnud 2009-09-23 08:12:49

While this isn't a bad idea per se, a much simpler solution is to NEVER use the unbounded versions of string functions. Instead of manually ensuring null termination, only use functions which take an upper bound and guarantee to return a valid C string (it's kind of a shame strncpy doesn't offer this guarantee). There are usually more secure alternatives to strncpy, such as strlcpy on the BSDs and strncpy_s on Windows.

Falaina 2009-09-23 08:28:14
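Neither strlcpy nor strncpy_s was available everywhere at the time, so here is a minimal sketch of a bounded copy built on standard C99 snprintf instead. That is a swapped-in technique, not one of the functions named above, and copy_bounded is a hypothetical name. snprintf always writes a terminator when the size is non-zero and returns the length the full output would have had, which makes truncation easy to detect.

```c
#include <stdio.h>

/* Hypothetical bounded copy: always leaves dest as a valid C string when
   dest_size > 0, truncating src if necessary.
   Returns nonzero if src did not fit (or on an encoding error). */
int copy_bounded(char *dest, size_t dest_size, const char *src)
{
    /* snprintf returns the number of characters the full result needs,
       not counting the '\0', or a negative value on error. */
    int needed = snprintf(dest, dest_size, "%s", src);
    return needed < 0 || (size_t)needed >= dest_size;
}
```

For example, copying "abcdefgh" into a char[5] leaves "abcd" and reports truncation, while copying "ab" succeeds unchanged.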

Combined with over-allocating one char and setting it to '\0', strncpy does guarantee this. Which is why I do it.

gnud 2009-09-23 09:46:12
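A minimal sketch of the combination described in these comments, assuming SIZE is the maximum number of characters to keep; copy_at_most_size is a hypothetical name. strncpy on its own leaves the result unterminated when the source has SIZE or more characters, and the over-allocated byte closes that gap.

```c
#include <stdlib.h>
#include <string.h>

#define SIZE 10

/* Hypothetical helper: copies at most SIZE characters of src into a fresh
   SIZE + 1 byte buffer that is always null-terminated.
   Returns NULL if malloc fails; the caller must free() the result. */
char *copy_at_most_size(const char *src)
{
    char *buf = malloc(SIZE + 1);
    if (buf == NULL)
        return NULL;

    strncpy(buf, src, SIZE);  /* pads with '\0' if src is shorter than SIZE */
    buf[SIZE] = '\0';         /* terminates when src has SIZE or more chars */
    return buf;
}
```

With a short source like "abc", strncpy pads the rest of the first SIZE bytes with zeros; with a long one, only the explicit assignment makes the result a valid C string.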

Answer 4

+4 A:

As for your edit, I think being pedantic will help elucidate some issues.

In C there is no such thing as a string. There is a concept of a "C string", which is what the C standard library works with; it is defined as nothing more than a NUL-terminated sequence of characters, so there really isn't such a thing as a "non-null-terminated string" in C. Your question is better phrased as "How can I determine if an arbitrary character buffer is a valid C string?" or "How can I determine if the string I found is the intended string?"

The answer to the first question, unfortunately, is simply to scan the buffer linearly until you encounter a NUL byte, as you are doing. This will give you the length of the C string.

The second question has no easy answer. Because C doesn't have an actual string type with length metadata (or the ability to carry the size of an array across function calls), there's no real way to determine whether the length we found above is the length of the intended string. It might become obvious if we start seeing segfaults in the program or "garbage" in the output, but in general we're stuck doing string operations by scanning until the first NUL byte (usually with an upper bound on the string length, to avoid messy buffer overruns).
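A minimal sketch of the bounded scan, assuming the caller knows the buffer's true size and passes it along (C won't do that for you); bounded_c_string_length is a hypothetical name. Standard memchr never examines more than the given number of bytes, so a missing terminator is reported instead of causing an overrun. It answers only the first question: a '\0' found early still tells you nothing about whether that was the intended end.

```c
#include <string.h>

/* Hypothetical helper: if buf (cap bytes long) contains a '\0' within its
   first cap bytes, stores the C-string length in *len and returns 1;
   otherwise returns 0. Only meaningful if cap is the real buffer size. */
int bounded_c_string_length(const char *buf, size_t cap, size_t *len)
{
    const char *nul = memchr(buf, '\0', cap);  /* reads at most cap bytes */
    if (nul == NULL)
        return 0;                              /* not a valid C string */
    *len = (size_t)(nul - buf);
    return 1;
}
```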

Falaina 2009-09-23 08:09:49

Answer 5

A:

You're lucky to have a zero byte just beyond the allocated region.

Try this code on other platforms and you'll see it might not behave the same way.

HMage 2009-09-23 08:30:40

Answer 6

A:

I think sharptooth's answer is right: more space is allocated than requested. I modified the program as follows:

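#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define SIZE 10

/* Deliberately pokes at heap internals: every read or write outside the
   requested blocks below is undefined behaviour. The offsets presumably
   match glibc's malloc on a 32-bit little-endian system. */
int main()
{
    char *str ;
    int *p;
    int actual_length;
    str = malloc( sizeof( char ) * SIZE );
    if( str == NULL ) 
        exit( 1 );

    /* Read the low byte of the allocator's size field stored just before
       the block, then drop the "previous block in use" flag bit (- 1) and
       the 4 bytes of allocator overhead (- 4) to get the usable size. */
    actual_length = (int)*(str - 4) - 1 - 4;
    printf("actual length of str is %d\n", actual_length);

    /* A second block, which the allocator places right after str's. */
    p = (int*) malloc(sizeof(int));
    if (p == NULL) exit(1);
    *p = -1;                      /* its bytes show up as \377 below */

    /* Overwrite the 4 header bytes just before p with 'z'. This corrupts
       the heap, but makes the header visible when scanning past str. */
    char* pc = (char*)(p - 1);
    pc [0] = 'z';
    pc [1] = 'z';
    pc [2] = 'z';
    pc [3] = 'z';

    memset( str, 0, sizeof( char ) * SIZE );

    /* Copies exactly SIZE bytes: no '\0' lands inside the block. */
    memcpy( str, "abcdefghijklmnopqrstuvwxyz", sizeof( char ) * SIZE );

    /* Fill the slack between the requested SIZE and the usable size. */
    int i;
    for (i = SIZE; i < actual_length; i++)
     str[i] = 'y';

    /* Runs past the end of the block and stops only at whatever zero
       byte happens to follow in memory. */
    unsigned int index;
    for( index = 0; str[ index ] != '\0' ; index++ ) {
        printf( "str[ %u ] has got : %c \n ", index, str[ index ] );
    }

    return 0;
}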

The output is

actual length of str is 12
str[ 0 ] has got : a 
 str[ 1 ] has got : b 
 str[ 2 ] has got : c 
 str[ 3 ] has got : d 
 str[ 4 ] has got : e 
 str[ 5 ] has got : f 
 str[ 6 ] has got : g 
 str[ 7 ] has got : h 
 str[ 8 ] has got : i 
 str[ 9 ] has got : j 
 str[ 10 ] has got : y 
 str[ 11 ] has got : y 
 str[ 12 ] has got : z 
 str[ 13 ] has got : z 
 str[ 14 ] has got : z 
 str[ 15 ] has got : z 
 str[ 16 ] has got : \377 
 str[ 17 ] has got : \377 
 str[ 18 ] has got : \377 
 str[ 19 ] has got : \377

My OS is Debian Squeeze/sid.

Cook Schelling 2009-09-23 16:09:42
