Consider the program below:

    char str[5];
    strcpy(str,"Hello12345678");
    printf("%s",str);

When run, this program gives a segmentation fault.

But when the strcpy call is replaced with the following, the program runs fine:

    strcpy(str,"Hello1234567");

I would expect it to crash when copying any string longer than 5 characters into str.

So why does it not crash for "Hello1234567", and only crash for "Hello12345678", i.e. for strings of length 13 or more?

This program was run on a 32-bit machine.

+7  A: 

You're copying to the stack, so how much extra data it takes to crash your program depends on what the compiler has placed on the stack.

Some compilers might produce code that will crash with only a single byte over the buffer size. The behaviour is undefined.

I guess size 13 is enough to overwrite the return address, or something similar, which crashes when your function returns. But another compiler or another platform could/will crash with a different length.

Also, your program might crash at a different length if it ran for longer, since something less important that was overwritten could end up being used later.

Douglas Leeder
+1  A: 

It depends on what's on the stack after the "str" array. You just happen not to be trampling on anything critical until you copy that many characters.

So it's going to depend on what else is in the function, the compiler you use and possibly the compiler options too.

13 is 5 + 8, suggesting there are two non-critical words after the str array, then something critical (maybe the return address).

Paul
Almost certainly it is critical, if not to the running of the program, then to its results. It's actually better when it dumps core since at least then you don't rely on possibly dodgy data from it.
paxdiablo
Yes, I was using "critical" in the narrow sense of causing an immediate crash. Overwriting off the end of the array is never going to be a good idea.
Paul
+1  A: 

That's the pure beauty of undefined behavior (UB): it's undefined.

Your code:

char str[5];
strcpy(str,"Hello12345678");

This writes 14 bytes (13 characters plus the terminating zero byte) to str, which can hold only 5 bytes. That invokes UB.
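To make the byte count concrete, here is a minimal sketch that counts what strcpy would write and checks it against a buffer size. The helper names bytes_needed and fits are made up for illustration; they are not standard functions.

```c
#include <stddef.h>
#include <string.h>

/* Bytes strcpy would write for src: every character plus the terminating '\0'.
   bytes_needed is a hypothetical helper name. */
size_t bytes_needed(const char *src)
{
    return strlen(src) + 1;
}

/* Returns 1 if src, including its terminator, fits in dst_size bytes, else 0.
   fits is a hypothetical helper name. */
int fits(size_t dst_size, const char *src)
{
    return bytes_needed(src) <= dst_size;
}
```

With these, bytes_needed("Hello12345678") is 14 and fits(sizeof str, "Hello12345678") is 0 for the char str[5] above, so the copy should never be attempted.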

dalle
+25  A: 

There are three types of standards behaviour you should be interested in.

1/ Defined behaviour. This will work on all complying implementations. Use this freely.

2/ Implementation-defined behaviour. As stated, it depends on the implementation but at least it's still defined. Implementations are required to document what they do in these cases. Use this if you don't care about portability.

3/ Undefined behaviour. Anything can happen. And we mean anything, up to and including your entire computer collapsing into a naked singularity and swallowing itself, you and a large proportion of your workmates. Never use this. Ever! Seriously! Don't make me come over there.

Copying more than 4 characters and a zero-byte to a char[5] is undefined behaviour.

Seriously, it doesn't matter why your program crashes when it writes 14 bytes but not 13: you're almost certainly overwriting some non-crashing information on the stack, and your program will most likely produce incorrect results anyway. In fact, the crash is better, since at least it stops you relying on the possibly bad effects.

Increase the size of the array to something more suitable (char[14] in this case with the available information) or use some other data structure that can cope.
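As one possible take on "some other data structure that can cope", here is a rough sketch that sizes a heap buffer from the string itself instead of guessing a fixed array size. The name dup_string is invented for this example, and the caller is assumed to free() the result.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of src sized to fit exactly, or NULL if malloc fails.
   The caller must free() the result. dup_string is a made-up name. */
char *dup_string(const char *src)
{
    size_t n = strlen(src) + 1;      /* characters plus the terminating '\0' */
    char *copy = malloc(n);

    if (copy != NULL)
        memcpy(copy, src, n);        /* copies exactly n bytes, never more */
    return copy;
}
```

If the string is a compile-time literal, writing char str[] = "Hello12345678"; also works, since the compiler then picks the size (14 here) for you.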


Update:

Since you seem so concerned with finding out why an extra 7 characters doesn't cause problems but 8 characters does, let's envisage the possible stack layout on entering main(). I say "possible" since the actual layout depends on the calling convention that your compiler uses. Since the C start-up code calls main() with argc and argv, the stack at the start of main(), after allocating space for a char[5], could look like this:

+------------------------------------+
| C start-up code return address (4) |
+------------------------------------+
| argc (4)                           |
+------------------------------------+
| argv (4)                           |
+------------------------------------+
| x = char[5] (5)                    |
+------------------------------------+

When you write the bytes Hello1234567\0 with:

strcpy (x, "Hello1234567");

to x, it overwrites the argc and argv but, on return from main(), that's okay. Specifically Hello populates x, 1234 populates argv and 567\0 populates argc. Provided you don't actually try to use argc and/or argv after that, you'll be okay.

However, if you write Hello12345678\0 (note the extra "8") to x, it overwrites the argc and argv and also one byte of the return address so that, when main() attempts to return to the C start-up code, it goes off into fairy land instead.

Again, this depends entirely on the calling convention of your compiler. It's possible a different compiler would always pad out arrays to a multiple of 4 bytes and the code wouldn't fail there until you wrote another three characters. Even the same compiler may allocate variables on the stack frame differently to ensure alignment is satisfied.

That's what they mean by undefined: you don't know what's going to happen.
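The arithmetic for that hypothetical layout can be checked without touching the real stack. The sketch below models the frame from the diagram as one ordinary byte array (x, then argv, then argc, then the return address, lowest address first) and counts how many return-address bytes a copy would change. The layout and every name in it are assumptions taken from the diagram, not what your compiler actually does.

```c
#include <stddef.h>
#include <string.h>

/* Model of the hypothetical frame above, lowest address first:
   x (5 bytes), argv (4), argc (4), return address (4). */
enum { X_SIZE = 5, WORD = 4, FRAME_SIZE = X_SIZE + 3 * WORD };
enum { RET_OFF = X_SIZE + 2 * WORD };   /* return address starts at byte 13 */

/* Copies src (with its '\0') to the start of a model frame filled with 0xAA
   and returns how many bytes of the modelled return address were changed. */
int clobbered_return_bytes(const char *src)
{
    unsigned char frame[FRAME_SIZE];
    size_t n = strlen(src) + 1;
    int changed = 0;

    if (n > (size_t)FRAME_SIZE)
        n = (size_t)FRAME_SIZE;         /* stay inside the model array */
    memset(frame, 0xAA, sizeof frame);
    memcpy(frame, src, n);

    for (size_t i = RET_OFF; i < (size_t)FRAME_SIZE; i++)
        if (frame[i] != 0xAA)
            changed++;
    return changed;
}
```

In this model, "Hello1234567" changes 0 return-address bytes and "Hello12345678" changes 1, which matches the 13 versus 14 byte boundary described above.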

paxdiablo
Speaking of undefined behavior and its negative consequences, I like this quote (though I don't know who to attribute it to): "If you dance barefoot on the broken glass of undefined behavior, you've got to expect the occasional cut."
SCFrench
I think all explanations of undefined behavior should be accompanied with an obligatory reference to nasal demons.
Chris Lutz
A very good answer, indeed!
Makis
+2  A: 

To add to the above answers: you can test for bugs like these with a tool such as Valgrind. If you're on Windows, have a look at this SO thread.

Stephan202
A: 

Q: So why it is not crashing for "Hello1234567" and only crashing for "Hello12345678" ie of string with length 13 or more than 13.

wentbackward
+4  A: 

On a 32-bit Intel platform, the explanation is as follows. When you declare char[5] on the stack, the compiler typically allocates 8 bytes because of alignment. It's also typical for functions to have the following prologue:

push ebp
mov ebp, esp

This saves the ebp register value on the stack, then copies the esp register value into ebp so that ebp can be used to access the parameters. As a result, 4 more bytes of the stack are occupied by the saved ebp value.

In the epilogue ebp is restored, but its value is usually only used for accessing stack-allocated function parameters, so overwriting it may not hurt in most cases.

So you have the following layout (stack grows downwards on Intel): 8 bytes for your array, then 4 bytes for ebp, then usually the return address.

This is why you need to overwrite at least 13 bytes to crash your program.

sharptooth
A: 

Because the behaviour is undefined. Use strncpy. See this page http://en.wikipedia.org/wiki/Strcpy for more information.

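strncpy is unsafe since it doesn't add a terminating NUL if the source string has a length >= n, where n is the size of the destination buffer.

char s[5];
strncpy(s, "test12345", 5);  // copies "test1", no terminator is written
printf("%s", s);             // undefined behaviour: reads past s, may crash

We always use strlcpy to alleviate this. Note that strlcpy is a BSD extension and is not part of standard C.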
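Where strlcpy isn't available, one common workaround is to let strncpy fill all but the last byte and then write the terminator yourself. This is only a sketch: copy_truncated is a made-up name, and silently truncating the string is assumed to be acceptable for your use case.

```c
#include <stddef.h>
#include <string.h>

/* Copies at most dst_size - 1 characters of src into dst and always
   terminates dst. Does nothing if dst_size is 0.
   copy_truncated is a hypothetical helper name. */
void copy_truncated(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return;
    strncpy(dst, src, dst_size - 1);   /* may leave dst unterminated... */
    dst[dst_size - 1] = '\0';          /* ...so terminate it explicitly */
}
```

With char s[5], calling copy_truncated(s, sizeof s, "test12345") leaves "test" in s, which is then safe to pass to printf("%s", s).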

Gayan
Any reason for the down vote?
Gayan