I've allocated a chunk of memory with char* memoryChunk = malloc(80 * sizeof(char) + 1); What is keeping me from writing to memory beyond those 81 bytes? What can I do to prevent that?

void testStage2(void) {
 char c_str1[20] = "hello";
 char* ut_str1;
 char* ut_str2;

 printf("Starting stage 2 tests\n");
 strcat(c_str1, " world");
 printf("%s\n", c_str1); // nothing exciting, prints "hello world"

 ut_str1 = utstrdup("hello ");
 ut_str1 = utstrrealloc(ut_str1, 20);
 utstrcat(ut_str1, c_str1);
 printf("%s\n", ut_str1); // slightly more exciting, prints "hello hello world"

 utstrcat(ut_str1, " world");
 printf("%s\n", ut_str1); // exciting, should print "hello hello world wo", because there's not enough room for the whole second " world"
}

char* utstrcat(char* s, char* suffix){
 int i = strlen(s),j;
 int capacity = *(s - sizeof(unsigned) - sizeof(int));
 for ( j =0; suffix[j] != '\0'; j++){
  if ((i+j-1) == 20)
   return s;
  s[i+j] = suffix[j];
 }
 //strcpy(s, suffix);
 s[i + j] = '\0';
 return s;
}// append the suffix to s
A: 

What happens: Nothing, or your program will get SIGSEGV thrown at it. What you should do: Write your program carefully. Use tools like valgrind.

p__
"nothing" and SIGSEGV are unfortunately not the only two possibilities. There is no telling what can happen.
Pascal Cuoq
A:

Nothing is stopping you from doing that. If you do so, anything could happen: the program could continue on its merry way as if nothing happened, it might crash now, it might crash later, it might even erase your hard drive. This is the realm of undefined behavior.

There are a number of tools that try to detect or mitigate these types of problems, but nothing is fool-proof. One such tool is valgrind. valgrind watches your program's pattern of memory accesses and notifies you of problems like this. It does this by running your program in a virtual machine of sorts, so it hurts the performance of your program significantly, but it can help you catch lots of errors when used correctly.

Adam Rosenfield
A:

What is keeping me from writing into the memory location beyond 81 units?

Nothing. However, doing this results in undefined behaviour. This means anything can happen, and you shouldn't depend on it doing the same thing twice. 99.999% of the time this is a bug.

What can I do to prevent that?

Always check that your pointers are within bounds before accessing (reading from or writing to) them. Always make sure strings end with \0 when passing them to string functions.
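A minimal sketch of "check before you access", assuming the caller always knows the buffer's size. The helper name checked_store is hypothetical, not from the question; for the 80 + 1 byte chunk in the question, size would be 81, so indices 0 through 80 are valid and 81 is rejected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: store c at buf[index] only if index < size.
 * Returns 1 on success, 0 if the write would have been out of bounds. */
int checked_store(char *buf, size_t size, size_t index, char c)
{
    assert(buf != NULL);
    if (index >= size)
        return 0;          /* refuse the out-of-bounds write */
    buf[index] = c;
    return 1;
}
```

The check costs one comparison per access; the hard part is carrying the size around alongside the pointer, which is what the struct-based approach below makes explicit.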

You can use debugging tools such as valgrind to assist you in locating bugs related to out-of-bounds pointer and array access.

stdlib's approach

For your code, you can have utstrncat which acts like utstrcat but takes a maximum size (i.e. the size of the buffer).
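A sketch of what such a utstrncat could look like. The signature is an assumption: here bufsize is the total number of bytes in the buffer s points to, including room for the '\0'. The function appends as much of the suffix as fits and always terminates the result.

```c
#include <assert.h>
#include <string.h>

/* Sketch: like utstrcat, but bounded by bufsize (total bytes, including '\0').
 * Assumes s is already '\0'-terminated somewhere within bufsize bytes. */
char *utstrncat(char *s, const char *suffix, size_t bufsize)
{
    size_t i, j;
    assert(s != NULL && suffix != NULL && bufsize > 0);
    i = strlen(s);
    /* i + j + 1 < bufsize leaves room for the terminator after this char */
    for (j = 0; suffix[j] != '\0' && i + j + 1 < bufsize; j++)
        s[i + j] = suffix[j];
    s[i + j] = '\0';
    return s;
}
```

With a 21-byte buffer holding "hello hello world", appending " world" stops after " wo", which matches the output the question's test expects.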

stdc++'s approach

You can also create an array struct/class or use std::string in C++. For example:

typedef struct UtString {
    size_t buffer_size;
    char *buffer;
} UtString;

Then have your functions operate on that instead. You can even have dynamic reallocation using this technique (but that doesn't seem to be what you want).
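A sketch of a function operating on the UtString struct above, assuming buffer_size counts the total bytes including the '\0'. The name utstring_append is hypothetical. It truncates rather than overflowing and reports how many characters it actually appended.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct UtString {
    size_t buffer_size;   /* total bytes in buffer, including room for '\0' */
    char *buffer;
} UtString;

/* Hypothetical helper: append suffix without letting the contents exceed
 * buffer_size - 1 characters. Returns the number of characters appended. */
size_t utstring_append(UtString *str, const char *suffix)
{
    size_t len, room, n;
    assert(str != NULL && str->buffer != NULL && str->buffer_size > 0);
    len = strlen(str->buffer);
    assert(len < str->buffer_size);      /* contents must already fit */
    room = str->buffer_size - 1 - len;   /* characters we may still add */
    n = strlen(suffix);
    if (n > room)
        n = room;                        /* truncate instead of overflowing */
    memcpy(str->buffer + len, suffix, n);
    str->buffer[len + n] = '\0';
    return n;
}
```

Because the size travels with the pointer, callers cannot accidentally pass the wrong limit, and a reallocating variant would only need to grow buffer and update buffer_size when n > room.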

End-of-buffer marker approach

Another approach is to place an end-of-buffer marker after the usable space, similar to the end-of-string marker. When you encounter the marker, don't write to that position or to the one just before it, which is reserved for the end-of-string marker (alternatively, you can reallocate the buffer so there's more room).

For example, suppose you have "hello world\0xxxxxx\1" as a string, where \0 is the end-of-string marker, \1 is the end-of-buffer marker, and the x's are arbitrary data. Appending " this is fun" would proceed as follows:

hello world\0xxxxxx\1
hello world \0xxxxx\1
hello world t\0xxxx\1
hello world th\0xxx\1
hello world thi\0xx\1
hello world this\0x\1
hello world this \0\1
*STOP WRITING* (next bytes are end of string then end of buffer)
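The walkthrough above can be sketched as follows, using '\1' as the end-of-buffer marker exactly as in the example. The function name is hypothetical. A character is written at position p only if the byte after p is not the marker, so there is always room left for the final '\0'. This assumes the marker really is present; a stray '\1' in the arbitrary bytes would merely make the append stop early.

```c
#include <assert.h>
#include <string.h>

#define END_OF_BUFFER '\1'   /* marker value used in the example above */

/* Hypothetical helper: append suffix, stopping when the next byte is the
 * end-of-buffer marker, then write the end-of-string marker. */
char *append_until_marker(char *buf, const char *suffix)
{
    size_t p, j;
    assert(buf != NULL && suffix != NULL);
    p = strlen(buf);                  /* index of the current '\0' */
    for (j = 0; suffix[j] != '\0'; j++) {
        if (buf[p + 1] == END_OF_BUFFER)
            break;                    /* keep position p free for '\0' */
        buf[p] = suffix[j];
        p++;
    }
    buf[p] = '\0';
    return buf;
}
```

Applied to the example buffer, this leaves "hello world this " followed by '\0' and then the untouched '\1', matching the last line of the walkthrough.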

Your problem

The problem with your code is here:

  if ((i+j-1) == 20)
   return s;

Although you are stopping before overrunning the buffer, you are not marking the end of the string.

Instead of returning, you can use break to end the for loop prematurely. This will cause the code after the for loop to run. This sets the end of string marker and returns the string, which is what you want.
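A sketch of the loop rewritten with break. Two assumptions: capacity is passed in as a parameter rather than read from the hidden header, and it counts characters excluding the terminator (so the buffer holds capacity + 1 bytes), which is what the expected "hello hello world wo" output implies. Under that assumption the original test (i+j-1) == 20 fires one character late, after 'r' has been written at index 20, so a plain break there would put the '\0' at index 21, one past the buffer. The sketch therefore compares i + j against capacity.

```c
#include <assert.h>
#include <string.h>

/* Sketch: append suffix to s, whose buffer holds capacity characters plus '\0'.
 * Assumes strlen(s) <= capacity on entry. */
char *utstrcat_capped(char *s, const char *suffix, size_t capacity)
{
    size_t i, j;
    assert(s != NULL && suffix != NULL);
    i = strlen(s);
    for (j = 0; suffix[j] != '\0'; j++) {
        if (i + j >= capacity)
            break;            /* out of room: stop copying, but still terminate */
        s[i + j] = suffix[j];
    }
    s[i + j] = '\0';          /* i + j <= capacity, so this stays in bounds */
    return s;
}
```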

In addition, I fear there may be a bug in your allocation. You have + 1 to allocate the size before the string, correct? There's a problem: unsigned is usually not 1 character; you will need + sizeof(unsigned) for that. I would also write utget_buffer_size and utset_buffer_size so you can make changes more easily.
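A sketch of utget_buffer_size and utset_buffer_size, assuming the layout implied by the question's pointer arithmetic: an int capacity, then an unsigned field, then the characters. Note that *(s - sizeof(unsigned) - sizeof(int)) dereferences a char*, so it reads only one byte of the stored int; copying the bytes into an int with memcpy reads the whole value and avoids alignment concerns. The allocation and free helpers are hypothetical, and the unsigned field is left untouched because its purpose isn't shown.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout: [int capacity][unsigned ...][chars ... '\0'] */
#define UT_HEADER_SIZE (sizeof(int) + sizeof(unsigned))

int utget_buffer_size(const char *s)
{
    int capacity;
    memcpy(&capacity, s - UT_HEADER_SIZE, sizeof capacity);
    return capacity;
}

void utset_buffer_size(char *s, int capacity)
{
    memcpy(s - UT_HEADER_SIZE, &capacity, sizeof capacity);
}

/* Hypothetical allocator: header + capacity characters + 1 byte for '\0'. */
char *utstralloc_sketch(int capacity)
{
    char *block, *s;
    assert(capacity >= 0);
    block = malloc(UT_HEADER_SIZE + (size_t)capacity + 1);
    if (block == NULL)
        return NULL;
    s = block + UT_HEADER_SIZE;
    utset_buffer_size(s, capacity);
    s[0] = '\0';
    return s;
}

void utstrfree_sketch(char *s)
{
    if (s != NULL)
        free(s - UT_HEADER_SIZE);   /* free the start of the block, not s */
}
```

Keeping every header access behind these two functions means the layout can change later without hunting down raw pointer arithmetic.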

strager
You are underestimating the probability of it being a bug; there should be at least a couple more 9's after the decimal point.
Jonathan Leffler
strager, thank you for trying. I've posted my own version of strcat. I use *(s - sizeof(unsigned) - sizeof(int)) because that's where the maximum capacity is stored. I think my approach is somewhat similar to what you have described above, but I kept getting some gibberish characters after the second "wor".
Also, where does 'utstrcat()' come from? It doesn't seem to be very standard...Having said that, the question is using it - but I'd still like to know where it comes from.
Jonathan Leffler
@Jon Just posted the utstrcat at the bottom
@Leffler, metashockwave is recreating some standard functions as an exercise. @metashockwave, I've updated my answer to include a solution to your particular problem. (You were not very clear about what was wrong in your question. Be sure to state exactly what the problem is next time!)
strager
A: 

Nothing keeps you from writing beyond that bound, and what happens depends on what lies beyond it. This is the standard attacker's trick (a buffer overflow) for exploiting programs that don't check and enforce their buffer limits.

As other posters mentioned, you just have to program carefully. Don't use calls like strlen and strcpy; use the length-limited versions such as strncpy instead.
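One caveat with strncpy: it does not write a '\0' when the source has as many characters as the limit or more, so the terminator has to be added by hand. A minimal sketch (the wrapper name is hypothetical):

```c
#include <assert.h>
#include <string.h>

/* Copy src into dst (dst_size bytes) with strncpy, then terminate explicitly,
 * because strncpy leaves dst unterminated when src is too long.
 * (When src is shorter, strncpy pads the rest of the n bytes with '\0'.) */
void copy_with_strncpy(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}
```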

Larry Watanabe
A: 

Carl suggests strncpy(), which is a start in the right direction. The main idea is to develop the habit of avoiding buffer overflows by adopting specific practices. A more deliberately designed API for this is described in "strlcpy and strlcat - Consistent, Safe, String Copy and Concatenation".
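strlcpy is not available in every C library, so here is a portable sketch with strlcpy-style semantics under a hypothetical name: it copies at most size - 1 characters, always terminates when size > 0, and returns strlen(src), so a return value of size or more signals truncation. It illustrates the idea; it is not the library function itself.

```c
#include <assert.h>
#include <string.h>

/* Sketch of strlcpy-style semantics (hypothetical name, not the real API). */
size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t src_len;
    assert(src != NULL);
    src_len = strlen(src);
    if (size > 0) {
        size_t n = src_len < size - 1 ? src_len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;   /* >= size means the copy was truncated */
}
```

Returning the full source length lets callers detect truncation with a single comparison, e.g. if (my_strlcpy(buf, src, sizeof buf) >= sizeof buf) { /* handle truncation */ }.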

Don Wakefield