tags:

views:

132

answers:

3

Hi guys,

I really have a strange situation. I'm making a Linux multi-threaded C application using all the nitty-gritty memory stuff involving char* strings, and I'm stuck in a really odd position.

Basically, what happens is, using POSIX threads, I'm reading and writing to a two-dimensional char array, but it has unusual errors. You have my word that I have done extensive testing on what they are individually accessing, and they don't read another threads' data, let alone write to others. When the last thread that works with the array changes its parts of the array, it seems to change the last few chars of its arrays and put characters in there that I don't know how they could possibly have got in there; mainly ones that print as black diamond question mark things.

I use valgrind and GDB, and they don't really help. As far as I can tell, all should work. Valgrind tells me I'm not freeing everything.

I know all that sounds fairly undescriptive, but here's where it gets weird: if I compile my program with electric fence, then it all works. Valgrind tells me I'm freeing everything and that there's no memory errors at all, just as I thought it should have been. It works absolutely flawlessly!

So, I guess my question is, why does my program work fine when compiled with electric fence?

(And also as a side question, what steps need to be taken to ensure 100% "thread-safe" code?)

+1  A: 

It sounds like you're trashing your data structures. Try putting canaries at the beginning and end of your arrays, open up GDB, then put write breakpoints on the canaries.

A canary is a const value that should never be changed - its only purpose is to detect memory corruption should it be overwritten. For example:

int the_size_i_need;
char* array = malloc((the_size_i_need + 2) * sizeof(char));
array[0] = 0xAA;
array[the_size_i_need+1] = 0xFF;
char* real_array = array+1;

/* Do some stuff here using real_array */

if (array[0] != 0xAA || array[the_size_i_need+1] != 0xFF) {
    printf("Oh noes! We're corrupted\n");
}
Paul Betts
That's pretty much what efence does (although using guard pages). Putting canaries at the boundaries between the bits of the array used by different threads might be fruitful, though, since of course efence can't go there.
Steve Jessop
+1  A: 

Electric fence allocates pages, I've heard at least two, for each allocation you make. It uses the OSs paging mechanisms to check for accessing outside of the allocation. This means that if you want a new 14-character array you end up with a whole new page to hold it, say 8k. Most of the page is unused but you can detect errant accesses by watching which pages get used. I can imagine that on account of having so much extra space if a problem gets past the guards you wouldn't see an error.

If you don't have a bad access but rather corruption due to two threads not locking correctly efence won't detect it. efence likely keep pointers to allocated memory, fooling valgrind into reporting no problems. You should run valgrind with the --show-reachable=yes flag and see what's unclaimed at the end of your run.

Paul Rubel
A: 

Oh god, I'm so sorry. I've worked it out: there was a variable given to the thread for each to put their answer into, but I didn't define it as zero, and it contains 2 funny chars. Maybe the electric fence malloc() allocates 'zeroed' memory like calloc(), but standard malloc() of course doesn't.

panic