ansaurus

Question

Answer 1

A:

These kinds of problems are usually caused by memory corruption. Segfaults that move to a different line whenever you slightly change the code are a classic symptom.

Try running your program through a tool such as valgrind; you will very likely see some illegal memory accesses. Fix those, and I suspect things will work.

Yuval A 2010-10-20 18:37:58

Answer 2

+1 A:

Step one is to try running the function without introducing threads. Write a .c file whose main does the bare minimum needed to get ready to start the thread and then, instead of starting it, simply calls the function directly. It is much easier to debug if you can reproduce the problem with just one thread.

Additionally, if you are using gcc you should compile with:

-fstack-protector-all -Wstack-protector -fno-omit-frame-pointer

in addition to your normal flags (at least until you find the problem). These will help with debugging and may produce more warnings at compile time. I assume that you know how -O flags can affect debuggability and functionality (especially if you are already doing something wrong or undefined in the C code).

When you are in GDB and the program appears to have locked up or is taking a long time to do something, you can usually press Ctrl+C to get back to the (gdb) prompt without killing the program. This sends an interrupt signal (SIGINT), which GDB intercepts; the program is stopped and you can interact with GDB again to find out what the program is actually doing. (Ctrl+Z, by contrast, normally suspends GDB itself and returns you to the shell.)

edit

I apparently solved the problem in the comments discussion, so I'll write up what the problem was here.

A quick glance at the code did not suggest a problem that would result in a segmentation fault (illegal memory access), and Zypsy (the OP) told me that the function ran fine when being called directly from main rather than being run via a separate thread.

Valgrind reported that the thread's stack could not be extended to a certain address. On Linux the main thread's stack is mapped into the application in such a way that it can easily grow, but this often isn't done when memory is allocated for thread stacks.

I asked Zypsy (the OP) to insert some code that would print out the address of something known to be near the base of the thread's stack (printf("thread stk = %p\n", &input);) so that this value could be compared to the address given in the failure message. From this I could estimate the stack size. The result did not suggest that very much stack space was consumed between the beginning of the thread function and its failure, but the space also did not seem too small for the code in the question (it turned out to be too small, though).

The pthread_create function lets you either accept the default thread attributes (by passing NULL) or pass an argument that specifies various settings for the thread, so I asked whether the code that called pthread_create could be posted, so that I could check for any suspect settings.

After looking at this code (an application-specific wrapper around various pthread_ functions), I saw that some stack-related attributes were in fact being set. I asked the OP to look at the calls to this wrapper for anything suspicious in how the stack was allocated (for example, to make sure that the size value and the size of the allocated memory were actually the same). The OP then found that this thread's stack was being allocated smaller than the stacks of the other threads. The stack was too small after all.

nategoose 2010-10-20 18:47:12

No issues when running the function in the main thread. Figuring I had a thread issue, I cleared out the function. It's now.

Zypsy 2010-10-20 20:46:04

Gah... No issues when running the function in the main thread. Figuring I had a thread issue, I cleared out the function to contain only a fprintf(stderr, "Test%d\n", 1); and the return. GDB handles the function now. When run as a separate thread it throws a cryptic "0x001b4e4e in buffered_vfprintf (s=0x2c9580, format=0x8058ef1 "Test\d.\n", args=0xb7ffd308 "\004") at vfprintf.c:2221". I'd go with the guess that stderr isn't safe for use in threads, but the existing logging function which uses mutex exhibits the same behavior. Valgrind gives the same "Can't extend stack" message.

Zypsy 2010-10-20 20:52:24


Strange SEGFAULTS using fprintf
