views:

1009

answers:

4

I'm looking at some legacy Linux code which uses pthreads.

In one thread a file is read via fgets(). The FILE variable is a global variable shared across all threads. (Hey, I didn't write this...)

In another thread every now and again the FILE is closed and reopened with another filename.

For several seconds after this has happened, the thread fgets() acts as if it is continuing to read the last record it read from the previous file: almost as if there was an error but fgets() was not returning NULL. Then it sorts itself out and starts reading from the new file.

The code looks a bit like this (snipped for brevity so I hope it's still intelligible):

In one thread:

while(gRunState != S_EXIT){
  nanosleep(&timer_delay,0);
  flag = fgets(buff, sizeof(buff), gFile);
  if (flag != NULL){
    // do something with buff...
  }
}

In the other thread:

fclose(gFile);
gFile = fopen(newFileName,"r");

There's no lock to make sure that the fgets() is not called at the same time as the fclose()/fopen().

Any thoughts as to failure modes which might cause fgets() to fail but not return NULL?

+1  A: 

Umm, you really need to control access to the FILE stream with a mutex, at the minimum. You aren't looking at some clever implementation of lock free methods, you are looking at really bad (and dusty) code.

Using thread local FILE streams is the obvious and most elegant fix, just use locks appropriately to ensure no two threads operate on the same offset of the same file at once. Or, more simply, ensure that threads block (or do other work) while waiting for the file lock to clear. POSIX advisory locks would be best for this, or your dealing with dynamically growing a tree of mutexes... or initializing a file lock mutex per thread and making each thread check the other's lock (yuck!) (since files can be re-named).

I think you are staring down the barrel of some major fixes .. unfortunately (from what you have indicated) there is no choice but to make them. In this case, its actually easier to debug a threaded program written in this manner than it would be to debug something using forks, consider yourself lucky :)

Tim Post
+2  A: 

A FILE * is just a pointer to the various resources. If the fclose does not zero out those resource, it's possible that the values may make enough sense that fgets does not immediately notice it.

That said, until you add some locking, I would consider this code completely broken.

R Samuel Klatchko
I checked, and most fclose() implementations do _not_ zero out the structure prior to freeing it. However, his code needs locks, or file pointers in thread local storage. Global file streams are useful for something like logging, where the file pointer is never renamed (or carefully renamed, if re-loading a config, etc).
Tim Post
A: 

You can also put some condition-wait (pthread_cond_wait) instead of just some nanosleep which will get signaled when intended e.g. when a new file gets fopened.

vinit dhatrak
+2  A: 

How the described code goes wrong

The stdio library buffers data, allocating memory to store the buffered data. The GNU C library dynamically allocates file structures (some libraries, notably on Solaris, use pointers to statically allocated file structures, but the buffer is still dynamically allocated unless you set the buffering otherwise).

If your thread works with a copy of a pointer to the global file pointer (because you passed the file pointer to the function as an argument), then it is conceivable that the code would continue to access the data structure that was orginally allocated (even though it was freed by the close), and would read data from the buffer that was already present. It would only be when you exit the function, or read beyond the contents of the buffer, that things start going wrong - or the space that was previously allocated to the file structure is reallocated for a new use.

FILE *global_fp;

void somefunc(FILE *fp, ...)
{
    ...
    while (fgets(buffer, sizeof(buffer), fp) != 0)
        ...
}

void another_function(...)
{
    ...
    /* Pass global file pointer by value */
    somefunc(global_fp, ...);
    ...
}


Proof of Concept Code

Tested on MacOS X 10.5.8 (Leopard) with GCC 4.0.1:

#include <stdio.h>
#include <stdlib.h>

FILE *global_fp;
const char etc_passwd[] = "/etc/passwd";

static void error(const char *fmt, const char *str)
{
    fprintf(stderr, fmt, str);
    exit(1);
}

static void abuse(FILE *fp, const char *filename)
{
    char buffer1[1024];
    char buffer2[1024];
    if (fgets(buffer1, sizeof(buffer1), fp) == 0)
     error("Failed to read buffer1 from %s\n", filename);
    printf("buffer1: %s", buffer1);

    /* Dangerous!!! */
    fclose(global_fp);
    if ((global_fp = fopen(etc_passwd, "r")) == 0)
     error("Failed to open file %s\n", etc_passwd);

    if (fgets(buffer2, sizeof(buffer2), fp) == 0)
     error("Failed to read buffer2 from %s\n", filename);
    printf("buffer2: %s", buffer2);
}

int main(int argc, char **argv)
{
    if (argc != 2)
     error("Usage: %s file\n", argv[0]);

    if ((global_fp = fopen(argv[1], "r")) == 0)
     error("Failed to open file %s\n", argv[1]);

    abuse(global_fp, argv[1]);

    return(0);
}

When run on its own source code, the output was:

Osiris JL: ./xx xx.c
buffer1: #include <stdio.h>
buffer2: ##
Osiris JL:

So, empirical proof that on some systems, the scenario I outlined can occur.

How to fix the code

The fix to the code is discussed well in other answers. If you avoid the problem I illustrated (for example, by avoiding global file pointers), that is simplest. Assuming that is not possible, it may be sufficient to compile with the appropriate flags (on many Unix-like systems, the compiler flag '-D_REENTRANT' does the job), and you will end up using thread-safe versions of the basic standard I/O functions. Failing that, you may need to put explicit thread-safe management policies around the access to the file pointers; a mutex or something similar (and modify the code to ensure that the threads use the mutex before using the corresponding file pointer).

Jonathan Leffler
This is interesting, I checked a few implementations of fclose() to see if they zeroed out the structure before freeing it, and they don't. At least not (my current versions) of glibc or FBSD. Speed at the expense of aggravation, gotta love it :)
Tim Post
Why buffer2 is printed as `##` instead of `#include <stdio.h>` ??
vinit dhatrak
I think such assumption is very dangerous, the man page of fclose clearly mentions that any further access (including another call to `fclose()`) to the stream results in undefined behaviour. Correct me if I am wrong here.
vinit dhatrak
@Vinit Dhatrak: good question, but... Because I'm indulging in undefined behaviour - at best - the result is not clearly defined. I'm not sure exactly why, but dissecting what happened would require diagnostic printing of the internals of the FILE pointer - not platform neutral. My best guess is that the freed space was reused for something, and I ended up with garbage. However, the key point is that the (MacOS) library does not detect that a previously valid file pointer is now invalid. It does demonstrate the danger of aliassing.
Jonathan Leffler
@Vinit Dhatrak - again: yes, it is undefined behaviour. But it also simulates (with no threads) a plausible variation of the code described in the threaded scenario - where a global file pointer is changed but if the previous value was passed into a function, it could continue to 'work' (coincidentally) despite the change to the global variable. As you say - and I said - horrible design and undefined behaviour. But the stdio library did not detect the error.
Jonathan Leffler
@Jonathan Leffler so we both are saying the same thing, and therefore this can not be possible solution for the question. This code needs to be fixed with either proper lock or `condition_wait/signal`.
vinit dhatrak
@Vinit Dhatrak: yes - I am not proposing a solution (that involves thread-safe locking, and avoiding aliassing); I am illustrating how the existing code goes wrong.
Jonathan Leffler
Thanks for this: a very clear example and illustrates a failure mode I had not considered.This isn't quite the way the code I've inherited is failing though. The global variable is literally just accessed as a global variable without ever being passed to a function. The original developer clearly used the "just declare everything as a global variable" design pattern. Used together with the "spawn lots of threads with no synchronisation" pattern. It looks as if I've got some serious refactoring to do...
Simon Elliott