Hi there,

We have a C++ class that reads vectors from and writes vectors to a binary file. An example read function that loads a single vector into memory looks like this:

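int load(const __int64 index, T* values) const {
    // Position the shared file pointer at the start of the requested vector.
    int re = _fseeki64(_file, index * _vectorSize + _offsetData, SEEK_SET);
    assert(re == 0);

    // Read one vector of _vectorElements elements into the caller's buffer.
    size_t read = fread(values, sizeof(T), _vectorElements, _file);
    assert(read == _vectorElements);

    return 0;
}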

Our programs are multithreaded with OpenMP, and multiple threads access the same file at the same time. To avoid problems with concurrent access, we always wrap the function call in an OpenMP critical section:

#pragma omp critical
{
    load(...);
}

I know that the Microsoft Visual C++ runtime contains several functions like _fseek_nolock, _fread_nolock, _fwrite_nolock and so on... For example the _fread_nolock() function is described as

This function is a non-locking version of fread. It is identical to fread except that it is not protected from interference by other threads. It might be faster because it does not incur the overhead of locking out other threads. Use this function only in thread-safe contexts such as single-threaded applications or where the calling scope already handles thread isolation.

Now my question: I understand that the locking version blocks "re-entrant" calls, so no other thread will enter the function before the current thread has returned. However, I do not understand why it is necessary to protect a single function in that way. IMHO all functions that access or modify the file pointer (_file in the code sample) must be protected together to be thread-safe. This requires a lock around the whole block that actually calls the standard C functions fseek and fread, so I do not see the point of providing such non-locking functions.

Can someone explain these locking mechanisms to me? I suspect our paranoid locking scheme wastes some performance.

Thank you in advance!

+1  A: 

If you use the Microsoft multithreaded C runtime, all the functions that need global or static variables will simply work properly (such as printf and fread; don't ask me why they need globals, though). However, you still can't pass a FILE * structure to a function that writes to it and expect it to be thread safe.

So Microsoft's "thread safe" functions are thread safe only in the sense that they are re-entrant, i.e. all access to globals and statics is done with a mutex or similar, but not in the sense that you can call fprintf() twice at the same time with the same FILE *.

Source: http://msdn.microsoft.com/en-us/library/1bh5ewb2%28VS.71%29.aspx

Andreas Bonini
+1  A: 

For some simple code, the lock within the FILE * is sufficient. Consider a basic logging infrastructure where you want all threads to log via a common FILE *. The internal lock will make sure the FILE * will not be corrupted by multiple threads and since each log line should stand alone, it doesn't matter how the individual calls interleave.

R Samuel Klatchko
OK, this explains why fwrite() locks out other threads. AFAIR fwrite() moves the file pointer, so multiple threads could append logging messages to a file by calling only fwrite() again and again. I still see no reason why there is `_fseek_nolock`. This function always requires a second function to work with the file pointer.
The reason you have _fseek_nolock is for when you care about performance and need an atomic sequence of file operations. You surround a sequence of file operations with _lock_file() / _unlock_file() to make the commands atomic w.r.t. the file. Once you've done that, you can use the _nolock functions to slightly reduce overhead.
R Samuel Klatchko
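A minimal sketch of the pattern described in the comment above: take the stream lock once, then run the seek and the read as one atomic sequence. The helper name `load_locked` and the macro names are hypothetical, and the non-MSVC branch (POSIX `flockfile`/`funlockfile` with the ordinary `fseek`/`fread`) is only an assumption added so the sketch builds outside Visual C++.

```cpp
#include <cassert>
#include <cstddef>
#include <stdio.h>

// Portability shim (an assumption of this sketch, not part of the question).
// On MSVC the _nolock variants skip the per-call locking. Elsewhere the
// ordinary calls are used; they re-take the recursive stream lock that
// flockfile() already holds, which is correct but saves nothing.
#ifdef _MSC_VER
  #define LOCK_STREAM(f)            _lock_file(f)
  #define UNLOCK_STREAM(f)          _unlock_file(f)
  #define SEEK_UNLOCKED(f, off)     _fseeki64_nolock((f), (off), SEEK_SET)
  #define READ_UNLOCKED(p, s, n, f) _fread_nolock((p), (s), (n), (f))
#else
  #define LOCK_STREAM(f)            flockfile(f)
  #define UNLOCK_STREAM(f)          funlockfile(f)
  #define SEEK_UNLOCKED(f, off)     fseek((f), static_cast<long>(off), SEEK_SET)
  #define READ_UNLOCKED(p, s, n, f) fread((p), (s), (n), (f))
#endif

// Hypothetical helper: reads `count` elements of type T starting at byte
// `offset`. Returns true only if the seek succeeded and all elements were read.
template <typename T>
bool load_locked(FILE* file, long long offset, T* values, size_t count) {
    assert(file != nullptr && values != nullptr);

    LOCK_STREAM(file);    // other threads using this FILE* now wait
    const bool ok = SEEK_UNLOCKED(file, offset) == 0 &&
                    READ_UNLOCKED(values, sizeof(T), count, file) == count;
    UNLOCK_STREAM(file);  // seek + read happened as one unit

    return ok;
}
```

With this in place, no other thread can move the file pointer between the seek and the read, provided every thread goes through the C runtime's stream functions for that FILE *.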
A: 

If your application already guarantees serialized access to file handles, you can get better performance by telling the C runtime to bypass its own serialization. This is the purpose of _fread_nolock and the related functions.

John Knoeller
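As a rough illustration of that last point, here is a sketch in which the application serializes access with its own mutex and therefore calls the non-locking variants inside it. The class `VectorFile`, its members, and the macro names are hypothetical; the non-MSVC branch is only an assumed fallback so the sketch builds elsewhere, and there the ordinary calls still take the runtime's lock.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <stdio.h>

#ifdef _MSC_VER
  #define SEEK_NOLOCK(f, off)     _fseeki64_nolock((f), (off), SEEK_SET)
  #define READ_NOLOCK(p, s, n, f) _fread_nolock((p), (s), (n), (f))
#else
  // Assumed fallback: plain calls, which still lock internally.
  #define SEEK_NOLOCK(f, off)     fseek((f), static_cast<long>(off), SEEK_SET)
  #define READ_NOLOCK(p, s, n, f) fread((p), (s), (n), (f))
#endif

// Hypothetical wrapper: every access to the FILE* goes through guard_,
// so the C runtime's own per-call locking is redundant.
class VectorFile {
public:
    VectorFile(FILE* file, size_t elements)
        : file_(file), elements_(elements) {
        assert(file_ != nullptr);
    }

    // Loads vector number `index` into `values` (room for `elements` doubles).
    bool load(long long index, double* values) {
        std::lock_guard<std::mutex> hold(guard_);  // application-level lock
        const long long offset =
            index * static_cast<long long>(elements_ * sizeof(double));
        return SEEK_NOLOCK(file_, offset) == 0 &&
               READ_NOLOCK(values, sizeof(double), elements_, file_) == elements_;
    }

private:
    FILE* file_;
    size_t elements_;
    std::mutex guard_;
};
```

This only holds if nothing else touches the same FILE * outside the mutex; a single unguarded call elsewhere would reintroduce the race.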