Recently I ran into a "fun" problem with the Microsoft implementation of the CRTL. tmpfile places temp files in the root directory and completely ignores the temp file directory. This causes problems for users who do not have write privileges to the root directory (say, on our cluster). Moreover, using _tempnam would require the application to remember to delete the temporary files, which it cannot do without a considerable amount of rework.

Therefore I bit the bullet and wrote Win32 versions of all of the IO routines (create_temp, read, write, seek, flush), each of which calls the corresponding Win32 function. One thing I've noticed is that the library's performance is now abysmal.

Results from the test suite:

CRTL:    4:30.05 elapsed
Win32:  11:18.06 elapsed

Stats measured in my routines:
Writes:  3129934 (   44,642,745,008 bytes)
Reads:    935903 (    8,183,423,744 bytes)
Seeks:   2205757 (2,043,782,657,968 bytes traveled)
Flushes:   92442

Example of a CRTL v. Win32 method:

int io_write(FILE_POINTER fp, size_t words, const void *buffer)
{
#if !defined(USE_WIN32_IO)
    {
        size_t words_written = 0;

        /* write the data */
        words_written = fwrite(buffer, sizeof(uint32_t), words, fp);
        if (words_written != words)
        {
            return errno;
        }
    }
#else /* !defined(USE_WIN32_IO) */
    {
        DWORD bytesWritten;

        if (!WriteFile(fp, buffer, words * sizeof(uint32_t), &bytesWritten, NULL)
            || (bytesWritten != words * sizeof(uint32_t)))
        {
            return GetLastError();
        }
    }
#endif /* USE_WIN32_IO */

    return E_SUCCESS;
}

As you can see, they are effectively identical, yet the performance (in release mode) is wildly divergent. The time spent in WriteFile and SetFilePointer dwarfs the time spent in fwrite and fseeko, which seems counterintuitive.

Ideas?

UPDATE: perfmon notes that fflush is about 10x cheaper than FlushFileBuffers and fwrite is ~1.1x slower than WriteFile. The net result is a huge performance loss with FlushFileBuffers used in the same manner as fflush. There is no change from FILE_ATTRIBUTE_NORMAL to FILE_FLAG_RANDOM_ACCESS either.

+2  A: 

Traditionally, the C runtime library functions buffer the data and only trigger the write operation once the buffer fills up (hence the need for functions like fflush). I don't think that WriteFile buffers the write operation, so every time you call WriteFile an I/O operation gets triggered, whereas with fwrite the I/O only gets triggered when the buffer has reached a certain size.

As you can see from your measurements, the buffered I/O tends to be more efficient...
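To illustrate the idea, here is a minimal sketch of library-style write buffering. It is not the CRT's actual implementation; `write_buffer`, `wb_write`, `wb_flush` and `counting_sink` are hypothetical names, and the 4096-byte buffer size is an assumption. The sink callback stands in for an expensive call such as WriteFile and simply counts how often it is reached.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sink: stands in for an expensive call such as WriteFile. */
typedef int (*sink_fn)(void *ctx, const void *data, size_t len);

typedef struct {
    unsigned char data[4096]; /* staging area, similar in spirit to a FILE buffer */
    size_t used;              /* bytes currently staged */
    sink_fn sink;             /* called only when data must leave the buffer */
    void *ctx;                /* opaque argument handed to the sink */
} write_buffer;

/* Example sink for demonstration: counts calls and bytes instead of doing I/O. */
typedef struct { size_t calls; size_t bytes; } sink_stats;

static int counting_sink(void *ctx, const void *data, size_t len)
{
    sink_stats *s = ctx;
    (void)data;
    s->calls++;
    s->bytes += len;
    return 0;
}

static void wb_init(write_buffer *wb, sink_fn sink, void *ctx)
{
    assert(wb != NULL && sink != NULL);
    wb->used = 0;
    wb->sink = sink;
    wb->ctx = ctx;
}

/* Push staged bytes to the sink (the analogue of fflush). Returns 0 on success. */
static int wb_flush(write_buffer *wb)
{
    int rc = 0;
    if (wb->used > 0) {
        rc = wb->sink(wb->ctx, wb->data, wb->used);
        if (rc == 0)
            wb->used = 0;
    }
    return rc;
}

/* Stage a write; the sink is reached only when the buffer cannot hold it. */
static int wb_write(write_buffer *wb, const void *src, size_t len)
{
    if (len > sizeof wb->data - wb->used) {
        int rc = wb_flush(wb);
        if (rc != 0)
            return rc;
    }
    if (len >= sizeof wb->data)   /* too large to stage: pass straight through */
        return wb->sink(wb->ctx, src, len);
    memcpy(wb->data + wb->used, src, len);
    wb->used += len;
    return 0;
}
```

With this sketch, 10,000 four-byte writes followed by one `wb_flush` reach the sink only 10 times, instead of 10,000 times for an unbuffered wrapper that forwards every call.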

Timo Geusch
From my reading of `CreateFile` and its options, `WriteFile` should buffer its output. There are attributes that control if an IOP is generated for each call (like FILE_FLAG_NO_BUFFERING or FILE_FLAG_WRITE_THROUGH).
sixlettervariables
I was finally able to get VS's profiler to run (it was generating 65G datasets which it couldn't analyze) and discovered that when using CRTL methods, `fflush` takes a fraction of the time `FlushFileBuffers` does, and `fwrite` takes a smidgen more than `WriteFile`. However, the difference in flush performance dwarfs the write performance.
sixlettervariables
I think Emerick's answer has a good handle on what the reason for that is. Is there a chance that you can just flush buffers rarely or not even at all? If you feel you have to flush buffers, maybe flush them just before you close the file?
Timo Geusch
I wish, unfortunately much of the behavior is out of my control.
sixlettervariables
+1  A: 

I think it's probably due to this issue, described on MSDN's page for FlushFileBuffers:

Due to disk caching interactions within the system, the FlushFileBuffers function can be inefficient when used after every write to a disk drive device when many writes are being performed separately. If an application is performing multiple writes to disk and also needs to ensure critical data is written to persistent media, the application should use unbuffered I/O instead of frequently calling FlushFileBuffers. To open a file for unbuffered I/O, call the CreateFile function with the FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH flags. This prevents the file contents from being cached and flushes the metadata to disk with each write. For more information, see CreateFile.

In general, FlushFileBuffers is an "expensive" operation, since it flushes everything in the write-back cache:

FlushFileBuffers(): This function will flush everything in the write-back cache, as it does not know what part of the cache belongs to your file. This can take a lot of time, depending on the cache size and the speed of the media. How necessary is it? There is a thread which goes through and writes out dirty pages, so it is likely not very necessary.

I presume that fflush does not flush the entire write-back cache. In that case, it's much more efficient, but that efficiency comes at the risk of potential data loss. The CRT's source code for fflush confirms this: it only calls _commit (which in turn calls FlushFileBuffers) when the stream has the _IOCOMMIT flag set:

/* lowio commit to ensure data is written to disk */
if (str->_flag & _IOCOMMIT) {
        return (_commit(_fileno(str)) ? EOF : 0);
}

From the implementation of _commit:

if ( !FlushFileBuffers((HANDLE)_get_osfhandle(filedes)) ) {
        retval = GetLastError();
}
Emerick Rogul
FILE_FLAG_NO_BUFFERING has pretty onerous requirements which cannot be easily met by this application, or in the time required for me to rewrite. I guess I am wondering how in the world the CRTL methods are able to do this efficiently but using the raw Win32 I am not...
sixlettervariables
Probably because they don't immediately commit your changes to disk in the default case. If you always call `_commit` after `fflush` in your unit tests, are the times closer?
Emerick Rogul
Yep, they're almost identical. What this means is that the code I've inherited doesn't quite do what it says it does with the original CRTL code, so perhaps I can accept the performance decrease for now. http://support.microsoft.com/kb/66052
sixlettervariables
Right. The Win32 version is taking the safest approach possible, while the CRT implementation is erring on the side of efficiency. I guess it's up to you to decide which approach is appropriate in your particular situation... :-)
Emerick Rogul
The maddening part is less the performance change than the fact that none of the CRTL implementations match up with their *NIX counterparts, making maintenance of this code a nightmare. If only I could adopt APR or something similar...
sixlettervariables
+1  A: 

I'm still a little unclear on what the question is. You start out by talking about managing the lifetime of a temporary file and then jump to wrapping an entire file i/o interface. Are you asking about how to manage a temporary file without the performance penalty of wrapping all the file I/O? Or are you interested in how the CRT functions can be faster than the WinAPI functions they are built on top of?

Several of the comparisons being made between the C run-time functions and the WinAPI functions are of the apples and oranges variety.

The C run-time functions buffer the I/O in library memory. There is another layer of buffering (and caching) in the OS.

fflush flushes the data from the library buffers to the OS. It may go directly to disk, or it may go to OS buffers for later writing. FlushFileBuffers gets data from the OS buffers onto the disk, which generally takes longer than moving data from the library buffers to the OS buffers.
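A rough sketch of the two layers, to make the comparison concrete. The helper names `flush_to_os` and `flush_to_disk` are made up for illustration. On Windows the second step uses `_commit`, which (per the CRT source quoted in another answer) calls FlushFileBuffers; the `fsync` branch is only there as a stand-in so the sketch builds on POSIX systems.

```c
#ifndef _WIN32
#define _POSIX_C_SOURCE 200809L  /* for fileno() under strict compiler modes */
#endif

#include <assert.h>
#include <stdio.h>

#ifdef _WIN32
#include <io.h>      /* _commit */
#else
#include <unistd.h>  /* fsync */
#endif

/* Layer 1: library buffer -> OS. Cheap; the data may still sit in the OS cache. */
static int flush_to_os(FILE *fp)
{
    assert(fp != NULL);
    return fflush(fp) == 0 ? 0 : -1;
}

/* Layer 2: OS cache -> device. This is the expensive step, comparable to
 * calling FlushFileBuffers on the underlying handle. */
static int flush_to_disk(FILE *fp)
{
    if (flush_to_os(fp) != 0)
        return -1;
#ifdef _WIN32
    return _commit(_fileno(fp)) == 0 ? 0 : -1;
#else
    return fsync(fileno(fp)) == 0 ? 0 : -1;
#endif
}
```

A fair benchmark would compare `flush_to_disk` against FlushFileBuffers, or `flush_to_os` against doing nothing at all in the Win32 wrapper, since WriteFile has no library-level buffer to drain.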

Unaligned writes are expensive. The OS buffers make unaligned writes possible, but they don't really speed up the process. The library buffers may accept several writes before pushing data to the OS, effectively reducing the number of unaligned writes to the disk.

It's also possible (though this is just a guess) that the library routines are taking advantage of overlapped (asynchronous) I/O to the disk, where your straight-to-WinAPI implementation is all synchronous.

Adrian McCarthy
The backstory was to handle questions of the nature "why not stick with CRTL".
sixlettervariables
+2  A: 

I might be crazy, but wouldn't it be easier to just write a replacement for tmpfile that uses fopen(temporaryname, "rwTD"), where you generate your own temporaryname?

At least then you don't have to worry about reimplementing <file.h>.
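Something along these lines, as a sketch only. `open_temp_file` is a hypothetical name, and the mode string here is "w+bTD" rather than the "rwTD" above, since the file needs a valid read/write binary mode. T and D are MSVC-specific extensions; the non-Windows branch is just a stand-in (unlink right after opening) so the sketch can be built and tried elsewhere.

```c
#ifndef _WIN32
#define _POSIX_C_SOURCE 200809L  /* for mkstemp() and fdopen() under strict modes */
#endif

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#ifndef _WIN32
#include <unistd.h>  /* unlink, close */
#endif

/* Hypothetical tmpfile() replacement: opens a read/write binary temp file
 * that disappears when it is closed. Returns NULL on failure. */
static FILE *open_temp_file(const char *prefix)
{
    FILE *fp;
    assert(prefix != NULL);
#ifdef _WIN32
    {
        /* _tempnam prefers the directory named by the TMP environment
         * variable; the returned name is malloc'ed and must be freed. */
        char *name = _tempnam(NULL, prefix);
        if (name == NULL)
            return NULL;
        /* MSVC extensions: T = FILE_ATTRIBUTE_TEMPORARY,
         * D = FILE_FLAG_DELETE_ON_CLOSE. */
        fp = fopen(name, "w+bTD");
        free(name);
    }
#else
    {
        /* POSIX stand-in (assumes /tmp is writable). */
        char name[256];
        int fd;
        int n = snprintf(name, sizeof name, "/tmp/%sXXXXXX", prefix);
        if (n < 0 || (size_t)n >= sizeof name)
            return NULL;
        fd = mkstemp(name);
        if (fd < 0)
            return NULL;
        unlink(name);            /* file persists until the descriptor closes */
        fp = fdopen(fd, "w+b");
        if (fp == NULL)
            close(fd);
    }
#endif
    return fp;
}
```

The existing fread/fwrite/fseek/fflush calls can then stay as they are; only the call to tmpfile changes.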

MSN
I gather T is an extension mode?
sixlettervariables
I meant to say D. And for MSVC, yes; D means FILE_FLAG_DELETE_ON_CLOSE, T means FILE_ATTRIBUTE_TEMPORARY.
MSN
+1, but I'm going to keep Emerick's as the answer as he covers the question about performance. Since I'm tasked with changing the bare minimum, I'm going with your suggestion for the actual problem fix.
sixlettervariables
Yes, well, I was about to suggest reading the source for fread, fwrite, and fflush, but I figured that it would be more helpful to just suggest replacing tmpfile. You should read the source for them anyways :)
MSN
I'm in somewhat of a precarious position if I go reading their source, so I try to avoid it if at all possible. Thanks for the help. By the way, the modes I used are "wbTD+": rw is an invalid combo, and it is a binary file.
sixlettervariables