I've been chasing a very mysterious bug for days now, which seems to revolve around the ZLIB library; it occurs once every month or so, and only on some specific production environments. Here is what the code does:
- The program calls `gzopen` to write to a file `X`.
- The program writes data to the file with several `gzwrite` calls.
- The program finally calls `gzclose`, which flushes the file.
Usually, everything works fine and the file `X` is valid; it is properly terminated with the CRC and the length of the source stream.
However, in one of the failures, I observe that `X` is corrupt: the beginning of the data is correct, but starting at offset `0x00302000`, every byte is zero. Even the last eight bytes, which encode the CRC and the length, are zero. However, the file has the right size! And what's worse: the same system successfully compressed a very similar file a few minutes earlier.
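To characterize a corrupt file like this, a small diagnostic along the following lines could help. It is a sketch under assumptions: `inspect_gz_tail` and `struct gz_tail_report` are hypothetical names, and it uses `long` offsets, so it only suits files below 2 GB. It reports where the trailing run of zero bytes begins and decodes the last eight bytes as the little-endian CRC-32 and uncompressed length that the gzip format stores there.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical report structure for the diagnostic below. */
struct gz_tail_report {
    long file_size;       /* total size of the file in bytes */
    long zero_run_start;  /* offset where the trailing zero run begins;
                             equals file_size if the last byte is nonzero */
    uint32_t crc32;       /* trailer bytes 0..3, little-endian */
    uint32_t isize;       /* trailer bytes 4..7, little-endian */
};

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Scans the whole file, remembers the last nonzero byte, then decodes
 * the 8-byte trailer. Returns 0 on success, -1 on I/O error or if the
 * file is shorter than 8 bytes. */
int inspect_gz_tail(const char *path, struct gz_tail_report *out)
{
    unsigned char buf[4096];
    unsigned char trailer[8];
    long pos = 0, last_nonzero = -1;
    size_t n, i;
    FILE *f;

    assert(path != NULL && out != NULL);

    f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
        for (i = 0; i < n; i++) {
            if (buf[i] != 0)
                last_nonzero = pos + (long)i;
        }
        pos += (long)n;
    }

    if (ferror(f) || pos < 8 ||
        fseek(f, -8L, SEEK_END) != 0 ||
        fread(trailer, 1, sizeof trailer, f) != sizeof trailer) {
        fclose(f);
        return -1;
    }
    fclose(f);

    out->file_size = pos;
    out->zero_run_start = last_nonzero + 1;
    out->crc32 = read_le32(trailer);
    out->isize = read_le32(trailer + 4);
    return 0;
}
```

Checking `zero_run_start % 4096` across several corrupt files would show whether the zeroing always starts on a 4096-byte boundary, as `0x00302000` does; a consistent alignment would hint at something working in whole pages or clusters rather than at the level of individual `gzwrite` calls.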
Note: The ZLIB.DLL we are using is version 1.1.3; yes, I know, it contains some security holes and we should upgrade to the latest ZLIB 1.2.3, but I don't want to change anything in my setup until I've found the cause of the zeroing.
I think that I've ruled out memory corruption (by the way, how could a corrupt memory heap disturb `fwrite` sufficiently that it only writes zeroes to the output stream? Would that be plausible?). The loop which opens/writes/closes the stream is simple and does not reveal any defects I can spot. The code does not allocate/free/mess with structures in ZLIB (which could be a problem, since ZLIB is linked against a different C library than my application DLLs). So I can only suspect other elements in the system.
Somehow, I tend to have confidence in the C library (CRTDLL.DLL), the Win32 API, the NTFS stack, the I/O stack, the low-level device drivers, the firmware of the hard disks and the hard disks themselves... And yes, I also tend to believe that Visual C++ 2008 produces correct binaries, at least in this case ;-)
So, am I right to suspect that the antivirus software could be the culprit? It should be cautious with ZLIB, since at least Kaspersky recognizes the DLL as a possible threat. But would it be politically correct for an antivirus to simply write zeroes instead of the data if an infection is (incorrectly) spotted? Or might this be a bug in the antivirus?
Or am I totally missing the point?