I've been chasing a very mysterious bug for days now, which seems to revolve around the ZLIB library; it occurs once every month or so, and only on some specific production environments. Here is what the code does:
- The program calls gzopen to write to a file X.
- The program writes data to the file, doing several gzwrite calls.
- The program finally calls gzclose, which flushes the file.
Usually, everything works fine and the file X is valid; it is properly terminated with the CRC and the length of the source stream.
However, in one of the failures, I observe that X is corrupt: the beginning of the data is correct, but starting at offset 0x00302000, every byte is null. Even the last eight bytes, which encode the CRC and the length, are zero. Yet the file has the right size! And what's worse: the same system successfully compressed a very similar file a few minutes earlier.
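To characterise the damage on a corrupt file, a small diagnostic along these lines could be used. It is only a sketch: the names inspect_gz_tail and gz_tail_report are made up for illustration and are not part of ZLIB, and it assumes the file is smaller than 2 GB so that offsets fit in a long. It relies on the gzip trailer layout, which is a 4-byte CRC-32 followed by the 4-byte uncompressed length, both little-endian.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical report structure; not part of ZLIB. */
typedef struct {
    long size;             /* total file size in bytes */
    long zero_tail_start;  /* offset where the trailing run of zero bytes
                              begins; equals size if the last byte is non-zero */
    unsigned long crc32;   /* CRC-32 field read from the gzip trailer */
    unsigned long isize;   /* uncompressed length field from the trailer */
} gz_tail_report;

/* Decode a 32-bit little-endian value. */
static unsigned long read_le32(const unsigned char *p)
{
    return (unsigned long)p[0]
         | ((unsigned long)p[1] << 8)
         | ((unsigned long)p[2] << 16)
         | ((unsigned long)p[3] << 24);
}

/* Scan the file once, remember the last non-zero byte, then read the
   8-byte trailer. Returns 0 on success, -1 on an I/O error or if the
   file is shorter than 8 bytes. */
int inspect_gz_tail(const char *path, gz_tail_report *out)
{
    unsigned char buf[65536];
    unsigned char tail[8];
    long offset = 0;
    long last_nonzero = -1;
    size_t n, i;
    FILE *f;

    assert(path != NULL && out != NULL);

    f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
        for (i = 0; i < n; i++) {
            if (buf[i] != 0)
                last_nonzero = offset + (long)i;
        }
        offset += (long)n;
    }

    if (ferror(f) || offset < 8
        || fseek(f, -8L, SEEK_END) != 0
        || fread(tail, 1, 8, f) != 8) {
        fclose(f);
        return -1;
    }
    fclose(f);

    out->size = offset;
    out->zero_tail_start = last_nonzero + 1;
    out->crc32 = read_le32(tail);
    out->isize = read_le32(tail + 4);
    return 0;
}
```

On the corrupt file described above, such a check would be expected to report zero_tail_start as 0x00302000 and both trailer fields as zero. Note that 0x00302000 is a multiple of 4096, so the zeroing starts on a page-sized boundary; that might hint at a layer below the C library, though it proves nothing by itself.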
Note: The ZLIB.DLL we are using has the version 1.1.3; yes, I know, it contains some security holes and we should upgrade to the latest ZLIB 1.2.3, but I don't want to change anything in my setup until I've found the cause of the zeroing.
I think that I've ruled out memory corruption. (By the way, how could a corrupt memory heap disturb fwrite sufficiently that it only writes zeroes to the output stream? Would that be plausible?) The loop which opens, writes and closes the stream is simple and does not reveal any defects I can spot. The code does not allocate, free or otherwise mess with structures in ZLIB (which could be a problem, since ZLIB is linked against a different C library than my application DLLs). So I can only suspect other elements in the system.
Somehow, I tend to have confidence in the C library (CRTDLL.DLL), the Win32 API, the NTFS stack, the I/O stack, the low-level device drivers, the firmware of the hard disks and the hard disks themselves... And yes, I also tend to believe that Visual C++ 2008 produces correct binaries, at least in this case ;-)
So, am I right to suspect that the antivirus software could be the culprit? It should be cautious with ZLIB, since at least Kaspersky recognizes the DLL as a possible threat. But would it be politically correct for an antivirus to simply write zeroes instead of the data if an infection is (incorrectly) spotted? Or might this be a bug in the antivirus?
Or am I totally missing the point?