I've been chasing a very mysterious bug for days now, which seems to revolve around the ZLIB library; it occurs once every month or so, and only on some specific production environments. Here is what the code does:

  1. The program calls gzopen to write to a file X.
  2. The program writes data to the file with several gzwrite calls.
  3. The program finally calls gzclose which flushes the file.

Usually, everything works fine and the file X is valid; it is properly terminated with the CRC and the length of the source stream.
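For anyone who wants to check the "properly terminated" part mechanically, here is a minimal sketch. It is written in Python with the standard gzip and zlib modules rather than the C API used in the question, and the file name and the helper names (`write_gzip`, `gzip_trailer_matches`) are made up for the illustration. It relies on the gzip format (RFC 1952) storing the CRC-32 and the uncompressed length modulo 2^32 as the last eight bytes, both little-endian.

```python
import gzip
import os
import struct
import tempfile
import zlib


def write_gzip(path, chunks):
    """Open, write several chunks, close: the same three steps as the C code."""
    with gzip.open(path, "wb") as gz:
        for chunk in chunks:
            gz.write(chunk)


def gzip_trailer_matches(path, original):
    """Return True if the last 8 bytes of the gzip file hold the CRC-32 and
    the length (mod 2**32) of `original`, both stored little-endian."""
    with open(path, "rb") as f:
        f.seek(-8, os.SEEK_END)
        crc, isize = struct.unpack("<II", f.read(8))
    return (crc == (zlib.crc32(original) & 0xFFFFFFFF)
            and isize == (len(original) & 0xFFFFFFFF))


# Hypothetical data and file name, only for the demonstration.
chunks = [b"first block ", b"second block ", b"third block"]
path = os.path.join(tempfile.mkdtemp(), "x.gz")
write_gzip(path, chunks)
trailer_ok = gzip_trailer_matches(path, b"".join(chunks))
```

A file whose trailer is all zeroes, as in the failure described below, would make `gzip_trailer_matches` return False for any non-empty input.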

However, in one of the failures, I observe that X is corrupt: the beginning of the data is correct, but starting at offset 0x00302000, every byte is null. Even the last eight bytes, which encode the CRC and the length, are zero. However, the file has the right size! And what's worse: the same system successfully compressed a very similar file a few minutes earlier.
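To collect this kind of evidence automatically on other suspect files, something like the following could help. It is only a sketch in Python (not the C code from the question), and `zero_tail_offset` and the demo file are hypothetical names. It scans a file backwards and reports the offset from which everything is 0x00. One thing worth noting is that 0x00302000 is an exact multiple of 0x1000 (4096 bytes); whether that alignment means anything here is a guess, but it is easy to log.

```python
import os
import tempfile


def zero_tail_offset(path, block_size=65536):
    """Return the offset from which every remaining byte is 0x00.
    Returns the file size if the file does not end in zero bytes."""
    pos = os.path.getsize(path)
    with open(path, "rb") as f:
        while pos > 0:
            start = max(0, pos - block_size)
            f.seek(start)
            block = f.read(pos - start)
            stripped = block.rstrip(b"\x00")
            if stripped:
                # Last non-zero byte found; the zero run starts right after it.
                return start + len(stripped)
            pos = start
    return 0  # empty file, or nothing but zero bytes


# Demo file (hypothetical): 0x2000 bytes of data followed by 0x1000 zero bytes.
demo = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(demo, "wb") as f:
    f.write(b"\xAA" * 0x2000 + b"\x00" * 0x1000)

offset = zero_tail_offset(demo)
page_aligned = (offset % 4096 == 0)
```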

Note: The ZLIB.DLL we are using is version 1.1.3; yes, I know, it contains some security holes and we should upgrade to the latest ZLIB 1.2.3, but I don't want to change anything in my setup until I've found the cause of the zeroing.

I think that I've ruled out memory corruption (by the way, how could a corrupt memory heap disturb fwrite so badly that it writes only zeroes to the output stream? Would that be plausible?). The loop which opens/writes/closes the stream is simple and does not reveal any defects I can spot. The code does not allocate, free or otherwise touch structures inside ZLIB (which could be a problem, since ZLIB is linked against a different C library than my application DLLs). So I can only suspect other elements in the system.

Somehow, I tend to have confidence in the C library (CRTDLL.DLL), the Win32 API, the NTFS stack, the I/O stack, the low-level device drivers, the firmware of the hard disks and the hard disks themselves... And yes, I also tend to believe that Visual C++ 2008 produces correct binaries, at least in this case ;-)

So, am I right to suspect that the antivirus software could be the culprit? It should be cautious with ZLIB, since at least Kaspersky recognizes the DLL as a possible threat. But would it be politically correct for an antivirus to simply write zeroes instead of the data if an infection is (incorrectly) spotted? Or might this be a bug in the antivirus?

Or am I totally missing the point?

+1  A: 

Without knowing anything else, I would suspect your code, rather than the anti-virus code.

It would take only a simple error to write the correct length of data to the file while passing in an incorrect buffer, resulting in "all zeroes" as data in the file.

Simple errors that are hard to spot sometimes lead to very big confusion, like the one you are having.

If I were troubleshooting this I would:

  • simplify, simplify, simplify until the problem stops, then add complexity back in, stepwise.
  • check your pointers and pointer arithmetic
  • check that you are flushing & closing the file properly
  • use an allocator that stores marker bytes in the allocated buffers, like 0xAA rather than 0x00. This way you can see if it is your (zeroed) buffer that is being written to the file.
  • step through it all in a debugger
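The marker-byte idea from the list above can be sketched as follows. This is Python standing in for a C debug allocator, and `MARKER`, `new_buffer` and `trailing_marker_bytes` are invented names. The point is only that a buffer pre-filled with 0xAA makes "never filled in" look different from "zeroed by something else" once it lands in a file. Real data can of course contain 0xAA too, so treat a hit as a hint, not as proof.

```python
MARKER = 0xAA  # fill pattern suggested above; any non-zero byte would do


def new_buffer(size):
    """Allocate a buffer pre-filled with the marker byte instead of 0x00."""
    return bytearray([MARKER]) * size


def trailing_marker_bytes(buf):
    """Count how many bytes at the end of `buf` still hold the marker."""
    return len(buf) - len(buf.rstrip(bytes([MARKER])))


buf = new_buffer(16)
buf[:10] = b"0123456789"            # the producer only filled 10 of 16 bytes
unfilled = trailing_marker_bytes(buf)  # 6 bytes were never written
```

If the whole 16 bytes were then written out by mistake, the file would show 0xAA at the end rather than 0x00, which would point at the application's buffer rather than at anything downstream.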

In normal circumstances, GZIP'd data will never have a looong succession of zeros. These will be collapsed into a dictionary and length code.

Cheeso
Thank you for the tips. I think I double-checked everything, but you are right, I should replace the memory allocator to make sure, then wait a few weeks to see if the problem resurfaces in production... I've also noticed that a simple file copy (fopen A/fopen B/fread A/fwrite B/fclose A/fclose B) generates a file B with the proper size but only zeroes in it; but this happens only once the problem described above has occurred.
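A copy that checks itself afterwards might catch this in production while the machine is still in the bad state. Below is a rough sketch, again in Python rather than the C stdio calls mentioned in the comment, with hypothetical helper and file names. One caveat: the re-read may be served from the OS cache, so a matching hash does not prove the bytes on disk are right.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, block_size=65536):
    """Hash a file in blocks so large files do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst, force the data out, then compare content hashes."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()
        os.fsync(fout.fileno())
    return sha256_of(src) == sha256_of(dst)


# Hypothetical files, only for the demonstration.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "a.bin")
dst = os.path.join(workdir, "b.bin")
with open(src, "wb") as f:
    f.write(os.urandom(100000))
copied_ok = copy_and_verify(src, dst)
```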
Pierre
A: 

Did you ever resolve this? We are fighting the same problem in our code, and even in a small test program the data seems to be corrupted in memory. The compress2 call returns 0 (Z_OK) and the correct size for the compressed buffer, but immediately calling uncompress on the compressed string gives us -3 (Z_DATA_ERROR, invalid data).
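The in-memory round trip described here can be reduced to a few lines, which makes it easy to run on the affected machine. This sketch uses Python's standard zlib module instead of compress2/uncompress, and the helper names are made up. In Python a failed decompression raises `zlib.error` where the C API would return a negative code such as Z_DATA_ERROR (-3).

```python
import zlib


def can_decompress(packed):
    """Return True if `packed` is a valid zlib stream."""
    try:
        zlib.decompress(packed)
        return True
    except zlib.error:
        # Covers the C API's negative return codes, Z_DATA_ERROR (-3) included.
        return False


def round_trip_ok(data, level=6):
    """Compress in memory, decompress immediately, compare with the input."""
    packed = zlib.compress(data, level)
    return can_decompress(packed) and zlib.decompress(packed) == data


ok = round_trip_ok(b"test")

# Deliberately damage the first header byte to show what a failure looks like.
packed = zlib.compress(b"test")
damaged = bytes([packed[0] ^ 0xFF]) + packed[1:]
damaged_ok = can_decompress(damaged)
```

If `round_trip_ok` fails on one machine and passes everywhere else with the same binary, that would suggest something in the environment rather than in the calling code.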

No, the issue is still open. We have tried upgrading to the latest version of ZLIB, which uses a more recent C library; the problem has not reappeared, but since it shows up only very, very rarely, I don't know whether this fixed the bug or not.
Pierre
What version did you use, and by any chance could you email me the DLL file you built? I tried rebuilding 1.2.3, which was the latest I could find, with Visual Studio 2005 and still have the problem. CompressString on a simple string like "test" returns 0, but DecompressString on the result returns -3, so the problem is not in writing it to the disk. It works fine for thousands of our users, but sure enough it doesn't work for a potential new client. You can email me at [email protected] also. Thanks