I am trying to compress a string using zlib (I've tried this code with the current 1.2.3 version of zlib and with zlib 1.1.3). My code works correctly unless it is run on a Japanese machine. After compressing a file, I encrypt it. Decryption succeeds, but the call to uncompress returns -3 (Z_DATA_ERROR, meaning the input data was corrupted). Since no errors are being logged, I know that no exceptions are being thrown and that the compress function is returning 0 (Z_OK, meaning it worked).

Thus, I suspect that the sCompressed string is losing integrity on either the line "sCompressed = Left(sCompressed, lcompressedlen)" or the line "encryptedData.Content = sCompressed". Alternatively, VB6 might be doing something "helpful" to the contents of sCompressed during the call to compress. I know that the return value of this function is not being corrupted later, because that would have broken decryption, which works fine.

Public Function EncryptString(ByVal Definition As String) As String
On Error GoTo ErrorHandler
    Dim encryptedData As New CAPICOM.encryptedData
    encryptedData.SetSecret KEY_CONST
    Dim lStringLen As Long
    Dim lcompressedlen As Long
    Dim sCompressed As String
    Dim lReturn As Long
    Dim tstpost As String
    lStringLen = Len(Definition)
    lcompressedlen = (lStringLen * 1.01) + 13
    sCompressed = Space(lcompressedlen)
    lReturn = compress(sCompressed, lcompressedlen, Definition, lStringLen)
    If lReturn <> 0 Then
        sCompressed = "Error: " & CStr(lReturn)
        '<LOG ERROR>'
    Else
        sCompressed = Left(sCompressed, lcompressedlen)
    End If
    encryptedData.Content = sCompressed
    encryptedData.Algorithm.Name = CAPICOM_ENCRYPTION_ALGORITHM_3DES
    EncryptString = encryptedData.Encrypt
Exit Function
ErrorHandler:
    '<LOG ERROR>'
    Resume Next
End Function

Conclusion:
I ended up making the program give an error message and quit if run on a machine with a suspicious character set. It is quite likely that this bug still exists on a few settings and also likely that it does not exist on some of the settings that trigger the error. However, since the target audience is English speakers, passing the Turkey Test is not important enough to justify actually spending more time on this.

+1  A: 

Stop using a String when you mean to pass a Byte array?

Of course you're going to get automatic ANSI conversions and data recopying when you use Strings as you do here.
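A minimal sketch of what the Byte-array approach could look like. The DLL name, the Declare signature, and the helper name CompressBytes are assumptions for illustration, not the asker's actual declarations; VB6 can only call stdcall exports, so this presumes a zlib build that exports compress that way.

```vb
' Hypothetical declaration: DLL name and calling convention are assumptions.
' Passing the first array element ByRef hands the DLL a raw pointer to the
' bytes, so VB6 performs no ANSI/Unicode conversion.
Private Declare Function compress Lib "zlib.dll" ( _
    ByRef dest As Byte, ByRef destLen As Long, _
    ByRef src As Byte, ByVal srcLen As Long) As Long

' Compresses raw bytes without ever storing them in a String.
' Assumes source is a non-empty, initialized array.
Public Function CompressBytes(ByRef source() As Byte) As Byte()
    Dim sourceLen As Long
    Dim destLen As Long
    Dim dest() As Byte
    Dim result As Long

    sourceLen = UBound(source) - LBound(source) + 1
    ' Same safety margin as the question: about 1% extra plus 13 bytes.
    destLen = sourceLen + (sourceLen \ 100) + 13
    ReDim dest(0 To destLen - 1)

    result = compress(dest(0), destLen, source(LBound(source)), sourceLen)
    If result <> 0 Then
        Err.Raise vbObjectError + 1, "CompressBytes", _
                  "compress returned " & CStr(result)
    End If

    ' compress updates destLen to the number of bytes actually written.
    ReDim Preserve dest(0 To destLen - 1)
    CompressBytes = dest
End Function
```

To get bytes from the input string, assigning it to a dynamic Byte array (Dim raw() As Byte: raw = Definition) copies its UTF-16 bytes unchanged. The compressed bytes would still need a binary-safe representation, such as hex or Base64, before being placed in a String property like encryptedData.Content.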

Bob Riemersma
Using a byte array is annoying. Doing a StrConv copy (to convert to ANSI) doesn't work perfectly, and doing a CopyMemory includes extraneous nulls.
Brian
If you absolutely must use strings, use StrPtr so the runtime doesn't convert them to ANSI and back (also, buffer lengths must be doubled).
rpetrich
Avoiding the byte array doesn't mean you avoid the ANSI conversion. It often just means VB6 does it *implicitly* - for instance when you call a DLL via a Declare and pass a string.
MarkJ
A: 

Bob is right. These are just my footnotes to his answer. Be warned I'm totally unfamiliar with zlib - I'm assuming you're calling a DLL using a Declare for compress.

Using a String instead of a Byte array doesn't mean you avoid the ANSI conversion. It often just means VB6 does the conversion implicitly and you can't control it - for instance when you call a DLL with a Declare statement and pass a string.

It's possible that the magic sequence of bytes returned from the compression is not a valid "ANSI" string on the Japanese code page. Some character sequences are undefined on the MSDN table for that code page. If you are calling a DLL with a Declare statement and expecting a string to be returned into sCompressed, that DLL had better write a valid "ANSI" string into the corresponding buffer. If it writes an invalid sequence of bytes, anything might happen. You will also have trouble on Chinese (936 and 950) and Korean (949).

What you're describing might well happen: when compress returns, the invalid sequence of bytes might be converted into a "Unicode" string without any error being reported, perhaps a truncated Unicode string that matches only the first portion of your byte sequence. Then, when you later attempt to decompress, that Unicode string is converted back into an ANSI string, and it doesn't match the original byte sequence you started from. It can't possibly match: no Unicode string converts to a sequence of bytes that isn't a valid "ANSI" string on code page 932.
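One way to check this on a given machine is to push a byte sequence through the same ANSI-to-Unicode-and-back conversion explicitly. The sketch below is an assumption-laden illustration: SurvivesRoundTrip is a hypothetical helper, the input array is assumed to be zero-based, and StrConv with an explicit locale ID only works if that locale's code page is installed.

```vb
' Returns True when the bytes survive an ANSI -> Unicode -> ANSI round trip
' under the given locale ID. Assumes original is a zero-based, non-empty array.
Public Function SurvivesRoundTrip(ByRef original() As Byte, _
                                  ByVal localeId As Long) As Boolean
    Dim asUnicode As String
    Dim roundTrip() As Byte
    Dim i As Long

    ' The same two conversions VB6 applies implicitly around a Declare call.
    asUnicode = StrConv(original, vbUnicode, localeId)
    roundTrip = StrConv(asUnicode, vbFromUnicode, localeId)

    ' A different length already means the data was altered.
    If UBound(roundTrip) <> UBound(original) Then Exit Function

    For i = 0 To UBound(original)
        If roundTrip(i) <> original(i) Then Exit Function
    Next i

    SurvivesRoundTrip = True
End Function
```

For example, the pair &H81, &H20 is a code page 932 lead byte followed by a byte that is not a valid trail byte, so it is a reasonable test input with locale ID 1041 (Japanese):

```vb
Dim sample(0 To 1) As Byte
sample(0) = &H81
sample(1) = &H20
Debug.Print SurvivesRoundTrip(sample, 1041)
```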

Here's some more info on the terrible mishmash that is VB6's implementation of Unicode: a free chapter from Michael Kaplan's excellent book Internationalization With Visual Basic

I also suspect you may be confusing the number of characters in a string with the number of bytes it occupies in its ANSI representation (I'm suspicious of lStringLen and lcompressedlen). Again, Japanese uses a double-byte character set, so the ANSI string may take up to 2*N bytes for N characters.
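A quick way to see the difference is to compare Len with the byte length of the ANSI conversion. This is only a sketch; ShowLengths is a hypothetical helper, and the ANSI byte count depends on the system's default code page.

```vb
' Prints the character count and the ANSI byte count of a string.
Public Sub ShowLengths(ByVal value As String)
    ' Len counts characters (UTF-16 code units).
    Debug.Print "Characters: " & Len(value)
    ' StrConv produces the ANSI bytes for the current code page;
    ' LenB then counts those bytes rather than characters.
    Debug.Print "ANSI bytes: " & LenB(StrConv(value, vbFromUnicode))
End Sub
```

If the two numbers differ for Definition, then passing Len(Definition) as the source length to compress would tell zlib to read fewer bytes than the ANSI buffer actually holds.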

MarkJ
@MarkJ: I have a dead-tree copy of the Kaplan book sitting next to me. It's the only book on my desk, actually.
Brian
Excellent! In that case you should be fine.
MarkJ