views:

66

answers:

0

I am migrating all my company's customer data to a new CRM system, and the old legacy database is on a SQL Server 2000 server. I can extract all data in the tables via T-SQL queries but run into a brick wall when it comes to file attachments. They are stored as BLOBs in an Image column in the legacy database.

After several fruitless experiments with BCP.exe and textcopy, I have been able to open an ADODB Stream and dump the results to a file per the VB code below:

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    If Not IsNumeric(TextBox1.Text) Then
        MsgBox("Please enter numbers only. '" & TextBox1.Text & "' is not a number", vbInformation, "Try Again")
    Else
        cn = New ADODB.Connection
        cn.Open("Provider=SQLOLEDB;data Source=XXXXXXXXX;Initial Catalog=Soaring;User Id=XXXXXXXX;Password=XXXXXXXXX")

        rs = New ADODB.Recordset
        rs.Open("Select * from soaring.dbo.ftmdocument WHERE documentid =" & TextBox1.Text, _
                cn, ADODB.CursorTypeEnum.adOpenKeyset, ADODB.LockTypeEnum.adLockOptimistic)
        If IsDBNull(rs.Fields("document").Value) Then
            MsgBox("Document Field Value is Null - No Document.", vbInformation, "No Document")
        Else
            nstream = New ADODB.Stream
            nstream.Type = adTypeBinary
            nstream.Open()
            nstream.Write(rs.Fields("document").Value)
            nstream.Position = 0
            nstream.SaveToFile("c:\file\" & rs.Fields("FileName").Value, 2)
            rs.Close()
            cn.Close()
            MsgBox("Done!", vbInformation, "Done")
        End If
    End If
End Sub

And this works great on half the attachments. The other half seem to have been stored in the database with twice the number of bytes, and with 00 for every other byte. Needless to say, this results in a corrupted file when I extract it using my VB code. The file is also precisely twice bigger than the original file per the byte counts on the files.

Here is are the first few bytes of a file in the database that I'm able to easily extract:

0x255044462D312E330D25E2E3CFD30.......

This extracts to a working PDF file in the file system.

The problem files have the following hex code:

0x25005000440046002D0031002E0033000D002500....

Just about every other byte seems to be zero, but not EVERY other byte is 00, so I can't simply discard the 00 bytes.

I suspect this is due to the problem files being stored in a UTF-16 or UCS-2 character mapping. So how do I go from a 2-byte character mapping to a 1-byte? Will WideCharToMultiByte serve my purpose? Before anyone tells me it cannot be done, there has to be a way to extract these files properly as the current legacy frontend app is doing it successfully... (and of course, the company who wrote the app refuses to provide any kind of guidance on how to do this bulk export...)

Thanks!