I need to convert a binary file (a zip file) into its hexadecimal representation, and then send it to SQL Server as a varbinary(max) function parameter.

A full example (using a very small file!) is:

1) my file contains the following bits 0000111100001111

2) I need a procedure to QUICKLY convert it to 0F0F

3) I will call a sql server function passing 0x0F0F as parameter

The problem is that I have large files (files up to 100MB are possible, even if the average file size is around 100KB), so I need the fastest way to do this.

Otherwise stated: I need to create the string

'0x'+BinaryDataInHexadecimalRepresentation

in the most efficient way. (Note: maybe there is a way to open a file and directly obtain a hexadecimal string; in that case all I need is to use "that way", if it exists.)
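For comparison, the obvious way to do the conversion in Delphi would be a sketch like the one below (FileToHexNaive is a name made up for this example). It produces the right string, but the repeated concatenation reallocates constantly, which is exactly why a faster approach is needed for large files:

uses SysUtils, Classes;

// naive sketch: correct, but slow, because Result grows two chars at a time
function FileToHexNaive(const FileName: string): string;
var
  FS: TFileStream;
  B: Byte;
begin
  Result := '';
  FS := TFileStream.Create(FileName, fmOpenRead);
  try
    while FS.Read(B, SizeOf(B)) = SizeOf(B) do
      Result := Result + IntToHex(B, 2); // each concatenation may reallocate
  finally
    FS.Free;
  end;
end;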

Related question: passing hexadecimal data to sql server

UPDATE: after reading the comments I think I need to add more information here. The reason I try to use a T-SQL text command to send the binary data to the stored procedure is that this way I remove some overhead from the server: the stored procedure receives the binary data and writes it to a file (this is my final goal). If I use a DAC component I will be able to easily send the binary data to the server, but in that case I need to use a temp table to store the data, and then send this data to the stored procedure that writes the file.

So the idea is:

1) using a T-SQL "long" command: more overhead on the client, because I need to read the file and convert it to hexadecimal to prepare the long command; less overhead on the server, since SQL Server just receives the binary data and processes it in the stored procedure

2) using DAC: I need to pass through a temp table in SQL Server, therefore having more overhead on the server

Since I am using the server as a web document server (it's a trick), I want to try to reduce the overhead on the server. Anyway, maybe I am wrong and (2) is the better technique than (1) after all. A sketch of what approach (1) could look like on the client follows below.
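To make approach (1) concrete, here is a minimal client-side sketch, assuming an open TADOConnection named Conn, a standalone FileToHex function like the one in the answer below, and a hypothetical stored procedure dbo.SaveDocument; all of these names are placeholders:

uses SysUtils, ADODB;

procedure SendFileAsLongCommand(Conn: TADOConnection; const FileName, TargetPath: string);
var
  Cmd: string;
begin
  // approach (1): one long T-SQL command; the whole file becomes a 0x... literal.
  // Note: on Delphi 2009+ the string(...) cast widens the hex text to UTF-16,
  // which is exactly the cost discussed in the comments below.
  Cmd := 'EXEC dbo.SaveDocument @Data = 0x' + string(FileToHex(FileName)) +
         ', @Path = ' + QuotedStr(TargetPath);
  Conn.Execute(Cmd);
end;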

+1  A: 

Well, here's an option that does the conversion as fast as I can think of.

Features of the code:

  • Only one allocation for the string (so no realloc and no move or copy)
  • Fast read from the file.

Since we know that one byte translates to exactly two hexadecimal chars, we know our result string needs to be exactly twice the size of the file. We allocate a string of the required size and then read from the file in large-enough blocks so the OS can optimize the reads for us (reading byte-by-byte is evil). We use an actual string, but we write into it through a pointer:

function TEditorDeschidereDeCredite.FileToHex(FileName: string): AnsiString;
var FS: TFileStream;
    PS: PAnsiChar;
    Block: array[0..1023] of Byte; // 1 KB read buffer
    len, i, pos: Integer;
    B: Byte;
const Nibbs: array[0..15] of AnsiChar = ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F');
begin
  FS := TFileStream.Create(FileName, fmOpenRead);
  try
    Result := '';
    SetLength(Result, FS.Size * 2); // one allocation: two hex chars per byte
    PS := PAnsiChar(Result);
    pos := 0; // write position in the result string
    len := FS.Read(Block, SizeOf(Block));
    while len <> 0 do
    begin
      for i := 0 to len - 1 do
      begin
        B := Block[i];
        PS[pos] := Nibbs[B shr 4];  // high nibble
        Inc(pos);
        PS[pos] := Nibbs[B and $F]; // low nibble
        Inc(pos);
      end;
      len := FS.Read(Block, SizeOf(Block));
    end;
  finally FS.Free;
  end;
end;

P.S.: I'm using AnsiString and PAnsiChar so the code also works with Unicode Delphi. If you happen to be on Delphi 2010, find a way to use this in its current form (AnsiString) so you can skip the conversions.
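A tiny sketch of the cost being avoided (the procedure name and file path are made up for the example; FileToHex is the function above, assumed to be callable here):

procedure WideningCostDemo;
var
  A: AnsiString;
  U: string; // UTF-16 in Delphi 2009/2010
begin
  A := FileToHex('C:\docs\sample.zip'); // hex text: 2 bytes per file byte
  U := string(A);                       // widened to UTF-16: 4 bytes per file byte
end;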

Cosmin Prund
Thanks for this; it does the conversion work as I desired, and I modified it for D2009 as you suggested. Anyway, this still doesn't make me certain that this is the most efficient way of doing FileToHex in Delphi, but it could be.
This is going to take two comments, one comment is too short. Sorry. I wasn't suggesting any changes; I said I used AnsiString and not String so it also works effectively on Delphi 2009 and Delphi 2010. On those platforms the default String type is actually UTF-16 encoded Unicode, so it uses 2 bytes per char! AnsiString is the ancient 1-byte char type, so there's no overhead. In Unicode Delphi's notation, one byte converts to two chars when translated to hex, and that occupies 4 bytes! So your 100MB file takes up 400MB in RAM.
Cosmin Prund
... what I was suggesting was to find a way to keep it as AnsiString ALL THE WAY to the stored procedure in the DB! If at any point you do a conversion to UTF-16, then you pay the penalty for the 2-byte chars. This is why using text to represent non-text data is such a problem: just imagine your 100MB file converted to a 400MB UTF-16 string, then converted to UTF-8 (200MB) on the way to the SQL server and converted back to UTF-16 (400MB) if the stored procedure wants Unicode text. I don't actually know how text is passed to SQL on the wire; this is just a scare scenario.
Cosmin Prund
ok thanks, I blindly replaced AnsiString with string, but in fact it was better to leave it as it is (as you commented). I get this now. Anyway, I am in the domain of tricks; it is not so bad for me because typically people read the files, they don't write them. Still, I need to test both this approach and the temp table. The temp table has the advantage of moving only 100MB over the network, not 200 or 400. At least in principle; I don't really know what happens "underground". I will also need to ask for a CLR stored procedure that writes to a file.
If you already have the code that works with text, give it a try... If you're happy with the performance then no one cares how you made it work. But if you don't have the code, I'd say try anything and everything before sending 200MB chunks of text to the server! To be honest, even 1MB seems like a lot of text.
Cosmin Prund