Hi, I'm using Delphi 7 (non-Unicode VCL) and need to store lots of WideStrings in a TFileStream. I can't use TStringStream because the (wide) strings are mixed with binary data; the format is designed to speed up loading and writing the data. However, I believe the way I currently load and write the strings might be a bottleneck in my code.

Currently I write the length of a string and then write the string char by char. When loading, I first read the length and then read the string char by char.

So, what is the fastest way to save and load WideString to TFileStream?

Thanks in advance

+4  A: 

Rather than read and write one character at a time, read and write them all at once:

procedure WriteWideString(const ws: WideString; stream: TStream);
var
  nChars: LongInt;
begin
  nChars := Length(ws);
  stream.WriteBuffer(nChars, SizeOf(nChars));
  if nChars > 0 then
    stream.WriteBuffer(ws[1], nChars * SizeOf(ws[1]));
end;

function ReadWideString(stream: TStream): WideString;
var
  nChars: LongInt;
begin
  stream.ReadBuffer(nChars, SizeOf(nChars));
  SetLength(Result, nChars);
  if nChars > 0 then
    stream.ReadBuffer(Result[1], nChars * SizeOf(Result[1]));
end;

Now, technically, since WideString is a Windows BSTR, it can contain an odd number of bytes. The Length function reads the number of bytes and divides by two, so it's possible (although not likely) that the code above will cut off the last byte. You could use this code instead:

procedure WriteWideString(const ws: WideString; stream: TStream);
var
  nBytes: LongInt;
begin
  nBytes := SysStringByteLen(Pointer(ws));
  stream.WriteBuffer(nBytes, SizeOf(nBytes));
  if nBytes > 0 then
    stream.WriteBuffer(Pointer(ws)^, nBytes);
end;

function ReadWideString(stream: TStream): WideString;
var
  nBytes: LongInt;
  buffer: PAnsiChar;
begin
  stream.ReadBuffer(nBytes, SizeOf(nBytes));
  if nBytes > 0 then begin
    GetMem(buffer, nBytes);
    try
      stream.ReadBuffer(buffer^, nBytes);
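      Result := '';
      // Take ownership of the new BSTR directly; a plain assignment would
      // copy it up to the first null character and leak the allocation.
      Pointer(Result) := SysAllocStringByteLen(buffer, nBytes);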
    finally
      FreeMem(buffer);
    end;
  end else
    Result := '';
end;

Inspired by mghie's answer, I have replaced my Read and Write calls with ReadBuffer and WriteBuffer. The latter raise exceptions if they are unable to read or write the requested number of bytes.
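
For illustration, here is a minimal round-trip sketch of the length-prefixed layout with strings mixed in among binary data. It is only a sketch under a few assumptions: a TMemoryStream stands in for the TFileStream, recordId is a hypothetical binary field, and the two routines are repeated from the first pair above so that the program is self-contained.

```pascal
program RoundTripSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Writes a 4-byte character count followed by the raw UTF-16 data.
procedure WriteWideString(const ws: WideString; stream: TStream);
var
  nChars: LongInt;
begin
  nChars := Length(ws);
  stream.WriteBuffer(nChars, SizeOf(nChars));
  if nChars > 0 then
    stream.WriteBuffer(ws[1], nChars * SizeOf(WideChar));
end;

// Reads the character count, then the whole string in one call.
function ReadWideString(stream: TStream): WideString;
var
  nChars: LongInt;
begin
  stream.ReadBuffer(nChars, SizeOf(nChars));
  SetLength(Result, nChars);
  if nChars > 0 then
    stream.ReadBuffer(Result[1], nChars * SizeOf(WideChar));
end;

var
  ms: TMemoryStream;       // stand-in for the TFileStream (assumption)
  recordId, readId: LongInt; // hypothetical binary field
  s: WideString;
begin
  ms := TMemoryStream.Create;
  try
    // Binary data first, then two strings, one of them empty.
    recordId := 42;
    ms.WriteBuffer(recordId, SizeOf(recordId));
    WriteWideString('first', ms);
    WriteWideString('', ms);

    // Rewind and read everything back in the same order.
    ms.Position := 0;
    ms.ReadBuffer(readId, SizeOf(readId));
    Assert(readId = recordId);
    s := ReadWideString(ms);
    Assert(s = 'first');
    s := ReadWideString(ms);
    Assert(s = '');
    Assert(ms.Position = ms.Size);
  finally
    ms.Free;
  end;
end.
```

Because every string carries its own length prefix, the reader always knows how many bytes to consume before the next binary field begins, so no terminator or scanning is needed.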

Rob Kennedy
Your second `WriteWideString()` version does not compile (missing typecast to PWideChar, missing paren), but more importantly it fails for empty strings. Your second `ReadWideString()` should also check for length 0 and simply return an empty string in that case.
mghie
I see no reason it wouldn't work for empty strings; `SysStringByteLen` returns zero for null pointers. The requirement to type-cast to `PWideChar` is because either `SysStringByteLen` is misdeclared to take a `PWideChar` instead of `WideString`, or `BSTR` is misdeclared to be `PWideChar` instead of `WideString`. Nonetheless, I've fixed that and addressed your other concerns, too. Thanks.
Rob Kennedy
I *did* see a reason it might fail for strings having just one byte, though. With range checking enabled, the expression `ws[1]` should fail in that case. (Delphi QC bug 9425 and Free Pascal bug 0010013 affect whether it fails on any particular version.)
Rob Kennedy
I tried your code with an empty string, in Delphi 4 and Delphi 2009, and on both a negative (error code) value was returned. This is on Windows XP 64.
mghie
Try type-casting to `Pointer` instead, and feel free to edit and fix this answer if that works. I don't have Delphi handy to test it myself. I'm just going off what MSDN says.
Rob Kennedy
Good catch. `SysStringByteLen` is indeed declared to take a `PWideChar`, and the type cast feeds it a pointer to an empty wide string with a bogus length - it should be 0 but isn't. That's the negative value I saw in my tests. Using a pointer cast works as expected, and I edited accordingly. Thanks for the comment.
mghie
Missing closing bracket on stream.WriteBuffer(nChars, SizeOf(nChars);
badbod99
+3  A: 

There is nothing special about wide strings. To read and write them as fast as possible, you need to read and write as much as possible in one go:

procedure TForm1.Button1Click(Sender: TObject);
var
  Str: TStream;
  W, W2: WideString;
  L: integer;
begin
  W := 'foo bar baz';

  Str := TFileStream.Create('test.bin', fmCreate);
  try
    // write WideString
    L := Length(W);
    Str.WriteBuffer(L, SizeOf(integer));
    if L > 0 then
      Str.WriteBuffer(W[1], L * SizeOf(WideChar));

    Str.Seek(0, soFromBeginning);
    // read back WideString
    Str.ReadBuffer(L, SizeOf(integer));
    if L > 0 then begin
      SetLength(W2, L);
      Str.ReadBuffer(W2[1], L * SizeOf(WideChar));
    end else
      W2 := '';
    Assert(W = W2);
  finally
    Str.Free;
  end;
end;
mghie
+2  A: 

WideStrings contain a 'string' of WideChars, which use 2 bytes each. If you want to store the strings as UTF-16 (which WideStrings use internally) in a file, and be able to use this file in other programs like Notepad, you need to write a byte order mark first: #$FEFF.

If you know this, writing can look like this:

Stream1.Write(WideString1[1],Length(WideString1)*2); //2=SizeOf(WideChar)

reading can look like this:

Stream1.Read(WideChar1,2);//assert returned 2 and WideChar1=#$FEFF
SetLength(WideString1,(Stream1.Size div 2)-1);
Stream1.Read(WideString1[1],((Stream1.Size div 2)-1)*2);//count is in bytes, not chars
Stijn Sanders
He said he wants to store lots of strings, they're going to be intermixed with binary data, and they'll be prefixed by their lengths. Definitely not something to be used with Notepad. Your code dedicates the entire stream to a single string.
Rob Kennedy
Code unconditionally accessing the first element of an empty string will cause access violations.
mghie
+1  A: 

You can also use TFastFileStream for reading the data or strings. I pasted the unit at http://pastebin.com/m6ecdc8c2, and a sample follows:

program Project36;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes,
  FastStream in 'FastStream.pas';

const
  WideNull: WideChar = #0;

procedure WriteWideStringToStream(Stream: TFileStream; const Data: WideString);
var
  len: Word;
begin
  len := Length(Data);
  // Write WideString length
  Stream.Write(len, SizeOf(len));
  if (len > 0) then
  begin
    // Write WideString
    Stream.Write(Data[1], len * SizeOf(WideChar));
  end;
  // Write null termination
  Stream.Write(WideNull, SizeOf(WideNull));
end;

procedure CreateTestFile;
var
  Stream: TFileStream;
  MyString: WideString;
begin
  Stream := TFileStream.Create('test.bin', fmCreate);
  try
    MyString := 'Hello World!';
    WriteWideStringToStream(Stream, MyString);

    MyString := 'Speed is Delphi!';
    WriteWideStringToStream(Stream, MyString);
  finally
    Stream.Free;
  end;
end;

function ReadWideStringFromStream(Stream: TFastFileStream): WideString;
var
  len: Word;
begin
  // Read length of WideString
  Stream.Read(len, SizeOf(len));
  // Read WideString
  Result := PWideChar(Cardinal(Stream.Memory) + Stream.Position);
  // Update position and skip null termination
  Stream.Position := Stream.Position + (len * SizeOf(WideChar)) + SizeOf(WideNull);
end;

procedure ReadTestFile;
var
  Stream: TFastFileStream;

  my_wide_string: WideString;
begin
  Stream := TFastFileStream.Create('test.bin');
  try
    Stream.Position := 0;
    // Read WideString
    my_wide_string := ReadWideStringFromStream(Stream);
    WriteLn(my_wide_string);
    // Read another WideString
    my_wide_string := ReadWideStringFromStream(Stream);
    WriteLn(my_wide_string);
  finally
    Stream.Free;
  end;
end;

begin
  CreateTestFile;
  ReadTestFile;
  ReadLn;
end.
pani
Note: That code won't work if the string to be read contains any null characters.
Rob Kennedy
Code unconditionally accessing the first element of an empty string will cause access violations.
mghie
Thanks mghie, code is fixed.
pani