I maintain a Delphi program which uses typed binary files as its native file format. After upgrading from Turbo Delphi to Delphi 2010, all chars in the record type being stored started being stored with 2 bytes rather than one.

The data types being stored are char and array[1..5] of char.

So before, part of the file looked like:

4C 20 20 20 4E 4E 4E 4E

Now it looks like:

4C 00 20 00 20 00 20 00 4E 00 4E 00 4E 00 4E 00

First of all, why did this happen in the first place?

Secondly, how can I still read my files, keeping in mind that there are now old files and new files floating around in the universe?

I will monitor this question obsessively after lunch. Feel free to ask for more information in comments.

+10  A: 

This happened when the default string type was changed from AnsiString to UnicodeString in Delphi 2009. Sounds like you were writing strings to the file. Redeclare them in the record as AnsiString and it should work fine.

Same goes for char. The original char was an AnsiChar, one byte per character. Now the default char is a WideChar, which is a UTF-16 char, 2 bytes per character. Redeclare your char arrays as arrays of AnsiChar and you'll get your old file style back.
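As a rough sketch of that redeclaration (the record and field names here are hypothetical, since the question doesn't show the real declaration), it might look something like this:

```delphi
program AnsiRecordSketch;
{$APPTYPE CONSOLE}

type
  // Hypothetical record; the actual field names are not shown in the question.
  TLegacyRec = record
    Kind: AnsiChar;                  // was: Char
    Code: array[1..5] of AnsiChar;   // was: array[1..5] of Char
  end;

var
  F: file of TLegacyRec;             // typed file, declared only for illustration

begin
  // AnsiChar is one byte in every Delphi version, so the on-disk
  // size of the record no longer depends on what Char means.
  Writeln(SizeOf(TLegacyRec));
end.
```

Because AnsiChar also exists in the older compilers, the same declaration should compile in Turbo Delphi and produce the old layout there too.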

As for handling both styles, that's a mess. Unless the file contains something like a version number that changed when you upgraded your Delphi version, I suppose the only thing you can do is scan the character data for 00 bytes, and then read either an AnsiChar or a WideChar version of the record depending on whether you find them.
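Here is a minimal sketch of that scan, using the byte dumps from the question as test data. The function name LooksLikeWideChars is made up for illustration, and the heuristic assumes the stored text is plain ASCII, so that every UTF-16 character has a zero high byte. It would misfire on files containing non-ASCII characters or embedded #0 chars.

```delphi
program DetectCharWidth;
{$APPTYPE CONSOLE}

// Hypothetical helper: returns True when every second byte is zero,
// which is what ASCII text stored as little-endian WideChars looks like.
function LooksLikeWideChars(const Buf: array of Byte): Boolean;
var
  i: Integer;
begin
  Result := Length(Buf) >= 2;
  i := 1;                        // open arrays are zero-based; odd indexes hold the high bytes
  while Result and (i <= High(Buf)) do
  begin
    Result := Buf[i] = 0;
    Inc(i, 2);
  end;
end;

const
  // Byte sequences taken from the question
  OldStyle: array[0..7] of Byte = ($4C, $20, $20, $20, $4E, $4E, $4E, $4E);
  NewStyle: array[0..7] of Byte = ($4C, $00, $20, $00, $20, $00, $20, $00);

begin
  Writeln(LooksLikeWideChars(OldStyle));   // one-byte chars: no zero high bytes
  Writeln(LooksLikeWideChars(NewStyle));   // two-byte chars: zero after every character
end.
```

In a real loader you would read the first record's character bytes into a buffer, run a check like this, and then reopen the file with the matching record type.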

Mason Wheeler
See clarified question.
Daniel Straight
@Daniel: OK. Edited my answer.
Mason Wheeler
Can I redeclare all my chars as ansichar? These records are used throughout the program and lots of other char variables are involved. Can I cast between the two and just switch the chars in the record to ansichars?
Daniel Straight
It looks like just redeclaring everything to be ansichar works fine. Thank you.
Daniel Straight
@Daniel: That's a very big question. The unicode upgrade was a pretty major thing, and the unicode string is now the default string type. It's best to use unicode strings and unicode chars as much as you can internally. You can convert between unicode and ansi easily enough with casts. See Cary Jensen's writeup at http://www.embarcadero.com/images/dm/technical-papers/delphi-unicode-migration.pdf for some helpful tips.
Mason Wheeler
Is there no compiler switch to get the old char size?
dthorpe
@dthorpe: Nope. Since so much of the VCL works with text, having two different default string types would basically require maintaining two different copies of the VCL and keeping them in sync.
Mason Wheeler
@Mason: Not sure that I agree. Yes, a lot of VCL works with text, but I'm pretty sure the majority of that can be implemented in a charsize agnostic way, just by using the normal Delphi string and char types in the appropriate way. That's what the string and char types are for, after all.
dthorpe
Coming from dthorpe, I had to upvote that comment.
Warren P
I'm just commenting from the peanut gallery. We can see the technical aspects, but what we cannot see are what business aspects were in play that influenced or overrode the technical decisions.
dthorpe
A: 

In your code, change the string type declarations to AnsiString and the char type declarations to AnsiChar. They will use the same encoding as in previous versions of Delphi, and the AnsiString/AnsiChar types also work with those previous versions. There is no global compiler switch for this, though. After reading, convert the AnsiString/AnsiChar data to a Unicode string.

Here are two examples, doing the same thing, one using an array of AnsiChar, one with direct reading of an AnsiString content. Both return a generic Unicode string:

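// Version 1: read into a fixed array of AnsiChar, then build the string.
// Requires the Classes unit for TStream.
function Read5(S: TStream): string;
var chars: array[1..5] of AnsiChar;
    tmp: AnsiString;
    i: integer;
begin
  S.Read(chars, SizeOf(chars));  // 5 bytes, one per AnsiChar
  tmp := '';
  for i := 1 to 5 do
    tmp := tmp+chars[i];
  result := string(tmp);         // explicit ANSI -> Unicode conversion
end;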


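// Version 2: read straight into the AnsiString buffer.
// Requires the Classes unit for TStream.
function Read5(S: TStream): string;
var tmp: AnsiString;
begin
  SetLength(tmp,5);
  S.Read(tmp[1],5);              // 5 bytes, one per AnsiChar
  result := string(tmp);         // explicit ANSI -> Unicode conversion
end;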

You can use AnsiChars throughout your program without any problem.

But you may have some problems if your AnsiChars are used in string functions (like pos or copy).

Always take a close look at the Delphi 2010 compiler warnings, and try to avoid any implicit ANSI-Unicode conversion by making the conversions explicit.
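To illustrate what "explicit" means here, a small sketch follows. The variable names are invented for the example; the idea is to convert once at the boundary with a cast, and then do the string handling (Pos, Copy and so on) on the Unicode value.

```delphi
program ExplicitConversionSketch;
{$APPTYPE CONSOLE}

var
  Code: array[1..5] of AnsiChar;   // hypothetical field, as stored on disk
  A: AnsiString;
  U: string;                       // UnicodeString in Delphi 2009 and later
begin
  Code[1] := 'L';
  Code[2] := 'N';
  Code[3] := 'N';
  Code[4] := 'N';
  Code[5] := 'N';

  // Copy the fixed-size array into an AnsiString, then convert explicitly.
  SetString(A, PAnsiChar(@Code[1]), Length(Code));
  U := string(A);                  // explicit ANSI -> Unicode, no implicit-cast warning

  // String functions now operate on a Unicode string only.
  Writeln(Pos('N', U));

  // Going back the other way before writing to the file is also a cast.
  A := AnsiString(U);
  Writeln(Length(A));
end.
```

Keeping the casts at the read/write boundary means the rest of the program can stay on the default string type.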

A.Bouchez