Consider the following scenario:

type 
  PStructureForSomeCDLL = ^TStructureForSomeCDLL;
  TStructureForSomeCDLL = record 
    pName: PAnsiChar;
  end;

function FillStructureForDLL: PStructureForSomeCDLL;
begin
  New(Result);
  // Result.pName := PAnsiChar(SomeObject.SomeString);  // Old D7 code working all right
  Result.pName := PAnsiChar(Utf8ToAnsi(UTF8Encode(SomeObject.SomeString)));  // New problematic Unicode version
end;

...code to pass FillStructureForDLL to DLL...

The problem in the Unicode version is that the string conversion involved now returns a new string on the stack, and that's reclaimed at the end of the FillStructureForDLL call, leaving the DLL with corrupted data. In the old D7 code, there were no intermediate conversion functions and thus no problem.

My current solution is a converter function like the one below, which is IMO too much of a hack. Is there a more elegant way of achieving the same result?

var gKeepStrings: array of AnsiString;

{ Convert the given Unicode value S to ANSI and increase the ref. count 
  of it so that returned pointer stays valid }
function ConvertToPAnsiChar(const S: string): PAnsiChar;
var temp: AnsiString;
begin
  SetLength(gKeepStrings, Length(gKeepStrings) + 1);
  temp := Utf8ToAnsi(UTF8Encode(S));
  gKeepStrings[High(gKeepStrings)] := temp; // keeps the resulting pointer valid 
                                            // by increasing the ref. count of temp.
  Result := PAnsiChar(temp);
end;
+2  A: 

There are at least three ways to do this.

  1. You could change SomeObject's class definition to use an AnsiString instead of a string.
  2. You could use a conversion system to hold references, like in your example.
  3. You could initialize result.pname with GetMem and copy the result of the conversion to result.pname^ with Move. Just remember to FreeMem it when you're done.
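Option 3 could be sketched roughly as follows. This is only an illustration under assumptions: `NewAnsiCopy` and `FreeStructureForDLL` are hypothetical helper names, the function takes the name as a parameter instead of reading `SomeObject.SomeString`, and a plain `AnsiString(S)` cast stands in for the `Utf8ToAnsi(UTF8Encode(S))` conversion from the question.

```pascal
program HeapCopyDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  PStructureForSomeCDLL = ^TStructureForSomeCDLL;
  TStructureForSomeCDLL = record
    pName: PAnsiChar;
  end;

// Hypothetical helper: returns a heap copy of S as a null-terminated
// ANSI string. The caller owns the buffer and must FreeMem it.
function NewAnsiCopy(const S: string): PAnsiChar;
var
  Temp: AnsiString;
  Len: Integer;
begin
  Temp := AnsiString(S);  // assumption: stands in for Utf8ToAnsi(UTF8Encode(S))
  Len := Length(Temp);
  GetMem(Result, Len + 1);          // +1 for the terminating #0
  if Len > 0 then
    Move(Temp[1], Result^, Len);    // copy the characters out of the temporary
  Result[Len] := #0;
end;

function FillStructureForDLL(const Name: string): PStructureForSomeCDLL;
begin
  New(Result);
  Result^.pName := NewAnsiCopy(Name);  // the buffer outlives this call
end;

// Hypothetical cleanup routine: releases the string buffer, then the record.
procedure FreeStructureForDLL(P: PStructureForSomeCDLL);
begin
  if P = nil then
    Exit;
  FreeMem(P^.pName);
  Dispose(P);
end;

var
  P: PStructureForSomeCDLL;
begin
  P := FillStructureForDLL('example');
  try
    // ... pass P to the DLL here ...
    WriteLn(P^.pName);
  finally
    FreeStructureForDLL(P);
  end;
end.
```

The cost of this approach is that every `pName` now needs a matching `FreeMem`, so the record should only ever be released through one cleanup routine.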

Unfortunately, none of them is a perfect solution. So take a look at the options and decide which one works best for you.

Mason Wheeler
You can't initialize `pName` with `New`; you'll get a pointer to a single character. Use `GetMem` or `StrNew` instead.
Rob Kennedy
...oops. Thanks for catching that. Fixed.
Mason Wheeler
Ditto. No perfect solution except modifying the DLL structures. Thanks for the suggestions.
utku_karatas
+3  A: 

One way might be to tackle the problem before it becomes a problem. By that I mean: adapt the class of SomeObject to maintain an ANSI-encoded version of SomeString (ANSISomeString?) for you alongside the original SomeString, keeping the two in step in a "setter" for the SomeString property (using the same UTF-8 to ANSI conversion you are already doing).

In non-Unicode versions of the compiler, make ANSISomeString simply a "copy" of the SomeString string, which will of course not be a copy, merely an additional ref count on SomeString. In the Unicode version it references a separate ANSI encoding with the same "lifetime" as the original SomeString.

procedure TSomeObjectClass.SetSomeString(const aValue: String);
begin
  fSomeString := aValue;

{$ifdef UNICODE}
  fANSISomeString := Utf8ToAnsi(UTF8Encode(aValue));
{$else}
  fANSISomeString := fSomeString;
{$endif}
end;

In your FillStructure... function, simply change your code to refer to the ANSISomeString property - this then is entirely independent of whether compiling for Unicode or not.

function FillStructureForDLL: PStructureForSomeCDLL;
begin
  New(Result);
  result.pName := PANSIChar(SomeObject.ANSISomeString);
end;
Deltics
Good POV. Thanks!
utku_karatas
+2  A: 

Hopefully you already have code in your application to properly dispose of all the dynamically allocated records that you New() in FillStructureForDLL(). I consider this code highly dubious, but let's assume it is reduced code that only demonstrates the problem. Anyway, the DLL you pass the record instance to does not care how big the chunk of memory is; it only gets a pointer to it. So you are free to increase the size of the record to make room for the Pascal string that is now a temporary instance on the stack in the Unicode version:

type 
  PStructureForSomeCDLL = ^TStructureForSomeCDLL;
  TStructureForSomeCDLL = record 
    pName: PAnsiChar;
    // ... other parts of the record
    pNameBuffer: AnsiString;  // appended field; keeps the converted string alive
  end;

And the function:

function FillStructureForDLL: PStructureForSomeCDLL;
begin
  New(Result);
  // there may be a bug here, can't test on the Mac... idea should be clear
  Result.pNameBuffer := Utf8ToAnsi(UTF8Encode(SomeObject.SomeString));
  Result.pName := PAnsiChar(Result.pNameBuffer);  // points into pNameBuffer; valid until Dispose(Result)
end;

BTW: You wouldn't even have that problem if the record passed to the DLL was a stack variable in the procedure or function that calls the DLL function. In that case the temporary string buffers will only be necessary in the Unicode version if more than one PAnsiChar has to be passed (the conversion calls would otherwise reuse the temporary string). Consider changing the code accordingly.
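A rough sketch of that stack-variable variant, under some assumptions: `NameLengthSeenByDLL` is a hypothetical stand-in for the imported DLL routine, `CallDLLWithName` is an invented wrapper name, and a plain `AnsiString(Name)` cast replaces the `Utf8ToAnsi(UTF8Encode(...))` conversion. It uses a named local for the converted string rather than relying on a compiler temporary, which is the more conservative form of the idea.

```pascal
program StackRecordDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  PStructureForSomeCDLL = ^TStructureForSomeCDLL;
  TStructureForSomeCDLL = record
    pName: PAnsiChar;
  end;

// Hypothetical stand-in for the DLL routine: it only reads the
// null-terminated string that the record points to.
function NameLengthSeenByDLL(P: PStructureForSomeCDLL): Integer;
begin
  Result := 0;
  while P^.pName[Result] <> #0 do
    Inc(Result);
end;

function CallDLLWithName(const Name: string): Integer;
var
  Rec: TStructureForSomeCDLL;  // the record lives on this routine's stack
  AnsiName: AnsiString;        // stays alive until this routine returns
begin
  AnsiName := AnsiString(Name);        // assumption: stands in for Utf8ToAnsi(UTF8Encode(Name))
  Rec.pName := PAnsiChar(AnsiName);    // valid for as long as AnsiName is in scope
  Result := NameLengthSeenByDLL(@Rec); // the call completes before AnsiName is released
end;

begin
  WriteLn(CallDLLWithName('example'));
end.
```

This only works if the DLL does not keep the pointer after the call returns.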

Edit:

You write in a comment:

This would be best solution if modifying the DLL structures were an option.

Are you sure you can't use this solution? The point is that from the POV of the DLL the structure isn't modified at all. Maybe I didn't make myself clear, but the DLL will not care whether a structure passed to it is exactly what it is declared to be. It will be passed a pointer to the structure, and this pointer needs to point to a block of memory that is at least as large as the structure, and needs to have the same memory layout. However, it can be a block of memory that is larger than the original structure, and contain additional data.

This is actually used in quite a lot of places in the Windows API. Did you ever wonder why there are structures in the Windows API that contain as the first thing an ordinal value giving the size of the structure? It's the key to API evolution while preserving backwards compatibility. Whenever new information is needed for the API function to work it is simply appended to the existing structure, and a new version of the structure is declared. Note that the memory layout of older versions of the structure is preserved. Old clients of the DLL can still call the new function, which will use the size member of the structure to determine which API version is called.
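The size-member idea can be illustrated with a small sketch. Everything in it is hypothetical (the `TInfoV1`/`TInfoV2` records and the `Area` routine are invented for illustration and do not correspond to any real Windows API structure); it only shows how a callee can use the leading size field to tell which version of the structure it was given.

```pascal
program SizePrefixDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical version 1 of a structure: the size comes first.
  PInfoV1 = ^TInfoV1;
  TInfoV1 = record
    cbSize: Cardinal;
    Width: Integer;
  end;

  // Hypothetical version 2: identical leading layout, one field appended.
  PInfoV2 = ^TInfoV2;
  TInfoV2 = record
    cbSize: Cardinal;
    Width: Integer;
    Height: Integer;
  end;

// Hypothetical library routine that accepts either version.
function Area(P: PInfoV1): Integer;
begin
  if P^.cbSize >= Cardinal(SizeOf(TInfoV2)) then
    Result := P^.Width * PInfoV2(P)^.Height  // new caller: Height is present
  else
    Result := P^.Width;                      // old caller: treat Height as 1
end;

var
  OldRec: TInfoV1;
  NewRec: TInfoV2;
begin
  OldRec.cbSize := SizeOf(OldRec);
  OldRec.Width := 3;

  NewRec.cbSize := SizeOf(NewRec);
  NewRec.Width := 3;
  NewRec.Height := 4;

  WriteLn(Area(@OldRec));
  WriteLn(Area(@NewRec));
end.
```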

In your case no different versions of the structure exist as far as the DLL is concerned. However, you are free to declare it larger for your application than it really is, provided the memory layout of the real structure is preserved and additional data is only appended. The only case where this wouldn't work is when the last part of the structure is a record of varying size, kind of like the Windows BITMAP structure - a fixed header and dynamic data. However, your record looks like it has a fixed length.

mghie
This would be best solution if modifying the DLL structures were an option. Thank you.
utku_karatas
A: 

Wouldn't PAnsiChar(AnsiString(SomeObject.SomeString)) work?

Remko
No. This: "The problem in unicode version is that the string conversion involved now returns a new string on stack and that's reclaimed at the end of the FillStructureForDLL call, leaving the DLL with corrupted data." would apply just as well.
mghie