I found a Windows API function that performs "natural comparison" of strings. It is defined as follows:

int StrCmpLogicalW(      
    LPCWSTR psz1,
    LPCWSTR psz2
);

To use it in Delphi, I declared it this way:

interface
  function StrCmpLogicalW(psz1, psz2: PWideChar): integer; stdcall;

implementation
  function StrCmpLogicalW; external 'shlwapi.dll' name 'StrCmpLogicalW';

Because it compares Unicode strings, I'm not sure how to call it when I want to compare ANSI strings. Casting the strings to WideString and then to PWideChar seems to be enough, but I have no idea whether this approach is correct:

function AnsiNaturalCompareText(const S1, S2: string): integer;
begin
  Result := StrCmpLogicalW(PWideChar(WideString(S1)), PWideChar(WideString(S2)));
end;

I know very little about character encoding, which is why I am asking. Is this function OK, or should I first convert both of the compared strings somehow?

Thanks for your answers in advance.

Mariusz.

A: 

Yes, that's the right way to get a PWideChar from an AnsiString. Kind of ugly, but correct.

You might want to try wrapping it in a helper function, to improve readability. For example:

function WideCString(const value: AnsiString): PWideChar;
begin
  result := PWideChar(WideString(value));
end;
Mason Wheeler
That function is a bad idea. It creates a temporary WideString from the AnsiString, but the WideString has a lifetime managed by the compiler. The temporary goes out of scope when the function returns, at which point the WideString is destroyed and the PWideChar becomes invalid.
Rob Kennedy
Try it for yourself:

program Project1;
{$APPTYPE CONSOLE}
uses SysUtils;

function WideCString(const value: AnsiString): PWideChar;
begin
  result := PWideChar(WideString(value));
end;

var
  inputStr: ansiString;
  tempChar, tempChar2: PWideChar;
begin
  inputStr := 'The string remains valid.';
  writeln(wideCString(inputStr));
  readln;
end.
Mason Wheeler
Although the pointer may remain usable, that doesn't mean it's valid. It just happens to still point at valid memory.
David M
Er, "valid memory" there meaning memory containing useful data not yet overwritten with something else. AnsiString::c_str()'s help file says, "This pointer is valid until the System::AnsiString::c_string is next modified." I suspect WideString and this cast are similar.
David M
@Mason: Sample code can be used to prove the existence of a bug, but never the absence. Try it for yourself: use your function for the two parameters of StrCmpLogicalW(), and the result will always be 0, as the second call overwrites the memory used for the first parameter.
mghie
Rob is totally correct - don't do that!
gabr
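The lifetime problem raised in the comments above can be avoided by keeping the WideString values in local variables, so that they stay alive for the whole API call. The following is only a minimal sketch: the names NaturalCompareDemo and NaturalCompareText are hypothetical, and the conversion still relies on the default code page used by the WideString cast.

```pascal
program NaturalCompareDemo;
{$APPTYPE CONSOLE}

// Import as declared in the question, written as a single declaration.
function StrCmpLogicalW(psz1, psz2: PWideChar): Integer; stdcall;
  external 'shlwapi.dll' name 'StrCmpLogicalW';

// Hypothetical wrapper: the converted strings live in local variables,
// so both pointers stay valid until the function returns.
function NaturalCompareText(const S1, S2: AnsiString): Integer;
var
  W1, W2: WideString;
begin
  W1 := WideString(S1); // converted using the default code page
  W2 := WideString(S2);
  Result := StrCmpLogicalW(PWideChar(W1), PWideChar(W2));
end;

begin
  // A negative result means the first string sorts before the second.
  Writeln(NaturalCompareText('file2.txt', 'file10.txt'));
end.
```

The one-line version in the question follows the same idea: its temporaries are created inside the calling function, so they outlive the call to StrCmpLogicalW. Only returning the pointer from a helper lets it outlive the string it points into.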
+4  A: 

There might be an ANSI variant of your function too (I haven't checked). Most Wide APIs are available in an ANSI version as well; just change the W suffix to an A and you're set. Windows does the back-and-forth conversion transparently for you in that case.

PS: Here's an article describing the lack of StrCmpLogicalA: http://blogs.msdn.com/joshpoley/archive/2008/04/28/strcmplogicala.aspx

PatrickvL
+5  A: 

Keep in mind that casting a string to a WideString will convert it using the default system code page, which may or may not be what you need. Typically, you'd want to use the current user's locale.

From WCharFromChar in System.pas:

Result := MultiByteToWideChar(DefaultSystemCodePage, 0, CharSource, SrcBytes,
  WCharDest, DestChars);

You can change DefaultSystemCodePage by calling SetMultiByteConversionCodePage.

gabr
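If the implicit conversion is not what you want, one option is to call MultiByteToWideChar yourself with an explicit code page. This is a rough sketch only: AnsiToWide and the program name are hypothetical, and CP_ACP is used in the demo purely as an example code page.

```pascal
program ExplicitCodePageDemo;
{$APPTYPE CONSOLE}

uses
  Windows;

// Hypothetical helper: converts S to a WideString using the given code page
// instead of whatever the implicit WideString cast would pick.
function AnsiToWide(const S: AnsiString; CodePage: UINT): WideString;
var
  Len: Integer;
begin
  Result := '';
  if S = '' then
    Exit;
  // First call: ask how many wide characters the conversion needs.
  Len := MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S), nil, 0);
  if Len = 0 then
    Exit; // conversion failed; the caller gets an empty string
  SetLength(Result, Len);
  // Second call: perform the conversion into the allocated buffer.
  MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S),
    PWideChar(Result), Len);
end;

var
  W: WideString;
begin
  W := AnsiToWide('file10.txt', CP_ACP);
  Writeln(Length(W));
end.
```

The resulting WideString can be held in a local variable and passed to StrCmpLogicalW as PWideChar(W).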
To be honest, I wanted to create a function that would be a counterpart of AnsiCompareText() (which, of course, performs a lexical comparison of ANSI strings). Does it use the current user's locale or the default system code page? Thanks.
Mariusz
Use the source: Result := CompareString(LOCALE_USER_DEFAULT, NORM_IGNORECASE, PChar(S1), Length(S1), PChar(S2), Length(S2)) - CSTR_EQUAL; And from MSDN: LOCALE_USER_DEFAULT: "The current user's default locale." So no, it doesn't use the current locale, but the default locale of the current user. Which again may or may not be what you really want :-/
gabr