I'm slightly confused and hoping for enlightenment.

I'm using Delphi 2010 for this project and I'm trying to compare 2 strings.

The comparison below fails:

if AnsiStrIComp(PAnsiChar(sCatName), PAnsiChar(CatNode.CatName)) = 0 then...

because, according to the debugger, only the first character of each string is being compared. That is, if sCatName is "Automobiles", PAnsiChar(sCatName) is "A".

I want to be able to compare strings that may be in different languages, for example English vs Japanese.

In this case I am looking for a match, but I have other functions used for sorting, etc. where I need to know how the strings compare (less than, equal, greater than).

+3  A: 

I assume that sCatName and CatNode.CatName are declared as strings (= UnicodeStrings)? They should be.

There is no need to convert the strings to null-terminated strings! This you (mostly) only need to do when working with the Windows API.

If you want to test equality of two strings, use SameStr(S1, S2) (case-sensitive matching) or SameText(S1, S2) (case-insensitive matching), or simply S1 = S2 in the first case. All three options return True or False, depending on whether the strings are equal.

If you want to get a numerical value based on the ordinal values of the characters (as in sorting), then use CompareStr(S1, S2) or CompareText(S1, S2). These return a negative integer, zero, or a positive integer.

(You might want to use the Ansi- functions: AnsiSameStr, AnsiSameText, AnsiCompareStr, and AnsiCompareText; these functions use the current locale. The non-Ansi functions accept a third, optional parameter that explicitly specifies the locale to use.)
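
To make the options above concrete, here is a minimal console sketch. The program name and the sample values are made up for illustration; only the SysUtils functions named in this answer are used, and the exact integers returned by the Compare* functions are not guaranteed beyond their sign.

```delphi
program CompareDemo; // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  A, B: string; // string = UnicodeString in Delphi 2009 and later
begin
  A := 'Automobiles';
  B := 'AUTOMOBILES';

  // Equality tests: each of these yields a Boolean.
  Writeln(A = B);              // case-sensitive
  Writeln(SameStr(A, B));      // case-sensitive
  Writeln(SameText(A, B));     // case-insensitive
  Writeln(AnsiSameText(A, B)); // case-insensitive, uses the current locale

  // Ordering tests: negative, zero, or positive, so they can drive a sort.
  Writeln(CompareStr(A, B));      // case-sensitive, ordinal
  Writeln(CompareText(A, B));     // case-insensitive
  Writeln(AnsiCompareText(A, B)); // case-insensitive, uses the current locale
end.
```

For mixed-language data such as English and Japanese, the Ansi- variants are probably the ones to reach for, since they follow the user's locale rather than raw character ordinals.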

Update

Please read Remy Lebeau's comments regarding the cause of the problem.

Andreas Rejbrand
Thanks for your reply. For the function in question, AnsiCompareText() looks like it will do the job, and it is a simpler implementation as well. But I would like to understand why PAnsiChar(sCatName) takes a string variable and returns only the first character.
TheSteven
If `sCatName` is a string, then `PAnsiChar(sCatName)` creates a null-terminated string representation of `sCatName`, and returns the pointer, the address, of the first character of this null-terminated string. (In Delphi, `string`s are *not* null-terminated, so when communicating with the Windows API, for instance, you might need to create null-terminated strings from Delphi strings).
Andreas Rejbrand
Thanks for your quick and detailed responses :)
TheSteven
The UnicodeString type uses UTF-16 encoding, which uses 2 bytes per code unit and 1-2 code units per Unicode character (depending on whether surrogates are used). For Unicode characters in the ASCII range, the second byte of each code unit is 0. Type-casting a UnicodeString to a PAnsiChar treats the pointer as a null-terminated Ansi character string, not a null-terminated Unicode character string. So for ASCII text the Ansi string appears to end at the second byte, which is why only the first character is compared.
Remy Lebeau - TeamB
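
The effect described above can be seen with a small sketch. It assumes a Unicode Delphi (2009 or later) on little-endian Windows; the program and variable names are illustrative, and the compiler is expected to warn about the suspicious typecast, which is the point of the demo.

```delphi
program CastDemo; // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  S: string;    // UnicodeString, stored as UTF-16
  P: PAnsiChar;
  N: Integer;
begin
  S := 'Automobiles';

  // Same cast as in the question: the UTF-16 data is reinterpreted as bytes.
  // In memory the string starts with $41 $00 $75 $00 ...
  P := PAnsiChar(S);

  // Count single bytes up to the first zero byte, as an Ansi routine would.
  N := 0;
  while P[N] <> #0 do
    Inc(N);

  Writeln('Length(S)           = ', Length(S)); // 11 characters
  Writeln('Bytes before first 0 = ', N);        // expected to be 1: just 'A'
end.
```

Casting to PChar instead keeps the data as UTF-16, so PChar-based routines see the whole string.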
@Andreas: Delphi strings ARE null-terminated. The null terminator is not considered part of the string data (it is not counted by Length(string)), but the terminator is present nonetheless. This is so a PChar typecast can simply return the address of the string data as-is without having to allocate new memory instead.
Remy Lebeau - TeamB
@Remy Lebeau: I wasn't aware of that null byte. But indeed it makes sense - it is a very small cost for a potentially huge improvement of performance!
Andreas Rejbrand
@Remy: You really should have downvoted me, at least until I removed the incorrect statement about the non-existing null byte!
Andreas Rejbrand
Here we can observe the null character: http://docwiki.embarcadero.com/RADStudio/en/Internal_Data_Formats#Long_String_Types
Andreas Rejbrand
Keep in mind that relying on null-character termination will go wrong for certain strings (e.g. compare `aaaa#0bbb` with `aaaa#0ccc`), since it is perfectly legal for Delphi strings to contain additional null characters.
Marco van de Voort
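
A short sketch of that pitfall, using the two strings from the comment above. It assumes a Unicode Delphi, where PChar is PWideChar and SysUtils.StrComp has a matching overload; the program name is illustrative.

```delphi
program EmbeddedNullDemo; // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  A, B: string;
begin
  // Both strings contain an embedded #0 after the fourth character.
  A := 'aaaa'#0'bbb';
  B := 'aaaa'#0'ccc';

  Writeln(Length(A)); // 8: the embedded #0 counts as a character

  // Length-based comparison looks at all 8 characters, so the strings differ.
  Writeln(CompareStr(A, B) = 0);               // FALSE

  // Null-terminated comparison stops at the first #0 and sees 'aaaa' twice.
  Writeln(StrComp(PChar(A), PChar(B)) = 0);    // TRUE
end.
```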
A: 

What about simple sCatName=CatNode.CatName? If they are strings it should work.

mbq
I am looking for a case-insensitive comparison. In this particular case I am checking for equality. While UpperCase(sCatName)=UpperCase(CatNode.CatName) would work (for this particular example), it is my understanding that the built-in string comparison functions are faster.
TheSteven
@TheSteven: Yes, `SameText(A, B)` is faster than `AnsiUpperCase(A) = AnsiUpperCase(B)`.
Andreas Rejbrand
So is `AnsiSameText(A, B)`
Remy Lebeau - TeamB
@TheSteven I was cut off from the internet, and so I missed the whole discussion; now I must just agree with Andreas and Remy.
mbq