Here is another question about converting old code to D2009 and Unicode. I'm certain the solution is simple, but I don't see it. CharacterSet is a set of Char, and s[j] should also be a Char, but the compiler still thinks there is a conflict between AnsiChar and Char.

The code:

type
  TSetOfChar = set of Char;

procedure aFunc;
var
  CharacterSet: TSetOfChar;
  s: String;
  j: Integer;
  CaseSensitive: Boolean;
begin
  // Other code that assigns a string to s
  // Set CaseSensitive to a value

  CharacterSet := [];
  for j := 1 to Length(s) do
  begin
    Include(CharacterSet, s[j]);  // E2010 Incompatible types: 'AnsiChar' and 'Char'
    if not CaseSensitive then
    begin
      Include(CharacterSet, AnsiUpperCase(s[j])[1]);
      Include(CharacterSet, AnsiLowerCase(s[j])[1])
    end
  end;
end;  
+4  A: 

Because a Delphi set can hold at most 256 elements, with ordinal values in the range 0..255, the compiler quietly converts set of Char to set of AnsiChar. That's what's causing trouble for you: s[j] is a (wide) Char, but the set's element type is AnsiChar.

Mason Wheeler
Wow, that's some strange behaviour IMHO. I would definitely prefer the compiler to give an error here. Why "quietly" here, when all the other conversions lead to compiler warnings?
Smasher
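Given that 256-element limit, one way to keep a native set is to narrow each character explicitly and skip the ones that do not fit, instead of relying on an implicit conversion. A minimal sketch, assuming D2009 or later (where Char is WideChar); `AddIfFits` is a hypothetical helper, not an RTL routine. It maps characters by code point, so #128..#255 are treated as Latin-1 code points rather than characters of the current ANSI code page.

```pascal
program SetOfAnsiCharDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Declared as what the compiler uses for set of Char anyway: 256 elements.
  TSetOfChar = set of AnsiChar;

// Hypothetical helper: adds C to ASet only if its ordinal value fits in a
// set element (0..255). Otherwise returns False and leaves ASet unchanged.
function AddIfFits(var ASet: TSetOfChar; C: Char): Boolean;
begin
  Result := Ord(C) <= 255;
  if Result then
    Include(ASet, AnsiChar(Ord(C)));  // explicit narrowing, nothing is lost
end;

var
  CharacterSet: TSetOfChar;
  s: string;
  j: Integer;
begin
  s := 'abc' + #$0100;  // U+0100 (A with macron) is above #255
  CharacterSet := [];
  for j := 1 to Length(s) do
    if not AddIfFits(CharacterSet, s[j]) then
      Writeln('Skipped U+', IntToHex(Ord(s[j]), 4));  // prints: Skipped U+0100
end.
```

The point of returning a Boolean is that the caller decides what to do with characters a native set cannot hold, rather than having them truncated silently.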
+3  A: 

There is no good and simple answer to the question (the reason was already given by Mason). The good solution is to reconsider the algorithm to get rid of the "set of char" type. The quick and dirty solution is to keep using ANSI chars and strings:

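type
  TSetOfChar = set of AnsiChar;

procedure aFunc;
var
  CharacterSet: TSetOfChar;
  s: String;
  S1, SU, SL: AnsiString;
  j: Integer;
  CaseSensitive: Boolean;
begin
  // Other code that assigns a string to s
  // Set CaseSensitive to a value

  // Implicit UnicodeString -> AnsiString conversions: the compiler typically
  // warns about possible data loss, and characters outside the ANSI code page are lost.
  S1:= s;
  SU:= AnsiUpperCase(s);
  SL:= AnsiLowerCase(s);
  CharacterSet := [];
  // Assumes S1, SU and SL have the same length (single-byte ANSI code page)
  for j := 1 to Length(S1) do
  begin
    Include(CharacterSet, S1[j]);
    if not CaseSensitive then
    begin
      Include(CharacterSet, SU[j]);
      Include(CharacterSet, SL[j]);
    end
  end;
end;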
Serg
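Serg's "good solution" is to drop set of Char altogether. One possible replacement (an assumption, not something Serg proposed) is a lookup table indexed directly by Char. A minimal sketch, assuming D2009 or later, where Char is WideChar and the table therefore has 65,536 Boolean entries (about 64 KB); `TCharFlags` and `BuildCharFlags` are hypothetical names. It works per UTF-16 code unit, so a surrogate pair is recorded as two separate halves.

```pascal
program CharFlagsDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // One Boolean per UTF-16 code unit: no 256-element limit as with sets.
  TCharFlags = array[Char] of Boolean;

// Hypothetical helper: marks every character of S in Flags and, when not
// case-sensitive, its upper- and lower-case forms as well.
procedure BuildCharFlags(const S: string; CaseSensitive: Boolean;
  var Flags: TCharFlags);
var
  j: Integer;
  U, L: string;
begin
  FillChar(Flags, SizeOf(Flags), 0);  // every entry starts as False
  for j := 1 to Length(S) do
  begin
    Flags[S[j]] := True;
    if not CaseSensitive then
    begin
      U := AnsiUpperCase(S[j]);
      L := AnsiLowerCase(S[j]);
      Flags[U[1]] := True;  // first code unit, as in the question's code
      Flags[L[1]] := True;
    end;
  end;
end;

var
  CharFlags: TCharFlags;
begin
  BuildCharFlags('aB', False, CharFlags);
  Writeln(CharFlags['A'], ' ', CharFlags['b'], ' ', CharFlags['z']);
end.
```

Compared with TBits, the table has a fixed size and needs no Create/Free, at the cost of always using 64 KB.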
+3  A: 

Delphi does not support sets of Unicode characters. You can only use AnsiChar in a set, but that's not big enough to fit all the possible characters your string might hold.

Instead of Delphi's native set type, though, you can use the TBits type.

// TBits is declared in Classes; ToUpper/ToLower come from the Character unit.
procedure aFunc;
var
  CharacterSet: TBits;
  s: String;
  c: Char;
  CaseSensitive: Boolean;
begin
  // Other code that assigns a string to s
  // Set CaseSensitive to a value

  CharacterSet := TBits.Create;
  try
    for c in s do begin
      CharacterSet[Ord(c)] := True;
      if not CaseSensitive then begin
        CharacterSet[Ord(Character.ToUpper(c))] := True;
        CharacterSet[Ord(Character.ToLower(c))] := True;
      end
    end;
    // Use CharacterSet here, before it is freed
  finally
    CharacterSet.Free;
  end;
end;

A TBits object automatically expands to accommodate the highest bit it needs to represent.

Other changes I made to your code include using the new "for-in" loop style, and the new Character unit for dealing with Unicode characters.
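Setting a bit grows the TBits, but reading is stricter: as far as I know, reading `Bits[Index]` with an index at or beyond `Size` raises `EBitsError`. A membership test therefore has to check `Size` first. A minimal sketch; `CharInBits` is a hypothetical helper name.

```pascal
program TBitsDemo;

{$APPTYPE CONSOLE}

uses
  Classes;

// Hypothetical helper: True if C's bit is set. Bits beyond Size were never
// set, so they count as "not in the set" instead of being read.
// Relies on default short-circuit evaluation ({$B-}).
function CharInBits(ABits: TBits; C: Char): Boolean;
begin
  Result := (Ord(C) < ABits.Size) and ABits[Ord(C)];
end;

var
  CharacterSet: TBits;
  s: string;
  c: Char;
begin
  CharacterSet := TBits.Create;
  try
    s := 'abc';
    for c in s do
      CharacterSet[Ord(c)] := True;          // grows Size as needed
    Writeln(CharacterSet.Size);               // Ord('c') + 1 = 100
    Writeln(CharInBits(CharacterSet, 'b'));
    Writeln(CharInBits(CharacterSet, 'z'));   // beyond Size: False, no exception
  finally
    CharacterSet.Free;
  end;
end.
```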

Rob Kennedy
I am not sure the question really implies the construction of a "set of all WideChars"-like type. More probably the author is actually working with a small subset of all Unicode characters.
Serg
The input type is a UnicodeString. We haven't been told that it only holds characters from the current code page. No matter how little the subset might be, it won't matter if it's not also a subset of AnsiChar.
Rob Kennedy
Very handy idea. If you really need a 65K bit array, this works nicely, and because it auto-expands only up to the highest code point you actually use, you don't have to preallocate the full 16-bit range.
Warren P
Something else to keep in mind is that UnicodeString is based on UTF-16, which uses surrogate pairs (two 16-bit code units) for code points above U+FFFF. To be strictly correct, you should decode UTF-16 to UTF-32 before processing the set of "characters". The TCharacter class has methods for dealing with UTF-32 code points and UTF-16 surrogates.
Remy Lebeau - TeamB
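Remy's point can be sketched by decoding each high/low surrogate pair into a single code point before recording it. Instead of relying on a particular TCharacter method signature, the sketch below applies the standard UTF-16 decoding formula by hand and stores code points (up to $10FFFF) in a TBits. `CollectCodePoints` is a hypothetical name, and recording an unpaired surrogate as its own value is just one possible policy.

```pascal
program CodePointSetDemo;

{$APPTYPE CONSOLE}

uses
  Classes;

// Hypothetical helper: sets one bit per Unicode code point in S.
// A valid high/low surrogate pair is combined into one code point above
// $FFFF; an unpaired surrogate is recorded as its code-unit value.
procedure CollectCodePoints(const S: string; CodePoints: TBits);
var
  i, CP: Integer;
begin
  i := 1;
  while i <= Length(S) do
  begin
    CP := Ord(S[i]);
    if (CP >= $D800) and (CP <= $DBFF) and (i < Length(S)) and
       (Ord(S[i + 1]) >= $DC00) and (Ord(S[i + 1]) <= $DFFF) then
    begin
      // Standard UTF-16 decoding of a surrogate pair
      CP := $10000 + ((CP - $D800) shl 10) + (Ord(S[i + 1]) - $DC00);
      Inc(i, 2);
    end
    else
      Inc(i);
    CodePoints[CP] := True;  // TBits grows as needed
  end;
end;

var
  Bits: TBits;
begin
  Bits := TBits.Create;
  try
    // 'a' followed by U+1F600, encoded as the pair #$D83D#$DE00
    CollectCodePoints('a' + #$D83D#$DE00, Bits);
    Writeln(Bits.Size);      // $1F600 + 1 = 128513
    Writeln(Bits[$1F600]);   // TRUE
    Writeln(Bits[$D83D]);    // FALSE: the halves are not recorded separately
  finally
    Bits.Free;
  end;
end.
```

With code points up to $10FFFF the TBits can grow to about 136 KB, which is the price of treating supplementary characters as single elements.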