I need to write a program that browses through strings of various lengths and selects only those written using symbols from a set that I define (in particular, Japanese characters). The strings will contain words written in different languages (German, French, Arabic, Russian, English, etc.), so obviously there is a huge number of possible characters. I do not know which data structure to use for this. I am using Delphi 7 right now. Can anybody suggest how to write such a program?
You would obviously be better off with Delphi 2010, since the VCL in Delphi 7 is not Unicode-aware. You can still use the WideString and WideChar types in Delphi 7, and you can install a component set such as the TNT Unicode Components to help you build a user interface that can display your results.
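For a very large set type, consider using a bit array such as TBits. A bit array of length 65536 has one bit for every UTF-16 code unit, which covers every code point in the Basic Multilingual Plane (characters outside it are stored as surrogate pairs and would need separate handling). Note that reading a TBits index at or beyond its Size raises an error, so the Size must cover every index you test. Checking whether a string contains at least one character from the set would basically be: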
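function WideCharsInSet(const wcstr: WideString; wcset: TBits): Boolean;
var
  n: Integer;
  wc: WideChar;
begin
  Result := False;
  for n := 1 to Length(wcstr) do begin
    wc := wcstr[n];
    // TBits raises EBitsError when read at or past Size, so check the bound first
    if (Ord(wc) < wcset.Size) and wcset[Ord(wc)] then begin
      Result := True;
      Exit; // one match is enough
    end;
  end;
end;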
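// Needs Classes (TBits), Forms (Application) and Windows (MB_OK) in the uses clause.
procedure Demo;
var
  wcset1: TBits;
  s: WideString;
begin
  wcset1 := TBits.Create;
  try
    wcset1.Size := 65536; // one bit per UTF-16 code unit, all initially False
    // $1157 - Hangul Korean code point I found with Char Map.
    // Char Map shows hex values; decimal 1157 would be U+0485 (Cyrillic).
    wcset1[$1157] := True;
    // go get a string value s:
    s := WideChar($1157);
    // True if at least one element of wcset1 is found in string s:
    if WideCharsInSet(s, wcset1) then begin
      Application.MessageBox('Found it', 'found it', MB_OK);
    end;
  finally
    wcset1.Free;
  end;
end;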
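The question asks for strings written only with characters from a chosen set, which is a slightly different test from "at least one match". The following is a minimal sketch of that variant for Delphi 7, not a definitive implementation. AddRange and AllWideCharsInSet are hypothetical helper names introduced here, and the three Unicode blocks used to approximate "Japanese" (Hiragana, Katakana, CJK Unified Ideographs) are an assumption; punctuation, half-width forms and characters outside the Basic Multilingual Plane are not covered.

```pascal
program JapaneseFilter;
{$APPTYPE CONSOLE}

uses
  Classes;

// Hypothetical helper: marks every code unit in First..Last as a set member.
procedure AddRange(CharSet: TBits; First, Last: Integer);
var
  i: Integer;
begin
  for i := First to Last do
    CharSet[i] := True;
end;

// Hypothetical helper: True only if s is non-empty and every UTF-16
// code unit of s is a member of CharSet. Assumes CharSet.Size = 65536.
function AllWideCharsInSet(const s: WideString; CharSet: TBits): Boolean;
var
  n: Integer;
begin
  Result := Length(s) > 0;
  for n := 1 to Length(s) do
    if not CharSet[Ord(s[n])] then
    begin
      Result := False;
      Exit; // one foreign character is enough to reject the string
    end;
end;

var
  JapaneseSet: TBits;
  Sample: WideString;
begin
  JapaneseSet := TBits.Create;
  try
    JapaneseSet.Size := 65536;            // one bit per UTF-16 code unit
    AddRange(JapaneseSet, $3040, $309F);  // Hiragana block
    AddRange(JapaneseSet, $30A0, $30FF);  // Katakana block
    AddRange(JapaneseSet, $4E00, $9FFF);  // CJK Unified Ideographs (shared with Chinese)

    // Two hiragana characters, built from their code points.
    Sample := WideChar($3053);
    Sample := Sample + WideChar($3093);

    if AllWideCharsInSet(Sample, JapaneseSet) then
      WriteLn('only characters from the set')
    else
      WriteLn('contains other characters');
  finally
    JapaneseSet.Free;
  end;
end.
```

Filling the set once and reusing it keeps each membership test to a single bit lookup, so scanning many strings stays cheap regardless of how many ranges the set contains.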
I also recommend switching to Delphi 2010 (why bother with 2009 anymore?)!
In the unlikely case that you are stuck with Delphi 7, the Unicode Library from Mike Lischke may be somewhat helpful.
For simple string processing of the kind you describe, do not be put off by suggestions that you should upgrade to the latest compiler and Unicode-enabled framework. The Unicode support itself is provided by the underlying Windows API, which is directly accessible from "non-Unicode" versions of Delphi just as much as from "Unicode" versions.
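As an illustration of calling the wide Windows API directly from Delphi 7, here is a rough sketch that converts UTF-8 bytes to a WideString with MultiByteToWideChar and shows the result with MessageBoxW, both declared in the Windows unit. Utf8ToWide is a hypothetical helper name, and the byte literal assumes Delphi 7, where string literals are Ansi strings.

```pascal
program WideApiDemo;

uses
  Windows;

// Hypothetical helper: converts UTF-8 encoded bytes to a WideString
// using the Win32 API; returns '' for empty or invalid input.
function Utf8ToWide(const Utf8: AnsiString): WideString;
var
  Len: Integer;
begin
  Result := '';
  if Utf8 = '' then
    Exit;
  // First call: ask how many UTF-16 code units the result needs.
  Len := MultiByteToWideChar(CP_UTF8, 0, PAnsiChar(Utf8), Length(Utf8), nil, 0);
  if Len <= 0 then
    Exit;
  SetLength(Result, Len);
  // Second call: perform the conversion into the allocated buffer.
  MultiByteToWideChar(CP_UTF8, 0, PAnsiChar(Utf8), Length(Utf8),
    PWideChar(Result), Len);
end;

var
  Msg: WideString;
begin
  // UTF-8 bytes for U+3053 and U+3093 (two hiragana characters).
  Msg := Utf8ToWide(#$E3#$81#$93#$E3#$82#$93);
  // The W variant of the API accepts UTF-16 text regardless of the VCL.
  MessageBoxW(0, PWideChar(Msg), 'Demo', MB_OK);
end.
```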
I suspect that most if not all of the Unicode support that you need for the purposes outlined in your question can be obtained from the Unicode support provided in the JEDI JCL.
For any visual component support you may require, the TNT control set has the appeal of being free.