I need to write a program that scans strings of various lengths and selects only those written with symbols from a set I define (specifically, Japanese characters). The strings will contain words in different languages (German, French, Arabic, Russian, English, etc.), so the number of possible characters is huge. Which data structure should I use for this? I am using Delphi 7 right now. Can anybody suggest how to write such a program?

+10  A: 

Obviously you would be better off with Delphi 2010, since the VCL in Delphi 7 is not aware of Unicode strings. You can use the WideString and WideChar types in Delphi 7, and you can install a component set like the TNT Unicode Components to help you create a user interface that can display your results.

For a very large set type, consider using a bit array like TBits. A bit array of length 65536 has one bit for every UTF-16 code unit, which covers every code point in the Basic Multilingual Plane. Checking whether any character of a string is in set Y would basically be:

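function WideCharsInSet(const wcstr: WideString; wcset: TBits): Boolean;
var
  n: Integer;
  code: Integer;
begin
  // True if at least one character of wcstr is a member of wcset.
  Result := False;
  for n := 1 to Length(wcstr) do
  begin
    code := Ord(wcstr[n]);
    // Reading a TBits index at or beyond Size raises EBitsError,
    // so check the bound before testing the bit.
    if (code < wcset.Size) and wcset[code] then
    begin
      Result := True;
      Break; // one match is enough
    end;
  end;
end;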

procedure Demo;
var
  wcset1: TBits;
  s: WideString;
begin
  wcset1 := TBits.Create;
  try
    // U+1157 is a Hangul (Korean) code point I found with Char Map.
    // Char Map shows hexadecimal values, hence the $ prefix.
    wcset1[$1157] := True;
    // Go get a string value s:
    s := WideChar($1157);
    // Returns True if at least one element of wcset1 is found in string s:
    if WideCharsInSet(s, wcset1) then
    begin
      Application.MessageBox('Found it', 'found it', MB_OK);
    end;
  finally
    wcset1.Free;
  end;
end;
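
The question asks for strings written only with characters from the set, which is the opposite test from the "at least one match" check above. A minimal sketch of that variant follows, assuming that "Japanese letters" means the Hiragana (U+3040 to U+309F), Katakana (U+30A0 to U+30FF) and CJK Unified Ideographs (U+4E00 to U+9FFF) blocks. The names AddRange and AllWideCharsInSet are made up for this illustration, and the ideograph block is shared with Chinese, so the ranges would need adjusting to the real requirement.

```pascal
program JapaneseOnly;
{$APPTYPE CONSOLE}

uses
  Classes; // TBits

// Hypothetical helper: marks every code unit from First to Last (inclusive).
procedure AddRange(CharSet: TBits; First, Last: Integer);
var
  i: Integer;
begin
  for i := First to Last do
    CharSet[i] := True;
end;

// Hypothetical helper: True only if S is non-empty and every character
// of S is a member of CharSet. CharSet.Size must be 65536.
function AllWideCharsInSet(const S: WideString; CharSet: TBits): Boolean;
var
  n: Integer;
begin
  Result := S <> '';
  for n := 1 to Length(S) do
    if not CharSet[Ord(S[n])] then
    begin
      Result := False;
      Break; // one outsider is enough to reject the string
    end;
end;

var
  Japanese: TBits;
  Kana, Mixed: WideString;
begin
  Japanese := TBits.Create;
  try
    Japanese.Size := 65536;            // one bit per UTF-16 code unit
    AddRange(Japanese, $3040, $309F);  // Hiragana
    AddRange(Japanese, $30A0, $30FF);  // Katakana
    AddRange(Japanese, $4E00, $9FFF);  // CJK Unified Ideographs (assumed to stand in for kanji)

    Kana := WideChar($3042);           // HIRAGANA LETTER A
    Kana := Kana + WideChar($30AB);    // KATAKANA LETTER KA
    Mixed := Kana + WideChar(Ord('A')); // same text plus a Latin letter

    WriteLn(AllWideCharsInSet(Kana, Japanese));  // kana-only string
    WriteLn(AllWideCharsInSet(Mixed, Japanese)); // contains a character outside the set
  finally
    Japanese.Free;
  end;
end.
```

Setting Size up front means every possible WideChar value is a valid index, so the membership test needs no bounds check. Whether spaces, punctuation, or an empty string should count as acceptable is a design decision this sketch does not try to settle.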
Warren P
+1, all the good bits are in the answer: big sets, TNT, and the recommendation not to do this in D7 at all.
Marco van de Voort
I wrote a more useful bit of code here for you, Tofig
Warren P
One great feature of the Delphi 2010 TStringList class is the ability to load a file from disk and automatically determine UTF-8 or UTF-16 encoding from the byte-order marks. That is another part of your task, Tofig, that will be trickier on versions of Delphi older than 2009/2010.
Warren P
Delphi 2010 really makes very little difference here. The poster is looking to process strings at a very simple level. Invoking the help of an entire Unicode enabled framework simply to gain access to a handful of functions and a couple of classes that encapsulate the needed Unicode Windows API functions is overkill. I suspect that all the poster really needs is the Unicode support unit(s) provided by the JEDI JCL.
Deltics
Deltics, the Delphi 2010 Troll.
Warren P
@Warren: TWideStringList in the JEDI JCL also loads Unicode from disk with proper respect for encoding. JclUnicode is free and works with pretty much all versions of Delphi. Unicode TStringList requires a purchase of new software. Poster: I need to get to the shops and my car is out of petrol. My answer: Get some more petrol. Your answer is presumably: "Buy a new car". :) If being "practical" is trolling then colour me green and call me Shrek (yeah, I know: he was an ogre, not a troll). Note: This accepted answer does not need 2010. The opposite of troll might be "Fanboy"? ;)
Deltics
As I was the person who explained to this guy that he doesn't NEED Delphi 2010 for his stated problem, I find "troll" fits here exactly. I instead said, "better off". Many, many people agree with me on that. A native VCL that uses the String=UnicodeString type naming convention is 100% Unicode. Everything else was the "best available kludge" until Delphi 2009 shipped. And everybody who isn't stuck in the past or scared of porting their ancient crufty code base already knows it from experience.
Warren P
+3  A: 

I also recommend switching to Delphi 2010 (why bother with 2009 anymore?)!

In the unlikely case that you are stuck with Delphi 7, the Unicode library from Mike Lischke may be somewhat helpful.

Uwe Raabe
+4  A: 

For the simple processing of strings in the manner you describe, do not be put off by suggestions that you should upgrade to the latest compiler and Unicode-enabled framework. The Unicode support itself is provided by the underlying Windows API, which is directly accessible from "non-Unicode" versions of Delphi just as much as from "Unicode" versions.
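
As one illustration of calling that API directly, here is a minimal sketch that decodes UTF-8 bytes into a WideString with the Win32 function MultiByteToWideChar, assuming Delphi 7, where string literals are ANSI. The function name Utf8ToWideString and the constant UTF8_CODEPAGE are made up for this example.

```pascal
program Utf8ToWide;
{$APPTYPE CONSOLE}

uses
  Windows; // MultiByteToWideChar

const
  // UTF-8 code page identifier, declared locally so the sketch does not
  // depend on which constants a given Windows.pas happens to provide.
  UTF8_CODEPAGE = 65001;

// Hypothetical helper: converts UTF-8 encoded bytes to a WideString.
// Returns an empty string if the input is empty or the conversion fails.
function Utf8ToWideString(const Utf8: AnsiString): WideString;
var
  Needed: Integer;
begin
  Result := '';
  if Utf8 = '' then
    Exit;
  // First call: ask how many UTF-16 code units the result needs.
  Needed := MultiByteToWideChar(UTF8_CODEPAGE, 0, PAnsiChar(Utf8),
    Length(Utf8), nil, 0);
  if Needed <= 0 then
    Exit;
  SetLength(Result, Needed);
  // Second call: perform the conversion into the allocated buffer.
  MultiByteToWideChar(UTF8_CODEPAGE, 0, PAnsiChar(Utf8),
    Length(Utf8), PWideChar(Result), Needed);
end;

var
  W: WideString;
begin
  // E3 81 82 is the UTF-8 encoding of U+3042 (HIRAGANA LETTER A).
  W := Utf8ToWideString(#$E3#$81#$82);
  WriteLn(Length(W));           // number of UTF-16 code units decoded
  WriteLn(Ord(W[1]) = $3042);   // whether the decoded character is U+3042
end.
```

The two-call pattern (query the required length, then convert) avoids guessing the buffer size. Once the text is in a WideString, each character can be tested against whatever set structure is chosen.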

I suspect that most if not all of the Unicode support that you need for the purposes outlined in your question can be obtained from the Unicode support provided in the JEDI JCL.

For any visual component support you may require, the TNT control set has the appeal of being free.

Deltics
+1, excellent argument. The code in the accepted answer compiles and works flawlessly in Delphi 4 even.
mghie
Now even Delphi is split into the Traddies and the up-to-date people.
Warren P
I prefer to think of it as "getting the job done with the minimum of fuss, bother and expense" people and "change for change's sake without thinking about what is actually needed" people. :)
Deltics
IMHO it is "I can do everything I want with the things I have" or "I can do that and more, more easily, and help keep my favorite development environment alive".
Uwe Raabe
And you saw the part where the poster stipulated they were using Delphi 7, right?
Deltics