views:

292

answers:

8

I have a small list of keywords. What I'd really like to do is akin to:

case MyKeyword of
  'CHIL': (code for CHIL);
  'HUSB': (code for HUSB);
  'WIFE': (code for WIFE);
  'SEX': (code for SEX);
  else (code for everything else);
end;

Unfortunately the CASE statement can't be used like that for strings.

I could use the straight IF THEN ELSE IF construct, e.g.:

if MyKeyword = 'CHIL' then (code for CHIL)
else if MyKeyword = 'HUSB' then (code for HUSB)
else if MyKeyword = 'WIFE' then (code for WIFE)
else if MyKeyword = 'SEX' then (code for SEX)
else (code for everything else);

but I've heard this is relatively inefficient.

What I had been doing instead is:

P := pos(' ' + MyKeyword + ' ', ' CHIL HUSB WIFE SEX ');
case P of
  1: (code for CHIL);
  6: (code for HUSB);
  11: (code for WIFE);
  17: (code for SEX);
  else (code for everything else);
end;

This, of course is not the best programming style, but it works fine for me and up to now didn't make a difference.

So what is the best way to rewrite this in Delphi so that it is both simple, understandable but also fast?

(For reference, I am using Delphi 2009 with Unicode strings.)


Followup:

Toby recommended I simply use the If Then Else construct. Looking back at my examples that used a CASE statement, I can see how that is a viable answer. Unfortunately, my inclusion of the CASE inadvertently hid my real question.

I actually don't care which keyword it is. That is just a bonus if the particular method can identify it like the POS method can. What I need is to know whether or not the keyword is in the set of keywords.

So really I want to know if there is anything better than:

if pos(' ' + MyKeyword + ' ', ' CHIL HUSB WIFE SEX ') > 0 then

The If Then Else equivalent does not seem better in this case being:

if (MyKeyword = 'CHIL') or (MyKeyword = 'HUSB') or (MyKeyword = 'WIFE') 
      or (MyKeyword = 'SEX') then

In Barry's comment to Kornel's question, he mentions the TDictionary Generic. I've not yet picked up on the new Generic collections and it looks like I should delve into them. My question here would be whether they are built for efficiency and how would using TDictionary compare in looks and in speed to the above two lines?


In later profiling, I have found that the concatenation of strings as in: (' ' + MyKeyword + ' ') is VERY expensive time-wise and should be avoided whenever possible. Almost any other solution is better than doing this.

+6  A: 
type TKeyword = ( ECHIL, EHUSB, EWIFE, ESEX )
const TKeyNames : array[TKeyword] of string = ( 'CHIL', 'HUSB', 'WIFE', 'SEX' );

Key : TKeyword

case Key of 
  ECHIL : (code for CHIL);
  EHUSB : (code for HUSB);
  EWIFE : (code for WIFE);
  ESEX  : (code for SEX);
  else (code for everything else);
end;

In general, don't use strings as "keys", use enumerations -- they're safer, and you get much of a speed increase.

Unfortunately Delphi (as far as I know) has no standard hashtable implementation that'd be easy to use, you can always roll up your own however.

BTW, code for SEX sounds much more fun than "will code for beer" :P

Kornel Kisielewicz
Generics.Collections contains `TDictionary<TKey,TValue>`.
Barry Kelly
@Kornel: I'm not using strings as keys. I'm parsing input text and I'm actually getting the keyword as a string. ... and yes, it's always nice to sneak a little SEX in here and there.
lkessler
+2  A: 

I think the

if x then

else

is by far the best solution. Your own solution is very inelegant and for the miniscule improvement in efficiency its not worth it. The if then else construct is perfect for what you want.

Toby Allen
@Toby: I agree with you when I consider the way I gave my example originally. But please see my followup.
lkessler
+2  A: 

For cleanest code, it's best to use case with enumerations, or if..else with strings, as others have suggested. There are a few solutions off the beaten track, though, if you really want to go there.

One is to use a string hash map, which is like a list "indexed" by strings. Values in the list would be procedure pointers to the code you wish to execute for each string. All the procedures would have to have the same exact parameters - and you'd have to write the hash map yourself, or find one you can use e.g. in JVCL.

// prepare the hash map
myHashMap[ 'CHIL' ] := @procToCallForChil;
myHashMap[ 'HUSB' ] := @procToCallForHusb;

// use the hash map
type
  TPrototypeProc = procedure( your-parameters-here );
var
  procToCall : TPrototypeProc;
begin
  @procToCall := myHashMap[ 'CHIL' ];
  procToCall( your-parameters-here );
end;

Two, and this is something weird I've tried out once: if and only if your identifier strings fit in 4 characters (as in all your examples), and they are ansi strings (not unicode strings), you can map strings to integers directly. A 32-bit integer is equivalent in size to a 4-byte string, so you can do:

function FourCharStringToInteger( const s : ansistring ) : integer;
begin
  assert(( length( s )  = 4 ), 'String to convert must be exactly 4 characters long!');
  move( s[1], result, 4 );
end;

Any string identifier that's shorter than 4 characters would have to be padded with spaces, and the strings are case-sensitive ('chil' and 'CHIL' yield different values). To use this with a case statement, you'd have to precompute the values, which may or may not be suitable for your purpose:

id_CHIL := FourCharStringToInteger( 'CHIL' );

And finally you can have your case statement:

case id of
  id_CHIL : ...
  id_HUSB : ...
end;

This is "special care" code, and might only make a difference if you have a hundreds or more of your identifier strings - it really shouldn't be done at all :) It is certainly easy to break. I agree though that case statements are neater than interminable processions of ...else... blocks.

moodforaday
An interesting idea. I'd have to time it to see if it saves much time. But most of my keyword lists are quite short (less than 10 words).
lkessler
A: 
Paul-Jan
A: 

You could also switch to a more object-oriented approach and have something like

TCommand = class
public
  procedure Execute; virtual; abstract;
end;
TCommandClass = class of TCommand;

and have a factory create the commands for you

TCommandFactory = class
public
  procedure RegisterKeyWord (const Keyword : String; CmdClass : TCommandClass);
  function  CreateCommand (const Keyword : String) : TCommand;
end;

The calling code would then look as easy as:

CommandFactory.CreateCommand (Keyword).Execute;

That way you have localized and encapsulated all keyword strings in a simple factory class, which makes later changes to the input language very easy. Using this command-based approach has other advantages such as easy extensibility.

I know that this maybe interpreted as not an answer to your question, because it's not about how fast you can do it. But it's another approach that might be worth thinking about.

Smasher
+3  A: 

Your series of if statements to check whether the input is any of the given keywords could be shortened by checking single characters, to bail out as soon as possible. Your example

if MyKeyword = 'CHIL' then (code for CHIL)
else if MyKeyword = 'HUSB' then (code for HUSB)
else if MyKeyword = 'WIFE' then (code for WIFE)
else if MyKeyword = 'SEX' then (code for SEX)
else (code for everything else);

could be replaced by

KW := kwOther;
if MyKeyword <> '' then begin
  case MyKeyword[1] of
    'C': if MyKeyword = 'CHIL' then KW := kwCHIL;
    'H': if MyKeyword = 'HUSB' then KW := kwHUSB;
    'S': if MyKeyword = 'SEX' then KW := kwSEX;
    'W': if MyKeyword = 'WIFE' then KW := kwWIFE;
  end;
end;
case KW of
  kwCHIL: {code for CHIL};
  kwHUSB: {code for HUSB};
  kwSEX: {code for SEX};
  kwWIFE: {code for WIFE};
else
  {code for everything else};
end;

For case-insensitive keywords the case would check for upper and lower case, and the comparison would use AnsiCompareText(). If you have several keywords with the same first letter you could nest these case statements, but readability would probably suffer soon for very little speed gain.

Taking this to the maximum you could implement a state machine that uses a PChar to calculate the next state, which would branch to the else case as soon as the first non-matching character is found. It would be faster than any solution involving hashes.

mghie
+1 Exactly what we did implementing a parser/lexer
Lieven
+3  A: 

Mostly I use the IndexText function from StrUtils for that purpose. It is similar to your pos approach, but the return value is independent of the indiviual length of the strings. As a gimmick it is also case insensitive (use IndexStr if you don't want this).

function IndexText(const AText: string; const AValues: array of string): Integer; overload;

The comment to these functions actually mentions the case construct:

{ AnsiMatchText & AnsiIndexText provide case like function for dealing with strings }

Uwe Raabe
In addition, there is also the MatchText and MatchStr functions that can be used if only a true/false match result is needed. I see you gave a nice example at: http://coding.derkeiler.com/Archive/Delphi/borland.public.delphi.non-technical/2007-07/msg00209.html
lkessler
I like this function, but I did not find it in SysUtils. Did you mean StrUtils?
@jrodenhi: Duh, of course - StrUtils.pas. corrected!
Uwe Raabe
+5  A: 

You can use a const table (which must be alpha-sorted) and a quick binary sort. It's very efficient, and doesn't require any hashing.

Here is the function to use:

function IsKeyWord(const KeyWords: array of string; const aToken: String): Boolean;
// aToken must be already uppercase
var First, Last, I, Compare: Integer;
begin
  First := Low(Keywords);
  Last := High(Keywords);
  Result := False;
  while First <= Last do
  begin
    I := (First + Last) shr 1;
    Compare := CompareStr(Keywords[i],aToken);
    if Compare = 0 then
      begin
        Result := True;
        break;
      end
    else
    if Compare < 0  then
      First := I + 1 else
      Last := I - 1;
  end;
end;

And here, some examples of keywords:

const
  PASCALKEYWORDS: array[0..100] of string =
  ('ABSOLUTE', 'ABSTRACT', 'AND', 'ARRAY', 'AS', 'ASM', 'ASSEMBLER',
   'AUTOMATED', 'BEGIN', 'CASE', 'CDECL', 'CLASS', 'CONST', 'CONSTRUCTOR',
   'DEFAULT', 'DESTRUCTOR', 'DISPID', 'DISPINTERFACE', 'DIV', 'DO',
   'DOWNTO', 'DYNAMIC', 'ELSE', 'END', 'EXCEPT', 'EXPORT', 'EXPORTS',
   'EXTERNAL', 'FAR', 'FILE', 'FINALIZATION', 'FINALLY', 'FOR', 'FORWARD',
   'FUNCTION', 'GOTO', 'IF', 'IMPLEMENTATION', 'IN', 'INDEX', 'INHERITED',
   'INITIALIZATION', 'INLINE', 'INTERFACE', 'IS', 'LABEL', 'LIBRARY',
   'MESSAGE', 'MOD', 'NAME', 'NEAR', 'NIL', 'NODEFAULT', 'NOT', 'OBJECT',
   'OF', 'OR', 'OUT', 'OVERRIDE', 'PACKED', 'PASCAL', 'PRIVATE', 'PROCEDURE',
   'PROGRAM', 'PROPERTY', 'PROTECTED', 'PUBLIC', 'PUBLISHED', 'RAISE',
   'READ', 'READONLY', 'RECORD', 'REGISTER', 'REINTRODUCE', 'REPEAT', 'RESIDENT',
   'RESOURCESTRING', 'SAFECALL', 'SET', 'SHL', 'SHR', 'STDCALL', 'STORED',
   'STRING', 'STRINGRESOURCE', 'THEN', 'THREADVAR', 'TO', 'TRY', 'TYPE',
   'UNIT', 'UNTIL', 'USES', 'VAR', 'VARIANT', 'VIRTUAL', 'WHILE', 'WITH', 'WRITE',
   'WRITEONLY', 'XOR');

  DFMKEYWORDS: array[0..4] of string = (
    'END', 'FALSE', 'ITEM', 'OBJECT', 'TRUE');

  CKEYWORDS: array[0..47] of string = (
  'ASM', 'AUTO', 'BREAK', 'CASE', 'CATCH', 'CHAR', 'CLASS', 'CONST', 'CONTINUE',
  'DEFAULT', 'DELETE', 'DO', 'DOUBLE', 'ELSE', 'ENUM', 'EXTERN', 'FLOAT', 'FOR',
  'FRIEND', 'GOTO', 'IF', 'INLINE', 'INT', 'LONG', 'NEW', 'OPERATOR', 'PRIVATE',
  'PROTECTED', 'PUBLIC', 'REGISTER', 'RETURN', 'SHORT', 'SIGNED', 'SIZEOF',
  'STATIC', 'STRUCT', 'SWITCH', 'TEMPLATE', 'THIS', 'THROW', 'TRY', 'TYPEDEF',
  'UNION', 'UNSIGNED', 'VIRTUAL', 'VOID', 'VOLATILE', 'WHILE');

  CSHARPKEYWORDS : array[0..86] of string = (
  'ABSTRACT', 'AS', 'BASE', 'BOOL', 'BREAK', 'BY3', 'BYTE', 'CASE', 'CATCH', 'CHAR',
  'CHECKED', 'CLASS', 'CONST', 'CONTINUE', 'DECIMAL', 'DEFAULT', 'DELEGATE', 'DESCENDING',
  'DO', 'DOUBLE', 'ELSE', 'ENUM', 'EVENT', 'EXPLICIT', 'EXTERN', 'FALSE', 'FINALLY',
  'FIXED', 'FLOAT', 'FOR', 'FOREACH', 'FROM', 'GOTO', 'GROUP', 'IF', 'IMPLICIT',
  'IN', 'INT', 'INTERFACE', 'INTERNAL', 'INTO', 'IS', 'LOCK', 'LONG', 'NAMESPACE',
  'NEW', 'NULL', 'OBJECT', 'OPERATOR', 'ORDERBY', 'OUT', 'OVERRIDE', 'PARAMS',
  'PRIVATE', 'PROTECTED', 'PUBLIC', 'READONLY', 'REF', 'RETURN', 'SBYTE',
  'SEALED', 'SELECT', 'SHORT', 'SIZEOF', 'STACKALLOC', 'STATIC', 'STRING',
  'STRUCT', 'SWITCH', 'THIS', 'THROW', 'TRUE', 'TRY', 'TYPEOF', 'UINT', 'ULONG',
  'UNCHECKED', 'UNSAFE', 'USHORT', 'USING', 'VAR', 'VIRTUAL', 'VOID', 'VOLATILE',
  'WHERE', 'WHILE', 'YIELD');

And it's very easy to use:

  if IsKeyWord(PASCALKEYWORDS,UpperCase('BEGIN')) then
   ....

You can easily modify the IsKeyWord() function to return the index of the token if you need it.

function KeyWordIndex(const KeyWords: array of string; const aToken: String): integer;
// aToken must be already uppercase
var First, Last, Compare: Integer;
begin
  First := Low(Keywords);
  Last := High(Keywords);
  while First <= Last do
  begin
    result := (First + Last) shr 1;
    Compare := CompareStr(Keywords[result],aToken);
    if Compare = 0 then
      exit else
    if Compare < 0  then
      First := result + 1 else
      Last := result - 1;
  end;
  result := -1; // not found
end;
Arnaud Bouchez
+1 I've used a remarkably similar method for many generations of Delphi/Pascal without any issues. Also good use of SHR to avoid the expensive DIV for the binary search.
skamradt