views:

203

answers:

2

Problem

If the text in TRichEdit is something like this;

'hello, world'#$D#$A

Then the following routine displays TRUE. However when the RichEdit has

'test'#$D#$A#$D#$A'test'#$D#$A#$D#$A'test'#$D#$A

Then the routine displays FALSE. It seems to me to be flawed as it is finding the comma's but not the newlines/linefeeds. I created a workaround to walk the string instead and find what I'm looking for but am still really curious why the Delphi function doesn't work. Any ideas?

procedure TForm1.Button1Click(Sender: TObject);
var
   sTmp : String;
begin
   sTmp := RichEdit1.Lines.GetText;
   if ( ( Pos( ',', sTmp )  <> 0 ) or
        ( Pos( '"', sTmp )  <> 0 ) or
        ( Pos( '\n', sTmp ) <> 0 ) or
        ( Pos( '\r', sTmp ) <> 0 ) ) then
      Label1.Caption := 'TRUE'
   else
      Label1.Caption := 'FALSE';
end;

Workaround - Andreas' Version (Faster Depending on Input)

function CheckChars( const sData: String ): Boolean;
var
   pCur : PChar;
begin
   pCur := PChar( sData );

   // Exit at NULL terminator
   while ( pCur^ <> #0 ) do
   begin
      case pCur^ of
         #13, #10, #34, #44 : Exit(true);
      end;
      Inc( pCur );
   end;
end;

Correct Usage

function CheckChars( const sData: String ): Boolean
begin
   if ( ( Pos( #44, sData ) <> 0 ) or
        ( Pos( #34, sData ) <> 0 ) or
        ( Pos( #13, sData ) <> 0 ) or
        ( Pos( #10, sData ) <> 0 ) ) then
      Result := true
   else
      Result := false;
end;

Works for all characters tested, I decided not to mix quoted chars and decimal chars for readability. The only question now is which is quicker? I think my workaround would be quicker since I'm checking each char against all the ones I'm looking for, whereas when I use the System.Pos function I am running the same parsing routine 4 times.

Solution

After some testing, it depends on what kind of characters you are looking for. Testing this with a comma(#44), located 294k characters into a 589k length string. The function using System.Pos has a performance of ~390 microseconds, and the case statement runs ~700 microseconds.

HOWEVER!

If you change the character in the string to a Linefeed(#10) then it takes much much longer for the System.Pos(~2700 microseconds) due to the repeated calls. The case statement still runs ~700 microseconds.

So I guess if your looking for a particular character then System.Pos is definitely the way to go, however if you are looking for multiple(which my app does) then a repeated call isn't necessary when you could just scan it and use the case statement.

+5  A: 

I don't think Delphi recognises \n as a new line, Pos thinks you are actually searching for the characters "\" and "n".

Try searching for #13 and #10 (Carriage Return and Line Feed) instead (Alternatively you could use #$D and #$A which would be the hex equivilent.)

e.g.

if ( ( Pos( ',', sTmp )  <> 0 ) or
     ( Pos( '"', sTmp )  <> 0 ) or
     ( Pos( #10, sTmp )  <> 0 ) or
     ( Pos( #13, sTmp )  <> 0 ) ) then

Also Delphi Strings are counted and while they always end in #0 There is no guarantee that the string doesn't contain a null character, meaning that your while loop may terminate early.

so alternatively you could loop through for i := 1 to Length(sTmp) (Starting at 1 as sTmp[0] is the counter).

or you could construct your while loop as

Index := 1;

While Index < Length(sTmp) do
begin
    case sTmp[Index] of
    etc...
JamesB
I think delphi strings tend to be both null terminated, /and/ counted with the count "character" in position zero.
Arafangion
Delphi strings are both counted and null-terminated. See http://docwiki.embarcadero.com/RADStudio/en/Internal_Data_Formats#Long_String_Types
Andreas Rejbrand
Ah, you learn something new everyday....
JamesB
@Andreas: they are not null-terminated. You can type-cast it as a null-terminated character pointer (for instance to interface to environments that do not understand Delphi strings), but that character pointer might point to something which contains less characters than the Delphi string.
Jeroen Pluimers
@Jeroen Pluimers: After the string data, there is a null character. That is what I would call "null-terminated". See http://docwiki.embarcadero.com/RADStudio/en/Internal_Data_Formats#Long_String_Types
Andreas Rejbrand
@Jeroen Pluimers: Well, OK, a Delphi string can contain a #0 character, so it is not entirely right to say that Delphi strings are null-terminated in the sense that the #0 character marks the end of the string. But still, every Delphi string *do* end with a #0 character, even if the string itself doesn't contain this odd character.
Andreas Rejbrand
@Jeroen. Andreas is right, they are: "The NULL character at the end of a string memory block is automatically maintained by the compiler and the built-in string handling routines. This makes it possible to typecast a string directly to a null-terminated string."
François
@Andres: I should have phrased this better: Delphi strings are both null terminated and can contain embedded nulls. So if you cast it to a null-terminated string, and it contains an embedded null, the null-terminated string is shorter than the Delphi string.
Jeroen Pluimers
+2  A: 

(This is actually a comment, but it would look horrible as one.)

Please notice that your entire block

case pCur^ of
 #13 :   // CR
    begin
       Result := true;
       break;
    end;
 #10 :   // LF
    begin
       Result := true;
       break;
   end;
  #34 :   // Quote 
   begin
        Result := true;
       break;
    end;
 #44 :   // Comma
    begin
       Result := true;
       break;
    end;
 end;

can be written more compactly by noting that

  • Result := true; break; in this case amounts to the same thing as Result := true; Exit; which always can be written Exit(true).

  • Several cases can be combined to a single case, if the actions are identical.

Hence you can write

case pCur^ of
  #13, #10, #34, #44: Exit(true);
end;

But even better, the entire function can be written

function CheckChars(const Str: string): boolean;
const
  INTERESTING_CHARS = [#13, #10, #34, #44];
var
  i: integer;
begin
  result := false;
  for i := 1 to length(Str) do
    if Str[i] in INTERESTING_CHARS then
      Exit(true);
end;
Andreas Rejbrand
`Str[i] in INTERESTING_CHARS` will give a warning or hint (can't remember which) as the OP is using D2009. Use CharInSet function to avoid that.
Marjan Venema
Do these simplifications give any performance enhancement? If not what would be the purpose of doing it in such a way? Will the compiler not optimize the same way?
wfoster
@Marjan: That is very true. (It is a warning.)
Andreas Rejbrand
@wfoster: No, not really, it just looks cleaner.
Andreas Rejbrand
@Andreas: Cool, thanks for the help!
wfoster
@Marjan: It is a warning that can be ignored in this case because none of the chars in the set are greater than #127. The compiler generates the correct code for it.
Andreas Hausladen
@Andreas Hausladen: I am aware of that. However, I consider it bad practice to ignore warnings/hints and require all code to compile cleanly: no warnings, no hints. That way I ensure that I do not have to re-consider them at every compile and that the important ones don't get buried in amongs the "sort-of-ok" ones. If I _really_ have no other option than to code something that generates a warning or hint, that code will be surrounded by hint/warning off and hint/warning on conditionals.
Marjan Venema
@Marjan: That's why I disable that specific warning in the project options (and in the default project options). I want all my projects to compile without any hint or warning, too.
Andreas Hausladen
@Andreas Hausladen: yes, that would be another way to do it. The reason I do it on specific code constructs is because I tend to "forget" when it is save to ignore this warning and when not. So I like the compiler slap on the hand when I inadvertently use "in" instead of CharInSet (old habits die hard). I guess I am a sucker for any compiler help I can get...
Marjan Venema