views:

1212

answers:

6

Is there any Delphi D2010 function like PosEx that finds a sub-string inside a string starting from the end of the string?

I'm removing all the calls to the FastStrings library and one of the functions I was using was FastPosBack:

function FastPosBack(const aSourceString, aFindString : AnsiString; const aSourceLen, aFindLen, StartPos : Integer) : Integer;

I found LastDelimiter but it's not quite the same thing since it only finds the last delimiter and I can't specify a start position.

Thanks!

Update: Following DR comment, I've created this function:

function FastPosBack(const aSourceString, aFindString : String; const aSourceLen, aFindLen, StartPos : Integer) : Integer;
var
  RevSourceString, RevFindString: string;
begin
  RevSourceString := AnsiReverseString(aSourceString);
  RevFindString := AnsiReverseString(aFindString);

  Result := Length(aSourceString) - PosEx(RevFindString, RevSourceString, StartPos) + 1;
end;

Is there any more effective way of doing this? On a 1000000 loop cycle, Pos takes 47ms while FastPosBack takes 234ms to complete.

+6  A: 

You can use Pos in combination with ReverseString (from StrUtils)

DR
Thanks for your reply, using your hints I've created a function that is surprisingly fast compared with the other alternatives so far.
smartins
+1  A: 

Not in the standard RTL but in INDY (unit idGlobalProtocols according to the online help), which is part of recent Delphi installations:

function RPos(
    const ASub: String, 
    const AIn: String, 
    AStart: Integer = -1
): Integer;
dummzeuch
Thanks for the reply. This function is way way slower than the one I came up with, it took 5913ms (RPos) against 234ms to complete.
smartins
+3  A: 

Delphi comes with a function that can search backward, SearchBuf in the StrUtils unit. It's specialized for searching for words, though, so it might not behave quite the way you want. Below I've wrapped it into a function matching your desired interface.

function FastPosBack(const aSourceString, aFindString: AnsiString;
                     const aSourceLen, aFindLen, StartPos: Integer): Integer;
var
  Source, Match: PAnsiChar;
begin
  Source := PAnsiChar(ASourceString);
  Match := SearchBuf(Source, ASourceLen, ASourceLen, 0,
                     AFindString, [soMatchCase]);
  if Assigned(Match) then
    Result := Match - Source + 1
  else
    Result := 0;
end;
Rob Kennedy
Thanks for the reply Rob. Unfortunately this function is even slower than the one I can me up, taking 1310ms to complete. I've changed the AnsiString and PAnsiChar to String and PChar respectively because that's what I'm trying to accomplish, a decent replacement to these ultra-fast AnsiString functions.
smartins
+2  A: 

First, consider if a speed optimized solution is necessary. If its not likely that it will be called 100000 times in real use reversing the strings and using the existing substring search is fine.

If speed is an issue, there are plenty of good resources for writing you own. Look on wikipedia for "string search algorithms" for ideas. I'll post a link and an example algorithm when I'm at a computer. I'm typing this from my phone at the moment.

Update:

Here's the example I promised:

function RPOS(pattern: string; text:string): Integer;
var patternPosition,
    textPosition: Integer;
begin
  Result := -1;
  for textPosition := Length(text) downto 0 do
  begin
    for patternPosition := Length(pattern) downto 0 do
      if not (pattern[patternPosition] = (text[textPosition - (Length(pattern) - patternPosition)])) then
        break;
    if patternPosition = 0 then
      Result := textPosition -Length(pattern) + 1;
  end;
end;

Its basically an inverted naive(brute force) string search algorithm. It starts at the end of both the pattern and text and works its way to the beginning. I can guarantee it is less efficient than Delphi's Pos() function though I can't say whether its faster or slower than the Pos()-ReverseString() combination as I haven't tested it. There is an error in it which I haven't found the cause of. If the two strings are identical its returning -1 (not found).

codeelegance
You are probably right, these differences won't be noticeable unless doing a huge number of iterations.
smartins
Small differences soon add up, and in this case the routine required is so small that it makes sense to optimise since the effort to do so is negligible and means that you then don't have to worry about using it in performance critical code if the need arises in the future. Premature optimisation is a mistake when the effort to optimise is disproportionate to the gains. In this case the effort is minimal and so is justified for *any* gain if it can be achieved without sacrificing ease of use, imho.
Deltics
I disagree. I think simplicity wins over speed even in trivial cases. Anytime you optimize you increase the complexity of the code. The only time I optimize is when the existing solution has proven to be inadequate in a test or production environment.
codeelegance
Kenneth thanks for the function. I tested it and comes up at 453ms against 344ms of my own. Deltics function is 300% faster than mine at 109ms.
smartins
I expected as much. The naive string search is known for its simplicity not its speed. In fact, unless you add unnecessary operations to it its probably the least efficient search algorithm there is. I used it because it only took about 5 minutes to write.
codeelegance
Kenneth, generally I'd agree with you, but in this case the "optimization" does not result in over complication. If anything my solution, of all those offered, most directly implements the problem as stated (look for a substring starting at the end of some other string). Others involved juggling the inputs and re-expressing the problem differently before again juggling the outputs. That the most direct approach also turned out to be the most efficient was a bonus. I'd also say that users of poorly performing software generally aren't impressed by how much simpler the source code is. :)
Deltics
@Kenneth: On my own experience, not anytime you optimize the complexity of the code is increased. Sometimes, less complex code performs better than previous. The main factor is how bad the original programmer was.
jachguate
+3  A: 

Try this/these:

function RPos(const aSubStr, aString : String; const aStartPos: Integer): Integer; overload;
var
  i: Integer;
  pStr: PChar;
  pSub: PChar;
begin
  pSub := Pointer(aSubStr);

  for i := aStartPos downto 1 do
  begin
    pStr := @(aString[i]);
    if (pStr^ = pSub^) then
    begin
      if CompareMem(pSub, pStr, Length(aSubStr)) then
      begin
        result := i;
        EXIT;
      end;
    end;
  end;

  result := 0;
end;


function RPos(const aSubStr, aString : String): Integer; overload;
begin
  result := RPos(aSubStr, aString, Length(aString) - Length(aSubStr) + 1);
end;

The overload provides a way to call RPos using the most efficient startpos for searching from the very end of the string without having to calculate that yourself. For efficiency no checking is performed on startpos when explicitly specified.

In my SmokeTest performance testing suite this comes out about 20% faster than your FastPosBack (which incidentally contains an "off by one" error as well as requiring some parameters which it doesn't actually use).

Deltics
Hi. Thanks for your function, it is indeed much faster than the one I can up with (344ms vs 109ms). The extra parameters on the function where meant so that I don't need to change any of the FastStrings functions calls and just replace them with my own functions.
smartins
Deltics, is your function Unicode compatible? I noticed that Marco's functions parameters are AnsiStrings and I'll be using this code in D2010.
smartins
It was developed and tested using Delphi 2009, so yes I believe it is Unicode "safe". However, it *does* assume that String and SubStr contain the same 'payload' types (i.e. both string and substr are UTF16 or ANSI etc).
Deltics
I just tested it with some unicode specific characters and indeed it works just fine. I'm going to accept your answer since it provides a comparable performance to the regular Pos which is more than enough for me. Thanks again for all the replies. You guys are the best!
smartins
Deltics, do you have a similar function that is not case-sensitive? I'm going to open a new question for this specific question but perhaps you have something already.
smartins
I'm afraid not - this wasn't a routine I had "lying around". The question appealed to my sense of curiosity (and I like finding efficient solutions to such problems), so I created the routine to see what sort of results I got - they were good enough that I thought it was worth sharing. :) Removing case sensitivity makes the problem significantly harder, especially if looking for a single solution that is portable across ANSI and Unicode strings, but you've gone and appealed to that part of me again, so I'll spend a few minutes on it tonight and see how far I get.
Deltics
+1  A: 
Marco van de Voort
Marco, I noticed that these functions all take AnsiStrings so I take it that they are not Unicode compatible in D2009/D2010. Do you know if there are any plans on making these Unicode compatible?
smartins
FPC is working on unicode compiler modes. It will probably follow the same model as with Turbo Pascal->Delphi, namely that the type of the default string can be selected per unit. But base language support will take a while, and for the libraries to catch up will take longer.surrogates make the unicode case considerably more difficult, since most document surrogate scanning techniques work forward, not backwards.
Marco van de Voort
I wouldn't expect anything production ready in the next year, year and an half.
Marco van de Voort