ansaurus

Question

Any build-in Delphi function like PosEx that finds a sub-string starting from the back of the string?

Answer 1

+6 A:

You can use Pos in combination with ReverseString (from StrUtils)

DR 2009-10-10 14:32:32

Thanks for your reply, using your hints I've created a function that is surprisingly fast compared with the other alternatives so far.

smartins 2009-10-10 16:42:09

Answer 2

+1 A:

Not in the standard RTL but in INDY (unit idGlobalProtocols according to the online help), which is part of recent Delphi installations:

function RPos(
    const ASub: String, 
    const AIn: String, 
    AStart: Integer = -1
): Integer;

dummzeuch 2009-10-10 15:47:26

Thanks for the reply. This function is way way slower than the one I came up with, it took 5913ms (RPos) against 234ms to complete.

smartins 2009-10-10 16:29:39

Answer 3

+3 A:

Delphi comes with a function that can search backward, SearchBuf in the StrUtils unit. It's specialized for searching for words, though, so it might not behave quite the way you want. Below I've wrapped it into a function matching your desired interface.

function FastPosBack(const aSourceString, aFindString: AnsiString;
                     const aSourceLen, aFindLen, StartPos: Integer): Integer;
var
  Source, Match: PAnsiChar;
begin
  Source := PAnsiChar(ASourceString);
  Match := SearchBuf(Source, ASourceLen, ASourceLen, 0,
                     AFindString, [soMatchCase]);
  if Assigned(Match) then
    Result := Match - Source + 1
  else
    Result := 0;
end;

Rob Kennedy 2009-10-10 15:57:25

Thanks for the reply Rob. Unfortunately this function is even slower than the one I can me up, taking 1310ms to complete. I've changed the AnsiString and PAnsiChar to String and PChar respectively because that's what I'm trying to accomplish, a decent replacement to these ultra-fast AnsiString functions.

smartins 2009-10-10 16:20:37

Answer 4

+2 A:

First, consider if a speed optimized solution is necessary. If its not likely that it will be called 100000 times in real use reversing the strings and using the existing substring search is fine.

If speed is an issue, there are plenty of good resources for writing you own. Look on wikipedia for "string search algorithms" for ideas. I'll post a link and an example algorithm when I'm at a computer. I'm typing this from my phone at the moment.

Update:

Here's the example I promised:

function RPOS(pattern: string; text:string): Integer;
var patternPosition,
    textPosition: Integer;
begin
  Result := -1;
  for textPosition := Length(text) downto 0 do
  begin
    for patternPosition := Length(pattern) downto 0 do
      if not (pattern[patternPosition] = (text[textPosition - (Length(pattern) - patternPosition)])) then
        break;
    if patternPosition = 0 then
      Result := textPosition -Length(pattern) + 1;
  end;
end;

Its basically an inverted naive(brute force) string search algorithm. It starts at the end of both the pattern and text and works its way to the beginning. I can guarantee it is less efficient than Delphi's Pos() function though I can't say whether its faster or slower than the Pos()-ReverseString() combination as I haven't tested it. There is an error in it which I haven't found the cause of. If the two strings are identical its returning -1 (not found).

codeelegance 2009-10-10 16:04:59

You are probably right, these differences won't be noticeable unless doing a huge number of iterations.

smartins 2009-10-10 16:21:56

Small differences soon add up, and in this case the routine required is so small that it makes sense to optimise since the effort to do so is negligible and means that you then don't have to worry about using it in performance critical code if the need arises in the future. Premature optimisation is a mistake when the effort to optimise is disproportionate to the gains. In this case the effort is minimal and so is justified for *any* gain if it can be achieved without sacrificing ease of use, imho.

Deltics 2009-10-10 22:46:24

I disagree. I think simplicity wins over speed even in trivial cases. Anytime you optimize you increase the complexity of the code. The only time I optimize is when the existing solution has proven to be inadequate in a test or production environment.

codeelegance 2009-10-11 01:57:56

Kenneth thanks for the function. I tested it and comes up at 453ms against 344ms of my own. Deltics function is 300% faster than mine at 109ms.

smartins 2009-10-11 19:38:04

I expected as much. The naive string search is known for its simplicity not its speed. In fact, unless you add unnecessary operations to it its probably the least efficient search algorithm there is. I used it because it only took about 5 minutes to write.

codeelegance 2009-10-11 20:44:05

Kenneth, generally I'd agree with you, but in this case the "optimization" does not result in over complication. If anything my solution, of all those offered, most directly implements the problem as stated (look for a substring starting at the end of some other string). Others involved juggling the inputs and re-expressing the problem differently before again juggling the outputs. That the most direct approach also turned out to be the most efficient was a bonus. I'd also say that users of poorly performing software generally aren't impressed by how much simpler the source code is. :)

Deltics 2009-10-12 00:09:16

@Kenneth: On my own experience, not anytime you optimize the complexity of the code is increased. Sometimes, less complex code performs better than previous. The main factor is how bad the original programmer was.

jachguate 2010-01-22 03:35:31

Answer 5

+3 A:

Try this/these:

function RPos(const aSubStr, aString : String; const aStartPos: Integer): Integer; overload;
var
  i: Integer;
  pStr: PChar;
  pSub: PChar;
begin
  pSub := Pointer(aSubStr);

  for i := aStartPos downto 1 do
  begin
    pStr := @(aString[i]);
    if (pStr^ = pSub^) then
    begin
      if CompareMem(pSub, pStr, Length(aSubStr)) then
      begin
        result := i;
        EXIT;
      end;
    end;
  end;

  result := 0;
end;


function RPos(const aSubStr, aString : String): Integer; overload;
begin
  result := RPos(aSubStr, aString, Length(aString) - Length(aSubStr) + 1);
end;

The overload provides a way to call RPos using the most efficient startpos for searching from the very end of the string without having to calculate that yourself. For efficiency no checking is performed on startpos when explicitly specified.

In my SmokeTest performance testing suite this comes out about 20% faster than your FastPosBack (which incidentally contains an "off by one" error as well as requiring some parameters which it doesn't actually use).

Deltics 2009-10-10 22:44:01

Hi. Thanks for your function, it is indeed much faster than the one I can up with (344ms vs 109ms). The extra parameters on the function where meant so that I don't need to change any of the FastStrings functions calls and just replace them with my own functions.

smartins 2009-10-11 19:29:51

Deltics, is your function Unicode compatible? I noticed that Marco's functions parameters are AnsiStrings and I'll be using this code in D2010.

smartins 2009-10-11 19:35:42

It was developed and tested using Delphi 2009, so yes I believe it is Unicode "safe". However, it *does* assume that String and SubStr contain the same 'payload' types (i.e. both string and substr are UTF16 or ANSI etc).

Deltics 2009-10-11 19:52:09

I just tested it with some unicode specific characters and indeed it works just fine. I'm going to accept your answer since it provides a comparable performance to the regular Pos which is more than enough for me. Thanks again for all the replies. You guys are the best!

smartins 2009-10-11 20:32:24

Deltics, do you have a similar function that is not case-sensitive? I'm going to open a new question for this specific question but perhaps you have something already.

smartins 2009-10-11 21:21:02

I'm afraid not - this wasn't a routine I had "lying around". The question appealed to my sense of curiosity (and I like finding efficient solutions to such problems), so I created the routine to see what sort of results I got - they were good enough that I thought it was worth sharing. :) Removing case sensitivity makes the problem significantly harder, especially if looking for a single solution that is portable across ANSI and Unicode strings, but you've gone and appealed to that part of me again, so I'll spend a few minutes on it tonight and see how far I get.

Deltics 2009-10-12 00:01:18

Answer 6

+1 A:

Marco van de Voort 2009-10-11 17:28:44

Marco, I noticed that these functions all take AnsiStrings so I take it that they are not Unicode compatible in D2009/D2010. Do you know if there are any plans on making these Unicode compatible?

smartins 2009-10-11 19:40:04

FPC is working on unicode compiler modes. It will probably follow the same model as with Turbo Pascal->Delphi, namely that the type of the default string can be selected per unit. But base language support will take a while, and for the libraries to catch up will take longer.surrogates make the unicode case considerably more difficult, since most document surrogate scanning techniques work forward, not backwards.

Marco van de Voort 2009-10-11 19:54:10

I wouldn't expect anything production ready in the next year, year and an half.

Marco van de Voort 2009-10-11 19:55:04

ansaurus

tags:

views:

answers:

Any build-in Delphi function like PosEx that finds a sub-string starting from the back of the string?

related questions