ansaurus

Question

How to find a common suffix in two strings?

Answer 1

+1 A:

Your approach seems OK. You could iterate over all the strings at once rather than just two at a time, which would save some reverses (and a lot of time if, say, the last string had no common prefix but all the others did: you'd do a lot of work for nothing with the pairwise approach).

There's also no need to reverse the current candidate common suffix until you've completed all the comparisons.

However, you could avoid the reverses altogether by keeping an array of indexes recording where you are in each string, initialising each to the length of its string (minus 1) and working backwards from the end, iterating over all the strings.
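The index-array idea can be sketched roughly as follows. This is in Python rather than the C# of the question, and the name `common_suffix` and its treatment of an empty input are assumptions made for illustration, not something from the original post.

```python
from typing import Sequence


def common_suffix(strings: Sequence[str]) -> str:
    """Return the longest suffix shared by every string (hypothetical helper)."""
    if not strings:
        return ""  # assumption: no strings means no common suffix

    # One cursor per string, each starting at that string's last character.
    positions = [len(s) - 1 for s in strings]
    matched = 0

    # Walk all cursors backwards together until one runs off the front
    # of its string or the characters under the cursors disagree.
    while all(p >= 0 for p in positions):
        current = strings[0][positions[0]]
        if any(s[p] != current for s, p in zip(strings, positions)):
            break
        matched += 1
        positions = [p - 1 for p in positions]

    # Slice once at the end; nothing is ever reversed.
    first = strings[0]
    return first[len(first) - matched:]
```

Because every string is checked at each step, a single string that shares nothing with the rest stops the loop on the first character.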

Paul 2010-10-11 12:47:39

Answer 2

+3 A:

Well, to start with you don't need to convert the strings into char arrays. You can use indexers into the strings to fetch individual characters.

It's probably worth thinking of it as a number rather than a string... each pairwise comparison will give you a maximal value, and the final number (the size of the suffix) is the minimum of those maxima.

So two approaches suggest themselves:

- Start with 0 (always valid) and work your way up: check whether 1 is valid (i.e. all strings end with the same character) then move to 2 (by checking the penultimate character) etc.
- Start with infinity, then do pairwise comparisons to reduce the maximum length. Of course you don't need to do all pairwise comparisons - just a comparison of each string with the first should be fine.

Personally I'd probably go with the first approach though - it won't have as good cache coherency, but I think it'll be better in some situations (e.g. many strings, all but one of which have a long common suffix).

(Of course, once you've got the length, getting the actual substring is very simple.)
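As a rough illustration of the two approaches, here is a sketch in Python (not the C# of the question). All function names are hypothetical, and the length of the first string stands in for "infinity" in the second approach, since the suffix can never be longer than that.

```python
from typing import Sequence


def suffix_length_growing(strings: Sequence[str]) -> int:
    """First approach: start at 0 and extend while every string agrees."""
    if not strings:
        return 0  # assumption: empty input has suffix length 0
    shortest = min(len(s) for s in strings)
    length = 0
    # Length + 1 is valid if all strings share the character that sits
    # length + 1 places from the end.
    while length < shortest:
        expected = strings[0][-1 - length]
        if any(s[-1 - length] != expected for s in strings):
            break
        length += 1
    return length


def pair_suffix_length(a: str, b: str) -> int:
    """Longest common suffix length of two strings, using indexing only."""
    limit = min(len(a), len(b))
    n = 0
    while n < limit and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def suffix_length_shrinking(strings: Sequence[str]) -> int:
    """Second approach: the minimum of each string's maximum against the first."""
    if not strings:
        return 0
    first = strings[0]
    best = len(first)  # upper bound standing in for "infinity"
    for other in strings[1:]:
        best = min(best, pair_suffix_length(first, other))
        if best == 0:
            break  # the bound cannot shrink any further
    return best


def suffix_of_length(s: str, length: int) -> str:
    """Turn a suffix length into the actual substring."""
    # len(s) - length avoids the s[-0:] pitfall, which would return all of s.
    return s[len(s) - length:]
```

Both functions should return the same length for the same input; they differ only in the order in which characters are visited, which is where the cache-coherency trade-off mentioned above comes from.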

Jon Skeet 2010-10-11 12:48:49

The double reverse was something I spotted myself as I wrote it. A quick bit of profiling shows that it is also about as fast as always returning the first element from the input array (given the real data), so I may be fussing over nothing...

Rowland Shaw 2010-10-11 13:51:48

Depending on the scale of the problem, and whether performance is an important requirement, it may lend itself to being solved using [Suffix Trees](http://en.wikipedia.org/wiki/Suffix_tree). STs are specifically designed to make it easy to answer questions like "what is the longest common shared substring or suffix?", and can do so at a cost proportional to the longest string. Here's a [decent implementation](http://code.google.com/p/csharsuffixtree/source/browse/#svn/trunk/suffixtree) in C#. Of course, if the number of strings is small and the strings vary from input to input, then STs may not be the best choice.

LBushkin 2010-10-11 21:23:03

Answer 3

A:

This might be a good candidate for memoized recursive functions, given that you may want to hold onto previously calculated values.

basic example: http://weblogs.asp.net/podwysocki/archive/2008/08/01/recursing-into-recursion-memoization.aspx

or: http://explodingcoder.com/blog/content/painless-caching-memoization-net

might fit, might be useless :)
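For what it's worth, a very small sketch of the caching idea in Python (not C#), using the standard library's `functools.lru_cache`. The function here is iterative rather than recursive, and `cached_suffix_length` is a hypothetical name; it only pays off if the same pair of strings is compared more than once.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def cached_suffix_length(a: str, b: str) -> int:
    """Common suffix length of two strings; repeated pairs are served from the cache."""
    limit = min(len(a), len(b))
    n = 0
    while n < limit and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


# The first call computes the value, the second is answered from the cache.
cached_suffix_length("testing", "running")
cached_suffix_length("testing", "running")
```

Calling `cached_suffix_length.cache_info()` reports hits and misses, which is a quick way to see whether the cache is actually being used on real data.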

jim 2010-10-11 12:55:18
