views:

251

answers:

2

I recently found a contest problem that asks you to compute the minimum number of characters that must be inserted (anywhere) in a string to turn it into a palindrome.

For example, given the string: "abcbd" we can turn it into a palindrome by inserting just two characters: one after "a" and another after "d": "adbcbda".

This seems to be a generalization of a similar problem that asks for the same thing, except characters can only be added at the end - this has a pretty simple solution in O(N) using hash tables.

I have been trying to modify the Levenshtein distance algorithm to solve this problem, but haven't been successful. Any help on how to solve this (it doesn't necessarily have to be efficient, I'm just interested in any DP solution) would be appreciated.

+1  A: 

Seems like you could solve this simply by working from the ends inward - if the two outer characters don't match, then you'll need to add a character next to one of those that matches the other; then move your focus onto the next pair of non-matching characters, until you've worked your way to the middle of the string and your area of examination contains no characters.

Seems like a fairly straightforward O(n) algorithm to me, since you always move one of your "examination window" boundaries inwards by at least one step with every iteration, and thus you're guaranteed to finish by a maximum of n iterations.

There are actually multiple solutions for your example string,

abcbd -> adbcbda, or abcbd -> dabcbad

both of which are edit distance 2.

The former would come if you decided to add an a to balance out the initial a<>d inequality first; the latter would come if you decided to add a d first. To make sure the solution is optimal, however, you'd probably need to examine the "next letter in" past the two you're currently comparing; if one of the "next letters in" can provide a match for half of your comparison, then you favor adding a match for the other to allow the existing match to remove more characters.

Amber
How would you know on which end to add the extra character if both don't have an easy match? Example: asefchbbbbbbbbbfha. After the a, should I put an h before the first s or an s after the last h?
Mark Byers
Added a small bit - "To make sure the solution is optimal, however, you'd probably need to examine the "next letter in" past the two you're currently comparing; if one of the "next letters in" can provide a match for half of your comparison, then you favor adding a match for the other to allow the existing match to remove more characters."
Amber
Are you sure that's O(N)? You say "next to one of those", but you need to work out *which* one (it can be either or), so I think it's more likely O(n^2) - though please forgive me if I've got my big-O stuff wrong - it's been a while!
Dominic Rodger
As far as I can acertain, Dominic, besides the small addendum above I don't believe it matters which you pick, thus it remains O(n) because you don't have to explore both paths.
Amber
(also, if it were necessary to explore both paths for every iteration, then it'd be something more like O(2^n) since the paths would branch)
Amber
Are you sure checking just the next letter in provides an optimal solution all the time? I haven't tried to find a counterexample, but what you're suggesting sounds like a greedy algorithm. Can you prove that looking just one letter ahead is always enough?
IVlad
(@Dav - I was basing my O(n^2) on the hypothesis that you'd have to check every combination of selections for each letter - I'm not sure it's the case that it's always going to be one, or always the other.)
Dominic Rodger
This algorithm looks basically correct to me. If x1x2...xn is the string, we consider three possibilities. x1 = xn, just delete those and work on x2...x(n-1). Two possibilities now, insert x1 and end or insert xn at beginning. If we maintain a table, this is at worst O(n^2). The equations are: MinInsert(x1x2...xn) = MinInsert(x2x3...x(n-1)) if x1 = xn. MinInsert(x1x2...xn) = 1 + Min( MinInsert(x2...xn), MinInsert(x1x2...x(n-1)).
Moron
@Moron: that seems to work on all the tests I could manually verify. Thank you.
IVlad
+2  A: 

Note: This is just a curiosity. Dav proposed an algorithm which can be modified to DP algorithm to run in O(n^2) time and O(n^2) space easily (and perhaps O(n) with better bookkeeping).

Of course, this 'naive' algorithm might actually come in handy if you decide to change the allowed operations.


Here is a 'naive'ish algorithm, which can probably be made faster with clever bookkeeping.

Given a string, we guess the middle of the resulting palindrome and then try to compute the number of inserts required to make the string a palindrome around that middle.

If the string is of length n, there are 2n+1 possible middles (Each character, between two characters, just before and just after the string).

Suppose we consider a middle which gives us two strings L and R (one to left and one to right).

If we are using inserts, I believe the Longest Common Subsequence algorithm (which is a DP algorithm) can now be used the create a 'super' string which contains both L and reverse of R, see Shortest common supersequence.

Pick the middle which gives you the smallest number inserts.

This is O(n^3) I believe. (Note: I haven't tried proving that it is true).

Moron
How do you use the SCS to find out how many insertions are needed if you center the palindrome around a character k for example?For my example: "abcbd". Suppose we center at the second "b". SCS between "abc" and "d" is "abcd". How does this tell you how many characters you need to insert? I am thinking it's 2*strlen(SCS) - strlen(L) - strlen(R), is that correct?
IVlad
Yes, that looks right.
Moron