views:

1571

answers:

6

I'm posting this on behalf of a friend since I believe this is pretty interesting:

Take the string "abb". By leaving out any number of letters (fewer than the length of the string), we end up with 7 strings.

a b b ab ab bb abb

Out of these 4 are palindromes.
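To make the counting rule concrete, here is a minimal brute-force sketch. The function names are made up for illustration. It assumes the string is shorter than 64 characters and treats every choice of positions as a separate string, which is how "ab" shows up twice in the list above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if t reads the same in both directions.
static bool isPalindrome(const std::string& t) {
    for (std::size_t i = 0, j = t.size(); i + 1 < j; ++i, --j)
        if (t[i] != t[j - 1]) return false;
    return true;
}

// Brute force: each bitmask selects the positions that are kept.
// Exponential in s.size(), so only usable for short strings (size < 64).
unsigned long long countPalindromesBruteForce(const std::string& s) {
    unsigned long long count = 0;
    const unsigned long long limit = 1ULL << s.size();
    for (unsigned long long mask = 1; mask < limit; ++mask) {
        std::string t;
        for (std::size_t i = 0; i < s.size(); ++i)
            if (mask & (1ULL << i)) t += s[i];
        if (isPalindrome(t)) ++count;
    }
    return count;
}
```

Calling countPalindromesBruteForce("abb") returns 4, matching the count above (a, b, b, bb).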

Similarly for the string

"hihellolookhavealookatthispalindromexxqwertyuiopasdfghjklzxcvbnmmnbvcxzlkjhgfdsapoiuytrewqxxsoundsfamiliardoesit"

(a string of length 112), 2^112 - 1 strings can be formed.

Out of these, how many are palindromes?

Below is his implementation (in C++, though C is fine too). It's pretty slow on very long words; he wants to know what the fastest possible algorithm for this is (and I'm curious too :D).

#include <iostream>
#include <cstring>

using namespace std;



void find_palindrome(const char* str, const char* max, long& count)
{
    for(const char* begin = str; begin < max; begin++) {
        count++;
        const char* end = strchr(begin + 1, *begin);
        while(end != NULL) {
            count++;
            find_palindrome(begin + 1, end, count);
            end = strchr(end + 1, *begin);
        }
    }
}


int main(int argc, char *argv[])
{
    const char* s = "hihellolookhavealookatthis";
    long count = 0;

    find_palindrome(s, strlen(s) + s, count);

    cout << count << endl;
}
A: 

Hmmmmm, I think I would count up like this:

Each character is a palindrome on its own (minus repeated characters).
Each pair of the same character.
Each pair of the same character, with all palindromes sandwiched in the middle that can be made from the string between repeats.
Apply recursively.

Which seems to be what you're doing, although I'm not sure you don't double-count the edge cases with repeated characters.

So, basically, I can't think of a better way.

EDIT:
Thinking some more, it can be improved with caching, because you sometimes count the palindromes in the same sub-string more than once. So, I suppose this demonstrates that there is definitely a better way.
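A rough sketch of what that caching could look like, assuming the cache is keyed by the (begin, end) range being counted. The class and function names are hypothetical, and the result is assumed to fit in a long long.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: counts palindromes in s[begin, end) and remembers
// the answer for each range so no range is ever counted twice.
class PalindromeCounter {
public:
    explicit PalindromeCounter(const std::string& s)
        : s_(s), memo_(s.size() + 1, std::vector<long long>(s.size() + 1, -1)) {}

    long long count() { return count(0, s_.size()); }

private:
    long long count(std::size_t begin, std::size_t end) {
        if (begin >= end) return 0;
        long long& cached = memo_[begin][end];
        if (cached >= 0) return cached;  // already computed for this range
        long long total = 0;
        for (std::size_t i = begin; i < end; ++i) {
            total += 1;  // the single character s_[i]
            for (std::size_t j = i + 1; j < end; ++j)
                if (s_[j] == s_[i])
                    total += 1 + count(i + 1, j);  // the pair, plus everything sandwiched inside
        }
        cached = total;
        return total;
    }

    std::string s_;
    std::vector<std::vector<long long>> memo_;
};

// Convenience wrapper (also hypothetical).
long long countPalindromesMemo(const std::string& s) {
    return PalindromeCounter(s).count();
}
```

With the cache, the number of distinct ranges is quadratic in the string length, so the running time becomes polynomial instead of exponential.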

Autopulated
But strchr() is expensive. Working from smaller palindromes outward, it's necessary to compare only the first and last characters, stopping at the first mismatch.
Adam Liss
I don't know: O(n) isn't the most expensive operation in the world. I admit there is probably a better solution, but I'm not sure it comes from either a top-down or a bottom-up approach.
Autopulated
+1  A: 

Is there any mileage in making an initial traversal and building an index of all occurrences of each character?

 h = { 0, 2, 27}
 i = { 1, 30 }
 etc.

Now, working from the left with h, the only possible palindromes end at 3 and 17: does char[0 + 1] == char[3 - 1], etc.? If so, we've got a palindrome. Does char[0 + 1] == char[27 - 1]? No, so no further analysis of char[0] is needed.

Move on to char[1]: we only need to examine char[30 - 1] and inwards.

Then we can probably get smart: once you've identified a palindrome running from position x to y, all inner subsets are known palindromes, so those cases have already been dealt with and can be eliminated from later examination.
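The initial traversal might look something like the sketch below. The type and function names are invented for illustration, and one bucket per possible byte value is an assumption made to keep the lookup simple.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// One list of positions for each possible byte value.
using OccurrenceIndex = std::array<std::vector<std::size_t>, 256>;

// Single pass over the string; positions end up in increasing order.
OccurrenceIndex buildOccurrenceIndex(const std::string& s) {
    OccurrenceIndex index{};
    for (std::size_t pos = 0; pos < s.size(); ++pos)
        index[static_cast<unsigned char>(s[pos])].push_back(pos);
    return index;
}
```

Looking up the candidate partners for a character is then just a walk over index[c] rather than a scan of the string.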

djna
A: 

I am not sure, but you might try it with Fourier. This problem reminded me of this: http://stackoverflow.com/questions/1560523/onlogn-algorithm-find-three-evenly-spaced-ones-within-binary-string

Just my 2 cents

ralu
+6  A: 

First of all, your friend's solution seems to have a bug since strchr can search past max. Even if you fix this, the solution is exponential in time.
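One possible way to fix the bug, sketched here with std::find in place of strchr so that every search stops at max (the function name is made up). This only fixes the out-of-range search; the running time stays exponential.

```cpp
#include <algorithm>

// Same recursion as in the question, but each search for a matching
// character is limited to [begin + 1, max) instead of running to the
// end of the whole string.
void findPalindromeBounded(const char* str, const char* max, long& count) {
    for (const char* begin = str; begin < max; ++begin) {
        ++count;  // the single character *begin
        const char* end = std::find(begin + 1, max, *begin);
        while (end != max) {  // std::find returns max when nothing matches
            ++count;          // the pair (*begin, *end)
            findPalindromeBounded(begin + 1, end, count);  // palindromes wrapped by the pair
            end = std::find(end + 1, max, *begin);
        }
    }
}
```

For "abb" this yields a count of 4, and for "aaa" a count of 7.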

For a faster solution, you can use dynamic programming to solve this in O(n^3) time. This will require O(n^2) additional memory. Note that for long strings, even 64-bit ints as I have used here will not be enough to hold the solution.

#include <string.h>

#define MAX_SIZE 1000  // input strings must be shorter than MAX_SIZE characters
long long numFound[MAX_SIZE][MAX_SIZE]; //intermediate results, indexed by [startPosition][endPosition]

long long countPalindromes(const char *str) {
    int len = strlen(str);
    for (int startPos=0; startPos<=len; startPos++)
        for (int endPos=0; endPos<=len; endPos++)
            numFound[startPos][endPos] = 0;

    for (int spanSize=1; spanSize<=len; spanSize++) {
        for (int startPos=0; startPos<=len-spanSize; startPos++) {
            int endPos = startPos + spanSize;
            long long count = numFound[startPos+1][endPos];   //if str[startPos] is not in the palindrome, this will be the count
            char ch = str[startPos];

            //if str[startPos] is in the palindrome, choose a matching character for the palindrome end
            for (int searchPos=startPos; searchPos<endPos; searchPos++) {
                if (str[searchPos] == ch)
                    count += 1 + numFound[startPos+1][searchPos];
            }

            numFound[startPos][endPos] = count;
        }
    }
    return numFound[0][len];
}

Explanation:

The array numFound[startPos][endPos] will hold the number of palindromes contained in the substring that starts at index startPos and ends just before index endPos.

We go over all pairs of indexes (startPos, endPos), starting from short spans and moving to longer ones. For each such pair, there are two options:

  1. The character at str[startPos] is not in the palindrome. In that case, there are numFound[startPos+1][endPos] possible palindromes - a number that we have calculated already.

  2. The character at str[startPos] is in the palindrome (at its beginning). We scan through the string to find a matching character to put at the end of the palindrome. For each such character, we use the already-calculated results in numFound to find the number of possibilities for the inner palindrome.

EDIT:

  • Clarification: when I say "number of palindromes contained in a string", this includes non-contiguous substrings. For example, the palindrome "aba" is contained in "abca".

  • It's possible to reduce memory usage to O(n) by taking advantage of the fact that calculation of numFound[startPos][x] only requires knowledge of numFound[startPos+1][y] for all y. I won't do this here since it complicates the code a bit.

  • Pregenerating lists of indices containing each letter can make the inner loop faster, but it will still be O(n^3) overall.
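A rough sketch of the O(n) memory variant mentioned above, keeping only the rows for startPos and startPos+1. The running total withStart is an extra step that is not in the code above: it replaces the inner search loop, which should bring this sketch down to O(n^2) time. The function name is invented, and the count is assumed to fit in 64 bits.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

unsigned long long countPalindromesRolling(const std::string& str) {
    const std::size_t len = str.size();
    std::vector<unsigned long long> next(len + 1, 0);  // plays the role of numFound[startPos+1][*]
    std::vector<unsigned long long> cur(len + 1, 0);   // plays the role of numFound[startPos][*]

    // Walk startPos from the last character down to the first.
    for (std::size_t startPos = len; startPos-- > 0;) {
        const char ch = str[startPos];
        std::fill(cur.begin(), cur.end(), 0ULL);
        // Palindromes that begin at startPos and end before endPos,
        // accumulated as endPos grows.
        unsigned long long withStart = 0;
        for (std::size_t endPos = startPos + 1; endPos <= len; ++endPos) {
            if (str[endPos - 1] == ch)
                withStart += 1 + next[endPos - 1];
            // Either str[startPos] is left out (next[endPos]) or it is used.
            cur[endPos] = next[endPos] + withStart;
        }
        cur.swap(next);  // the row just computed becomes "startPos+1" for the next pass
    }
    return next[len];
}
```

For "aaa" this returns 7, the same as the two-dimensional version.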

interjay
@Pavel Shved: Can you clarify what you mean? My answer gives the same results as the original code, once the bug I mentioned is fixed.
interjay
I am afraid you are slightly wrong in your use of subindexes. Personally I would say that "leaving out any number of letters" means that from "abc" one would derive "a b c ab ac bc abc". The actual example is a bit shaky on this (using "abb") but you will notice that in the derived list "ab" appears twice whereas if the strings were contiguous you would only derive "a b b ab bb abb".
Matthieu M.
+1 For suggesting dynamic programming anyway.
Matthieu M.
@Matthieu M.: My answer does not look only at contiguous substrings - it does exactly what the question asks. For example, using the string "aaa" will give a result of 7 and not 6 as it would if it only counted contiguous substrings. If you think otherwise, please provide an example string where my answer gives the wrong result.
interjay
My apologies; I was misled by the use of contiguous spans.
Matthieu M.
+1  A: 

My solution using O(n) memory and O(n^2) time, where n is the string length:

palindrome.c:

#include <stdio.h>
#include <string.h>

typedef unsigned long long ull;

ull countPalindromesHelper (const char* str, const size_t len, const size_t begin, const size_t end, const ull count) {
  if (begin <= 0 || end >= len) {
    return count;
  }
  const char pred = str [begin - 1];
  const char succ = str [end];
  if (pred == succ) {
    const ull newCount = count == 0 ? 1 : count * 2;
    return countPalindromesHelper (str, len, begin - 1, end + 1, newCount);
  }
  return count;
}

ull countPalindromes (const char* str) {
  ull count = 0;
  size_t len = strlen (str);
  size_t i;
  for (i = 0; i < len; ++i) {
    count += countPalindromesHelper (str, len, i, i, 0);  // even length palindromes
    count += countPalindromesHelper (str, len, i, i + 1, 1); // odd length palindromes
  }
  return count;
}

int main (int argc, char* argv[]) {
  if (argc < 2) {
    return 0;
  }
  const char* str = argv [1];
  ull count = countPalindromes (str);
  printf ("%llu\n", count);
  return 0;
}

Usage:

$ gcc palindrome.c -o palindrome
$ ./palindrome myteststring

EDIT: I misread the problem as the contiguous substring version of the problem. Now given that one wants to find the palindrome count for the non-contiguous version, I strongly suspect that one could just use a math equation to solve it given the number of distinct characters and their respective character counts.

trinithis
As you said this only finds contiguous substrings. I doubt you can find a math equation to go from this to the correct solution as you said: For example, the strings "abcdabcd" and "abcdabdc" both give 8 in your solution and both have the same character counts. However the correct solution is different for both (24 and 27 respectively).
interjay
Oh, I see. I mistook the meaning of non-contiguous as being completely free with respect to reordering. IOW, any subpermutation would be game.
trinithis
A: 

Here is a program for finding all the possible palindromes in a string written in both Java and C++.

Bragboy