views: 1046

answers: 8

Jon Limjap's interview mishap got me curious, and I started to look for efficient ways to do palindrome detection. I checked the palindrome golf answers, and it seems to me there are only two algorithms in the answers: reversing the string, and checking from tail and head.

def palindrome_short(s):
  length = len(s)
  for i in xrange(0,length/2):
    if s[i] != s[(length-1)-i]: return False
  return True

def palindrome_reverse(s):
  return s == s[::-1]

I think neither of these methods is used for detecting exact palindromes in huge DNA sequences. I looked around a bit and didn't find any freely available article about what an ultra-efficient way of doing this might be.

A good way might be to parallelize the first version in a divide-and-conquer approach, assigning a pair of character ranges 1..n and length-1-n..length-1 to each thread or processor.

What would be a better way?

Do you know any?

+1  A: 

There isn't, unless you do a fuzzy match, which is what they probably do in DNA (I've done EST searching in DNA with Smith-Waterman, but that is obviously much harder than matching for a palindrome or reverse complement in a sequence).

nlucaroni
+2  A: 

Obviously, you're not going to be able to get better than O(n) asymptotic efficiency, since each character must be examined at least once. You can get better multiplicative constants, though.

For a single thread, you can get a speedup using assembly. You can also do better by examining data in chunks larger than a byte at a time, but this may be tricky due to alignment considerations. You'll do even better to use SIMD, if you can examine chunks as large as 16 bytes at a time.
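
Python can't express SIMD directly, but the chunking idea can still be illustrated: comparing fixed-size slices instead of single characters lets the interpreter's C-level comparison do the inner loop. A rough sketch (the 16-byte chunk size is an arbitrary choice here, not a tuned value):

def palindrome_chunked(s, chunk=16):
    # Compare slices from the front against reversed slices from the
    # back; fall back to single characters for the leftover middle part.
    n = len(s)
    half = n // 2
    i = 0
    while i + chunk <= half:
        if s[i:i + chunk] != s[n - i - chunk:n - i][::-1]:
            return False
        i += chunk
    while i < half:
        if s[i] != s[n - 1 - i]:
            return False
        i += 1
    return True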

If you wanted to parallelize it, you could divide the string of length L into N pieces, and have processor i compare the segment [i*L/(2N), (i+1)*L/(2N)) with the mirrored segment [L-(i+1)*L/(2N), L-i*L/(2N)).
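
A rough sketch of that split with Python's multiprocessing module (the segment arithmetic follows the formula above; for simplicity this ships the whole string to every worker, which a real implementation would avoid):

from multiprocessing import Pool

def check_segment(args):
    # Compare one front segment with its mirrored back segment.
    s, lo, hi = args
    L = len(s)
    return s[lo:hi] == s[L - hi:L - lo][::-1]

def palindrome_parallel(s, n_procs=4):
    L = len(s)
    bounds = [(s, i * L // (2 * n_procs), (i + 1) * L // (2 * n_procs))
              for i in range(n_procs)]
    pool = Pool(n_procs)  # needs an  if __name__ == '__main__'  guard on Windows
    try:
        return all(pool.map(check_segment, bounds))
    finally:
        pool.close()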

Adam Rosenfield
Instead of comparing chunks of 16 bytes, it's probably faster to do 4 palindromes at a time. It'll save you swizzling data and probably doesn't require as many horizontal operations.
Jasper Bekkers
Other ideas: Store as much of the key in one machine word as you can. Compare this to each byte of a memory buffer containing the test item. Do not resort to string operations until this hits. Do not use anything wider than 8-bit characters as the limiting factor is going to be memory access.
Loren Pechtel
+1  A: 

They are both in O(N), so I don't think there is any particular efficiency problem with either of these solutions. Maybe I am not creative enough, but I can't see how it would be possible to compare N elements in less than N steps, so something like O(log N) is definitely not possible IMHO.

Parallelism might help, but it still wouldn't change the big-O rank of the algorithm, since it is equivalent to running it on a faster machine.

DrJokepu
+4  A: 

Given only one palindrome, you will have to do it in O(N), yes. You can get more efficiency with multi-processors by splitting the string as you said.

Now say you want to do exact DNA matching. These strings are thousands of characters long, and they are very repetitive. This gives us the opportunity to optimize.

Say you split a 1000-char long string into 5 pairs of 100,100. The code will look like this:

isPal(w[0:100], w[-100:]) and isPal(w[100:200], w[-200:-100]) ...

etc... The first time you do these matches, you will have to process them. However, you can add all results you've done into a hashtable mapping pairs to booleans:

isPal = {("ATTAGC", "CGATTA"): True, ("ATTGCA", "CAGTAA"): False}

etc... this will take way too much memory, though. For pairs of 100,100, the hash map will have 2*4^100 elements. Even if you only store two 32-bit hashes of the strings as the key, you will need something like 10^55 megabytes, which is ridiculous.

Maybe if you use smaller strings, the problem can be tractable. Then you'll have a huge hashmap, but at least a lookup for, say, a 10,10 pair will take O(1), so checking whether a 1000-char string is a palindrome will take 50 lookups instead of 500 character compares. It's still O(N), though...
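
A minimal sketch of that chunk-plus-hashtable idea, with a 10-character chunk size (the plain dict cache is illustrative; the memory blow-up described above is the real limit):

_pair_cache = {}

def palindrome_chunked_memo(s, chunk=10):
    # Split the string into (front, back) chunk pairs and memoize whether
    # each pair mirrors the other; repetitive input is what makes the
    # cache pay off at all.
    n = len(s)
    half = n // 2
    for i in range(0, half - chunk + 1, chunk):
        pair = (s[i:i + chunk], s[n - i - chunk:n - i])
        if pair not in _pair_cache:
            _pair_cache[pair] = pair[0] == pair[1][::-1]
        if not _pair_cache[pair]:
            return False
    # Compare the leftover middle characters directly.
    i = half - (half % chunk)
    return s[i:half] == s[n - half:n - i][::-1]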

Claudiu
You are forgetting that a hash lookup is linear in the length of the key, and since hash calculation uses some arithmetic it's actually less efficient than char-by-char comparison. Also, chunking won't help even if you parallelize, since for every miss you'll have a huge amount of wasted work, and there are many more misses than hits. Comparing from the center is much more efficient since you can bail out early.
ZXX
A: 

With Python, short code can be faster, since it pushes the work into the faster C internals of the VM (and there is the whole cache and other such things):

def ispalin(x):
   return all(x[a]==x[-a-1] for a in xrange(len(x)>>1))
Demur Rumed
+1  A: 

Another variant of your second function. We don't need to compare the whole string against its reverse; it's enough to compare the first half against the reversed second half.

def palindrome_reverse(s):
  l = len(s) / 2
  # compare the left half with the reversed right half
  return s[:l] == s[:-l-1:-1]
drnk
+1  A: 

Comparing from the center is always much more efficient since you can bail out early on a miss, but it also allows you to do a faster max-palindrome search, regardless of whether you are looking for the maximal radius or all non-overlapping palindromes.
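
For example, the centre-out idea gives a straightforward longest-palindrome search that bails out of each expansion on the first mismatch (a minimal sketch; worst case it is still O(N^2)):

def longest_palindrome_centered(s):
    # Expand around every centre (odd- and even-length cases),
    # stopping each expansion at the first mismatch.
    best = ""
    n = len(s)
    for centre in range(n):
        for lo, hi in ((centre, centre), (centre, centre + 1)):
            while lo >= 0 and hi < n and s[lo] == s[hi]:
                lo -= 1
                hi += 1
            if hi - lo - 1 > len(best):
                best = s[lo + 1:hi]
    return best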

The only real parallelization opportunity is when you have multiple independent strings to process. Splitting into chunks will waste a lot of work for every miss, and there are always many more misses than hits.

ZXX
A: 

This link shows a program that finds all the possible palindromes in a string. Maybe you can follow the same strategy there. It determines whether a sequence is a palindrome or not in linear time. I don't know how it would perform on a dynamic sequence, but it's definitely ultra-efficient if you know your input beforehand.
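
One well-known way to find all palindromic substrings of a string in linear time is Manacher's algorithm; whether that is what the linked program uses is an assumption. A minimal sketch:

def manacher(s):
    # Radii of the longest palindromes centred at each position of the
    # transformed string; the ^ and $ sentinels avoid explicit bounds checks.
    t = '^#' + '#'.join(s) + '#$'
    p = [0] * len(t)
    centre = right = 0
    for i in range(1, len(t) - 1):
        if i < right:
            p[i] = min(right - i, p[2 * centre - i])
        while t[i + p[i] + 1] == t[i - p[i] - 1]:
            p[i] += 1
        if i + p[i] > right:
            centre, right = i, i + p[i]
    return p  # max(p) is the length of the longest palindrome in s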

Bragboy