ansaurus

Question

How to make this code handle big inputs more efficiently?

Answer 1

+3 A:

Try using a dynamic programming approach instead:

Create an array (call it 'P') with 1 element per character in the string.
Initialize P[0] = 1 (unless the first character is 0, in which case just return 0 for the result).
Initialize P[1] = 2 if the first two characters can be interpreted as a letter as can the current; otherwise 1 if the current character is non-zero, otherwise return 0 for the result).
Fill in the rest of the array from left to right, via the following rule (pseudocode):

P[x] = (if current character is '0' then 0, else P[x-1]) + (if previous character + current character can be interpreted as a letter then P[x-2] else 0)

(Note that if P[x] is ever 0 you should return zero, since that means there were two 0's in a row which your rules don't seem to allow.)

The first portion of the sum is to deal with the case where the current character is interpreted as a letter; the second part of the sum is to deal with the case where the 2 most recent characters are interpreted as a letter.

Essentially, P[x] will be equal to the number of ways that the entirety of the string from the start up to position x can be interpreted as letters. Since you can determine this from looking at previous results, you only need to loop through the contents of the string once - an O(N) time instead of a O(2^N) which is a huge improvement. Your final result is simply P[len(input)-1] since "everything from the start up to the end" is the same as just "the entire string".

Example run for your very basic input case of '111':

P[0] = 1 (Since 1 is non-zero)
P[1] = 2 (Since 11 is a valid letter, and 1 is also a valid letter)
P[2] = 3 (Since the most recent two characters together are a valid letter, and the current character is nonzero, so P[0]+P[1] = 1+2 = 3)

Since P[2] is our last result, and it's 3, our answer is 3.

If the string were '1111' instead, we'd continue another step:

P[3] = 5 (Since the most recent two characters are a valid letter, and current character is non-zero, so P[1]+P[2] = 2+3 = 5)

The answer is indeed 5 - valid interpretations being AAAA, KK, AKA, AAK, KAA. Notice how those 5 potential answers are built up from the potential interpretations of '11' and '111':

'11': AA or K '111': AAA or KA or AK

'111'+A: AAA+A or KA+A or AK+A '11'+K: AA+K or K+K

Amber 2009-09-04 05:34:12

Yeah, that's great solution. Thanks.

ooboo 2009-09-04 05:48:39

Nice algorithm! Note that you actually don't have to save the intermediate results in an array: keeping the last two `P` results is sufficient.

EOL 2009-09-04 11:41:48

Yep EOL. The array approach can make the concept a little easier to visualize though, and then once the basics behind the algorithm are grasped, it's fairly simple to modify to only store what's necessary.

Amber 2009-09-04 21:06:05

Answer 2

A:

A non-recursive algorithm can be written but I don't think it will be faster. I'm no python expert so I'll just give you an algorithm:

Convert the array on numbers to an array of letters using just A thru I and leaving the zeros in place. 
Create two nested loops where you search and replace all the known pairs that represent larger letters. (AA -> K)

The nice thing about this algorithm is you can optimize the search/replace by first searching for and indexing all the As and Bs in the array.

Since you are using Python, regardless of what you do, you should convert the string into a list of numbers. The numbers 0-9 are static objects in Python so that means they are free to allocate. You could also create reusable character objects of A thru Z. The other benefit of a list is the replace operation to remove two elements and insert a single element is problem much faster than copying the string over and over.

jmucchiello 2009-09-04 05:35:33

Answer 3

+1 A:

Recursion elimination is always a fun task. Here, I'd focus on ensuring the cache is correctly populated, then just use it, as follows...:

import collections

def alpha_code(numbers):
    # populate cache with all needed pieces
    cache = dict()
    pending_work = collections.deque([numbers])
    while pending_work:
      work = pending_work.popleft()
      # if cache[work] is known or easy, just go for it
      if work in cache:
        continue
      if work[:1] == '0':
        cache[work] = 0
        continue
      elif len(work) <= 1:
        cache[work] = 1
        continue
      # are there missing pieces? If so queue up the pieces
      # on the left (shorter first), the current work piece
      # on the right, and keep churning
      n1 = work[1:]
      t1 = cache.get(n1)
      if t1 is None:
        pending_work.appendleft(n1)
      if work[:2] <= '26':
        n2 = work[2:]
        t2 = cache.get(n2)
        if t2 is None:
          pending_work.appendleft(n2)
      else:
        t2 = 0
      if t1 is None or t2 is None:
        pending_work.append(work)
        continue
      # we have all pieces needed to add this one
      total = t1 + t2
      cache[work] = total

    # cache fully populated, so we know the answer
    return cache[numbers]

Alex Martelli 2009-09-04 05:46:29

Upvoted. I've been trying to do the wrong thing, apparently - but this is how you do wrong things right :P

ooboo 2009-09-04 12:29:13

Answer 4

A:

You can largely reduce the memory footprint by not copying the string, and passing instead the original string and the index of the first character to study:

def alpha_code(numbers, start_from = 0)
    ....

You would then call recursively as:

alpha_code(numbers, start_from + 1)  # or start_from + 2, etc.

This way, your preserve the simplicity of a recursive algorithm and you save a lot of memory.

EOL 2009-09-04 11:32:27

Hey EOL. There's a problem with this code when I call the function several times, what do I hash? I hash where I started from? So what if I call it multiple times, the hashtable will go crazy.Unless I could in some way make a hashtable that stores distinct hashtable each hashtable will refer to a specific string.

ooboo 2009-09-04 12:19:28

ansaurus

tags:

views:

answers:

How to make this code handle big inputs more efficiently?

related questions