views:

669

answers:

8

Hi, I wrote a recursive function to find the no. of instances of a substring in the parent string. The way I am keeping count is by declaring/initialising count as a global variable outside the function's scope. Problem is, it'll give me correct results only the first time the function is run, because after that count != 0 to begin with. And if i have it inside the function, than each time it is called recursively, it'll be set to 0.

count=0
def countSubStringMatchRecursive(target,key):
    index=find(target,key)
    global count
    targetstring=target
    if index>=0:
        count=count+1
        target=target[index+len(key):]
        countSubStringMatchRecursive(target,key)
    else :
        pass
    return "No. of instances of", key, 'in', targetstring, 'is', count

Note: I am looking for the solution for a recursive function specifically, I have an iterative function that does work fine.

EDIT: Thank You all, this was part of homework, so I was only using the string module

+5  A: 

One way to modify your code would be to use a local function as follows:

def countSubStringMatchRecursive(target,key):
    def countit(target,key,count):
        index=find(target,key)
        if index>=0:
            target=target[index+len(key):]
            count += countit(target,key,count) + 1
        return count
    return "No. of instances of", key, 'in', target, 'is', countit(target,key,0)
Greg Hewgill
A: 

Another way could be to have a third optional parameter on the countSubStringMatchRecursive function called count that is originally set to 0. That way you could keep track of the count. This would expose the count variable to the outside world which might not be desirable, but since it's no worse than your global variable I don't think it would be a problem in your case.

You would also have to change the code to make the last recursive call be the call that gives the return statement to the outside world. See this example (untested):

def countSubStringMatchRecursive(target, key, count = 0):
    index = find(target, key)
    targetstring = target
    if index >= 0:
        count += 1
        target = target[index+len(key):]
        countSubStringMatchRecursive(target, key, count)
    else:
        return "No. of instances of", key, 'in', targetstring, 'is', count

Edit: I realised that you would need a fourth parameter to be able to keep the original string traveling along the recursion. This is probably a less than optimal solution and I would recommend using Greg Hewgill's solution. It has a clean separation between the interactions with the outside and the "business logic", making the code more reusable!

adamse
+1  A: 

Your recursive function has O(n^2) performance because it copies the remaining contents of the string each time it finds a match. This is slower than the iterative solution O(n) and unnecessarily so.

You can easily rewrite it to be faster, and at the same time simplify the code and extend its functionality by passing a start index for the search as an optional parameter to the function:

def countSubStringMatchRecursive(target, key, start_index = 0):
    index = target.find(key, start_index)
    if index >= 0:
        return countSubStringMatchRecursive(target, key, index + len(key)) + 1
    return 0

target_string = 'an apple and a banana'
key = 'an'
count = countSubStringMatchRecursive(target_string,  key)
print "Number of instances of %r in %r is %d" % (key, target_string, count)

Output:

Number of instances of 'an' in 'an apple and a banana' is 4

Update: If you really want to use the string module's find function, you can do this just by changing one line:

index = find(target, key, start_index)
Mark Byers
A: 

How about this?

def count_it(target, key):
    index = target.find(key)
    if index >= 0:
        return 1 + count_it(target[index+len(key):], key)
    else:
        return 0


print count_it("aaa bbb aaa ccc aaa", "aaa")

Output:

3
Ryan Ginstrom
+3  A: 

Here's something similar to Greg Hewgill's answer. However, instead we pass along the current count each time we call the function, and then return the count when there are no more matches to be made. While I suspect it makes no difference in Python, in languages that implement tail-call recursion, this allows each successive call to do_count to be optimised away on the call stack. This means that each call to do_count doesn't cause the call stack to grow.

def count_sub_strings(target, key):
    def do_count(target, key, count):
        index = target.find(key)
        if index >= 0:
            target = target[index + len(key):]
            return do_count(target, key, count + 1)
        else:
            return count
    return "No. of instances of %s in %s is %s" % (key, target, do_count(target, key, 0))
Michael Williamson
+1 for proper tail recursion.
Greg Hewgill
Python doesn't optimize tail recursion. This won't help unfortunately. Furthermore this is still an O(n^2) algorithm - see my answer to see how to make it O(n).
Mark Byers
A: 

Not untested ...

code:

def countSubStringMatchRecursive(target, key, count=0):
    #### index = find(target, key) # HUH?
    index = target.find(key)
    if index >= 0:
        count += 1
        target = target[index+len(key):]
        count = countSubStringMatchRecursive(target, key, count)
    return count

for test in ['', 'bar', 'foo', 'foofoo', 'foo foo foo fo']:
   print countSubStringMatchRecursive(test, 'foo'), test.count(key), repr(test)

output:

0 0 ''
0 0 'bar'
1 1 'foo'
2 2 'foofoo'
3 3 'foo foo foo fo'

I'm presuming that this is just amusement or homework ... recursive function must be slower than corresponding Python iterative solution, which will be naturally slower than using target.count(key) ... so I haven't bothered with fixing all the problems your version had ... but do read PEP-008 :-)

Comments on string module

You commented that you had omitted from string import find. What version of Python are you using? What is the last update date on the book or tutorial that you are using?

From the start of the string module (it will be on your computer as <your Python install directory>/Lib/string.py; I'm quoting from the 2.6 version):

"""A collection of string operations (most are no longer used).

Warning: most of the code you see here isn't normally used nowadays. Beginning with Python 1.6, many of these functions are implemented as methods on the standard string object. They used to be implemented by a built-in module called strop, but strop is now obsolete itself.

etc """

and here is that file's code for the find function (stripped of comments):

def find(s, *args):
    return s.find(*args)

so using string.find(target, key) instead of target.find(key) is a waste.

John Machin
homework... from MIT_OCW...#### index = find(target, key) # HUH?Oops! I missed from string import * in the code snippet.Thaks
gsin
string module: yuk. See my augmented answer.
John Machin
the material i am using is quite outdated it seems, thanks
gsin
By the way (1) `from anymodule import *` is not a good idea; it's better to import explicitly the objects that you actually need. (2) There's this thing called an 'upvote button' at the top-left of each answer ;-)
John Machin
+2  A: 

Just a side note: all solutions presented (from the original Q to all the As) are solving a problem that's different than the specifically stated one (I imagine that's a bug in the specific problem statement, but, worth fixing if so;-). Consider:

>>> 'banana'.count('ana')
1
>>> sum('banana'[x:x+3]=='ana' for x in range(len('banana')))
2

the first expression is counting the non-overlapping occurrences of 'ana' in 'banana'; the second one is counting all occurrences -- there are two occurrences in all, at indices 1 and 3 in 'banana', and they overlap. So given the problem statement, and I quote:

find the no. of instances of a substring in the parent string.

without any mention of "non-overlapping", it seems that overlapping occurrences should be counted. Of course, that's easy to fix, once noticed -- you just have to advance by 1 each time, instead of advancing by len(key) which leads you to skip overlapping occurrences.

So, for example:

import string

def countit(target, key, startfrom=0):
    where = string.find(target, key, startfrom)
    if where < 0: return 0
    return 1 + countit(target, key, where+1)

print countit('banana', 'ana')

prints 2, counting both (overlapping) occurrences.

Alex Martelli
Thank You. I hadn't even considered overlapping occurrences.
gsin
A: 
def countSubStringMatchRecursive(target,key):
index = string.find(target, key)
if index == -1:
    return 0
else:
    return 1 + countSubStringMatchRecursive(target[index+len(key):],key)
leesto