I'm working through some Python problems on pythonchallenge.com to teach myself Python and I've hit a roadblock: the string I need to build appears to be too large for Python to handle. I receive this error:

my-macbook:python owner1$ python singleoccurrence.py
Traceback (most recent call last):
  File "singleoccurrence.py", line 32, in <module>
    myString = myString.join(line)
OverflowError: join() result is too long for a Python string

What alternatives do I have? My code looks like this:

#open file testdata.txt
#for each character, check if already exists in array of checked characters
#if so, skip.
#if not, character.count
#if count > 1, repeat recursively with first character stripped off of page.
# if count = 1, add to valid character array.
#when string = 0, print valid character array.

valid = []
checked = []
myString = ""

def recursiveCount(bigString):
    if len(bigString) == 0:
        print "YAY!"
        return valid
    myChar = bigString[0]
    if myChar in checked:
        return recursiveCount(bigString[1:])
    if bigString.count(myChar) > 1:
        checked.append(myChar)
        return recursiveCount(bigString[1:])
    checked.append(myChar)
    valid.append(myChar)
    return recursiveCount(bigString[1:])

fileIN = open("testdata.txt", "r")
line = fileIN.readline()

while line:
    line = line.strip()
    myString = myString.join(line)
    line = fileIN.readline()

myString = recursiveCount(myString)
print "\n"
print myString
A:

string.join doesn't do what you think. join combines a sequence of strings into a single string, placing the given separator between them. For example:

>>> ",".join(('foo', 'bar', 'baz'))
'foo,bar,baz'

The code snippet you posted inserts myString between every character of the variable line, so the result grows by roughly a factor of the line length on every pass through the loop. You can see how that will get big quickly :-). Are you trying to read the entire file into a single string, myString? If so, the way you want to concatenate the strings is like this:

myString = myString + line
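
To make the difference concrete, here is a small sketch that runs both versions side by side. The three short strings are made-up stand-ins for the lines of testdata.txt, and the variable names joined and concatenated are only for illustration.

```python
# Hypothetical input standing in for the lines of testdata.txt.
lines = ["abc", "def", "ghi"]

# What the original loop does: the string built so far is used as the
# *separator* and inserted between every character of the new line.
joined = ""
for line in lines:
    joined = joined.join(line)

# What was presumably intended: append each line to the end.
concatenated = ""
for line in lines:
    concatenated = concatenated + line

print(joined)        # gdabceabcfhdabceabcfi
print(concatenated)  # abcdefghi
```

With only three 3-character lines the join version is already 21 characters long; with real lines of dozens of characters the length multiplies on every iteration, which is where the OverflowError comes from.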

While I'm here... since you're learning Python here are some other suggestions.

There are easier ways to read an entire file into a variable. For instance:

fileIN = open("testdata.txt", "r")
myString = fileIN.read()

(Unlike your existing strip() code, this keeps the newlines and any leading or trailing whitespace on each line, but it may in fact do what you want.)
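
If the per-line strip() behaviour does matter, one possible way to keep it is sketched below. The helper name read_stripped and the temporary demo file are assumptions made for this example, not part of the original code; the demo file only exists so the snippet runs on its own.

```python
import os
import tempfile


def read_stripped(path):
    """Return the file's text with every line stripped and glued together."""
    with open(path, "r") as file_in:
        # Strip each line, then join all pieces once at the end instead
        # of concatenating onto a growing string inside a loop.
        return "".join(line.strip() for line in file_in)


# Throwaway demo file with made-up contents (not the real testdata.txt).
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as demo:
    demo.write("  abc \n def\n\nghi\n")

result = read_stripped(demo_path)
os.remove(demo_path)
print(result)  # abcdefghi
```

The with block also closes the file automatically, which the plain open() call above leaves to the garbage collector.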

Also, I would never recommend that practical Python code use recursion to iterate over a string. Your code makes a function call (and a stack entry) for every character in the string, and Python's default recursion limit (about 1000 calls) will stop it long before a large input is finished. In CPython, each bigString[1:] slice also builds a new string in memory that is a copy of the original without the first character. The simplest way to process every character in a string is:

for mychar in bigString:
    ... do your stuff ...
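
As one possible shape for that loop, here is a minimal sketch of the whole task written iteratively. It assumes the goal is "every character that occurs exactly once, in order of appearance", which is what the recursive version appears to compute; the function name single_occurrences is made up for this example.

```python
def single_occurrences(big_string):
    """Return the characters that occur exactly once, in order of appearance."""
    # First pass: count how often each character occurs.
    counts = {}
    for ch in big_string:
        counts[ch] = counts.get(ch, 0) + 1

    # Second pass: keep only the characters whose total count is 1.
    return [ch for ch in big_string if counts[ch] == 1]


print(single_occurrences("aabxbcyd"))  # ['x', 'c', 'y', 'd']
```

This touches each character twice and never slices or copies the string, so it needs no recursion and its running time grows linearly with the input.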

Finally, you are using the list named "checked" to see if you've ever seen a particular character before. But the membership test on lists ("if myChar in checked") is slow, because it scans the list element by element. In Python you're better off using a dictionary, whose lookups take roughly constant time:

checked = {}
...
if myChar not in checked:
    checked[myChar] = True
    ...
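
When only membership matters and no value needs to be stored, a set can play the same role as the dictionary. The sketch below is an illustration under that assumption; the function name first_appearances is hypothetical and simply records each distinct character the first time it is seen.

```python
def first_appearances(big_string):
    """Return each distinct character once, in order of first appearance."""
    seen = set()   # hash-based, so "ch in seen" does not scan a list
    ordered = []
    for ch in big_string:
        if ch not in seen:
            seen.add(ch)
            ordered.append(ch)
    return ordered


print(first_appearances("banana"))  # ['b', 'a', 'n']
```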

This exercise you're doing is a great way to learn several Python idioms.

Nelson
Actually, `myString = myString + line` is a bad idea because strings are immutable, so every concatenation can copy the whole string built so far. Instead, it's recommended to collect the pieces in a list and join them into one string at the end, like so: `"".join(my_data_list)`
Evan Fosmark
Or just read the file all at once :-) I only suggested the myString + line thing because I think that's closest to what the asker was trying to do.
Nelson
huge thumbs up for all of the advice :) mucho thanks!
Chris
Unless you're in an old Python, a set is easier than a dict
chrispy