The background: I'm building a trie to represent a dictionary, using a minimal construction algorithm. The input list is 4.3M utf-8 strings, sorted lexicographically. The resulting graph is acyclic and has a maximum depth of 638 nodes. The first line of my script sets the recursion limit to 1100 via sys.setrecursionlimit()
.
The problem: I'd like to be able to serialize my trie to disk, so I can load it into memory without having to rebuild from scratch (roughly 22 minutes). I have tried both pickle.dump()
and cPickle.dump()
, with both the text and binary protocols. Each time, I get a stack-trace that looks like the following:
File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/pickle.py", line 649, in save_dict
self._batch_setitems(obj.iteritems())
File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/pickle.py", line 663, in _batch_setitems
save(v)
File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/pickle.py", line 725, in save_inst
save(stuff)
File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/pickle.py", line 648, in save_dict
self.memoize(obj)
RuntimeError: maximum recursion depth exceeded
My data structures are relatively simple: trie
contains a reference to a start state, and defines some methods. dfa_state
contains a boolean field, a string field, and a dictionary mapping from label to state.
I'm not very familiar with the inner workings of pickle
- does my max recursion depth need to be greater/equal n times the depth of the trie for some n? Or could this be caused by something else I'm unaware of?
Update: Setting the recursion depth to 3000 didn't help, so this avenue doesn't look promising.
Update 2: You guys were right; I was being short-sighted in assuming that pickle would use a small nesting depth due to default recursion limitations. 10,000 did the trick.