I am using the following code, with nested generators, to iterate over a text document and return training examples using get_train_minibatch(). I would like to persist (pickle) the generators, so I can get back to the same place in the text document. However, you cannot pickle generators.
Is there a simple workaround, so that I can save my position and start back where I stopped? Perhaps I can make get_train_example() a singleton, so I don't have several generators lying around. Then, I could make a global variable in this module that keeps track of how far along get_train_example() is.
Do you have a better (cleaner) suggestion, to allow me to persist this generator?
[edit: Two more ideas:
Can I add a member variable/method to the generator, so I can call generator.tell() and find the file location? Because then, the next time I create the generator, I can ask it to seek to that location. This idea sounds the simplest of everything.
Can I create a class and have the file location be a member variable, and then have the generator created within the class and update the file location member variable each time it yields? Because then I can know how far into the file it is.
]
Here is the code:
def get_train_example():
    # Yield a sliding window of word IDs over each line of the training file.
    for l in open(HYPERPARAMETERS["TRAIN_SENTENCES"]):
        prevwords = []
        for w in l.split():
            prevwords.append(wordmap.id(w))
            if len(prevwords) >= HYPERPARAMETERS["WINDOW_SIZE"]:
                yield prevwords[-HYPERPARAMETERS["WINDOW_SIZE"]:]

def get_train_minibatch():
    # Group consecutive examples into fixed-size minibatches.
    minibatch = []
    for e in get_train_example():
        minibatch.append(e)
        if len(minibatch) >= HYPERPARAMETERS["MINIBATCH SIZE"]:
            assert len(minibatch) == HYPERPARAMETERS["MINIBATCH SIZE"]
            yield minibatch
            minibatch = []
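
To make the second idea from my edit concrete (a class that keeps the file location as a member variable), here is a rough sketch of what I have in mind. The names TrainExampleReader and get_state are hypothetical, it assumes a UTF-8 text file, and it yields raw words rather than wordmap.id(w) to stay self-contained. Because windows are yielded in the middle of a line, I think the state needs two numbers: the byte offset of the current line and how many windows from that line have already been yielded.

```python
import pickle


class TrainExampleReader:
    """Hypothetical resumable stand-in for get_train_example()."""

    def __init__(self, path, window_size, line_offset=0, windows_done=0):
        self.path = path
        self.window_size = window_size
        self.line_offset = line_offset    # byte offset of the line being read
        self.windows_done = windows_done  # windows already yielded from that line

    def get_state(self):
        # A plain tuple, so it can be pickled even though generators cannot.
        return (self.line_offset, self.windows_done)

    def __iter__(self):
        # Binary mode plus readline() keeps tell()/seek() as plain byte offsets.
        with open(self.path, "rb") as f:
            f.seek(self.line_offset)
            while True:
                start = f.tell()
                line = f.readline()
                if not line:
                    break
                if start != self.line_offset:
                    # Moved on to a new line: reset the per-line counter.
                    self.line_offset, self.windows_done = start, 0
                words = line.decode("utf-8").split()
                windows = [words[i - self.window_size:i]
                           for i in range(self.window_size, len(words) + 1)]
                # Skip windows that were yielded before the state was saved.
                for window in windows[self.windows_done:]:
                    self.windows_done += 1
                    yield window
```

I would then save pickle.dumps(reader.get_state()) and later rebuild with TrainExampleReader(path, window_size, *pickle.loads(saved)). Since get_train_minibatch() buffers examples, I assume the state should only be saved right after a full minibatch has been received, when that buffer is empty.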