views:

90

answers:

3

I have a generator for a large set of items. I want to iterate through them once, outputting them to a file. However, with the file format I currently have, I first have to output the number of items I have. I don't want to build a list of the items in memory, as there are too many of them and that would take a lot of time and memory. Is there a way to iterate through the generator, getting its length, but somehow be able to iterate through it again later, getting the same items?

If not, what other solution could I come up with for this problem?

+4  A: 

If you can figure out how to just write a formula to calculate the size based on the parameters that control the generator, do that. Otherwise, I don't think you would save much time.
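For instance, if the generator is driven by known parameters, the count can often be computed with the same arithmetic the generator uses internally. A minimal sketch (the generator and its parameters here are hypothetical, just to illustrate the idea):

```python
def items(n, step=2):
    """Yield every step-th integer below n."""
    for i in range(0, n, step):
        yield i

def items_len(n, step=2):
    """Compute len(range(0, n, step)) without iterating."""
    return (n + step - 1) // step

count = items_len(10)  # computed up front, generator untouched
assert count == sum(1 for _ in items(10))
```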

Include the generator here, and we'll try to do it for you!

Nathan
Ah yep, I realized this soon after I posted =)

Claudiu
+4  A: 

This cannot be done. Once a generator is exhausted it needs to be reconstructed in order to be used again. It is possible to define the __len__() method on an iterator object if the number of items is known ahead of time, and then len() can be called against the iterator object.
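A sketch of that approach, assuming the length really is known up front (the wrapper class here is illustrative, not a standard library type):

```python
class SizedIterator:
    """Wrap an iterator whose length is known ahead of time,
    so that len() works on the iterator object itself."""

    def __init__(self, iterable, length):
        self._it = iter(iterable)
        self._length = length

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def __len__(self):
        return self._length

it = SizedIterator((x * x for x in range(5)), 5)
print(len(it))   # 5, without consuming anything
print(list(it))  # [0, 1, 4, 9, 16]
```

Note this only sidesteps the problem; the length still has to come from somewhere other than the generator itself.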

Ignacio Vazquez-Abrams
+4  A: 

I don't think that is possible for any generalized iterator. You will need to figure out how the generator was originally constructed and then regenerate it for the final pass.

Alternatively, you could write out a dummy size to your file, write the items, and then reopen the file for modification and correct the size in the header.

If your file is a binary format, this could work quite well, since the number of bytes for the size is the same regardless of what the actual size is. If it is a text format, it is possible that you would have to add some extra length to the file if you weren't able to pad the dummy size to cover all cases. See this question for a discussion on inserting and rewriting in a text file using Python.

A. Levy