Could anyone point me towards a method of cycling a binary file in Python? I have a file full of 4-byte integers, and once the file reaches a certain size (i.e. a certain number of values have been written), I want to start removing one value from the start each time I add one at the end.

I'm still reasonably new to Python, so I'm just trying to think of a neat way of doing this.

Thanks.

+3  A: 

My idea: the first integer in the file gives you the position of the actual beginning of the data. At the start this will be 4 (assuming an integer takes 4 bytes). When the file is full, you overwrite the oldest value at that position and then advance the position integer, wrapping it back to 4 once it passes the end of the file. This is basically a simple ring buffer in file form.
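A minimal sketch of that ring buffer, under some assumptions that are not in the original post: values are 4-byte little-endian signed integers, and the names `append_value`, `read_values` and `capacity` are made up for illustration. The file is opened with "r+b" so that a seek followed by a write overwrites bytes in place instead of truncating or appending.

```python
import os
import struct

INT = struct.Struct("<i")   # one 4-byte little-endian signed integer (assumed layout)
HEADER = INT.size           # the first 4 bytes hold the offset of the oldest value


def append_value(path, value, capacity):
    """Append value, overwriting the oldest entry once capacity values are stored."""
    if not os.path.exists(path):
        # New file: the data starts right after the header.
        with open(path, "wb") as f:
            f.write(INT.pack(HEADER))
    with open(path, "r+b") as f:      # read/write without truncating
        (start,) = INT.unpack(f.read(HEADER))
        f.seek(0, os.SEEK_END)
        count = (f.tell() - HEADER) // INT.size
        if count < capacity:
            # Still filling up: we are already positioned at the end of the file.
            f.write(INT.pack(value))
        else:
            # Full: overwrite the oldest slot, then advance the start offset.
            f.seek(start)
            f.write(INT.pack(value))
            start += INT.size
            if start >= HEADER + capacity * INT.size:
                start = HEADER        # wrap around to the first data slot
            f.seek(0)
            f.write(INT.pack(start))


def read_values(path):
    """Return the stored values, oldest first."""
    with open(path, "rb") as f:
        (start,) = INT.unpack(f.read(HEADER))
        data = f.read()
    split = start - HEADER
    rotated = data[split:] + data[:split]   # put the oldest value first
    return [v for (v,) in INT.iter_unpack(rotated)]
```

Each append touches at most 8 bytes (one value plus the header) however large the file is, which is the point of the scheme. Reading the values back in order still means reading the whole file.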

Space_C0wb0y
This would work fine, but it seems in Python you can't write to the front of a file without rewriting the whole lot - i.e. seek then write doesn't appear to work?
Adam Cobb
I really don't want to have to rewrite the whole list each time either, as it is very large!
Adam Cobb
@Adam Cobb: You'd have to make that an extra question. You may have to flush the file before seeking or something like that.
Space_C0wb0y
Got it working using the suggestion from this question: http://stackoverflow.com/questions/508983/how-to-overwrite-some-bytes-in-the-middle-of-a-file-with-python (using "r+b" as the file mode).
Adam Cobb
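For anyone hitting the same problem, here is a small illustration of the "r+b" behaviour mentioned in the comments above. The file name and contents are made up for the demo. By contrast, "wb" truncates the file on open, and in append mode writes typically go to the end of the file regardless of any seek, which may be why seek-then-write appeared not to work.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.dat")  # throwaway file for the demo

with open(path, "wb") as f:
    f.write(b"AAAABBBBCCCC")    # three 4-byte records

with open(path, "r+b") as f:    # read/write, existing contents are kept
    f.seek(4)                   # jump to the second record
    f.write(b"XXXX")            # overwrite exactly 4 bytes in place

with open(path, "rb") as f:
    result = f.read()           # b"AAAAXXXXCCCC": only the middle record changed
```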
+2  A: 

2000 numbers?

That's 16K at 8 bytes per number (8K for 4-byte integers). Do it in memory. Indeed, by declaring your buffers to be 16K, you can probably do the entire operation in a single I/O request. And on some large 64-bit systems, 2000 numbers is more or less the default buffer size.

Your data volume is microscopic. Don't waste time optimizing such a minuscule amount of data.

# read_the_numbers, write_the_numbers and a_new_number are placeholders to fill in.
with open("my file.dat", "rb", 16384) as the_file:
    my_circular_queue = list(read_the_numbers(the_file))

# Once the queue is full, drop the oldest value before appending the new one.
if len(my_circular_queue) >= 2000:
    del my_circular_queue[0]
my_circular_queue.append(a_new_number)

with open("my file.dat", "wb", 16384) as the_file:
    write_the_numbers(the_file, my_circular_queue)

It totally fits in memory. Don't waste time trying to finesse a complex update.
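The snippet above leaves `read_the_numbers` and `write_the_numbers` undefined. One possible way to fill them in, assuming the file holds nothing but 4-byte little-endian signed integers (the byte order is a guess, not something stated in the question), might look like this:

```python
import struct


def read_the_numbers(the_file):
    """Yield each 4-byte little-endian signed integer stored in the_file."""
    for (number,) in struct.iter_unpack("<i", the_file.read()):
        yield number


def write_the_numbers(the_file, numbers):
    """Write numbers to the_file as 4-byte little-endian signed integers."""
    numbers = list(numbers)
    the_file.write(struct.pack("<%di" % len(numbers), *numbers))
```

Both helpers handle the whole file in one read or one write, which fits the "do it in memory" approach. Note that `struct.iter_unpack` raises `struct.error` if the file length is not a multiple of 4.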

S.Lott
Thanks for your answer, but I've gone with Space_C0wb0y's solution. I'm sure it would be fine doing this in memory, but these writes can happen quite regularly and I don't want to load the whole list into memory each time just to write it back (even if it is only 2000 values).
Adam Cobb
@Adam Cobb: 2000 floating-point numbers occupy 16K of memory. On some systems, the default I/O buffers are larger than this. It's a negligible, microscopic amount of data. You are wasting time trying to finesse a sophisticated I/O scheme when the amount of data is so microscopically small.
S.Lott