views: 65
answers: 2

I'm reading from certain offsets in several hundred, and possibly thousands, of files. Because I only need certain data from certain offsets at any particular time, I must either keep the file handles open for later use, or write the parts I need into separate files.

I figured that keeping all these file handles open is the lesser of two evils compared with writing a significant number of new temporary files to disk. I was just worried about the efficiency of having so many file handles open.

Typically, I'll open a file, seek to an offset, read some data, then 5 seconds later do the same thing at another offset, and do all of this on thousands of files within a 2-minute timeframe.
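
In code, the pattern looks roughly like this (the paths, offsets, and read sizes are just placeholders, and the files are assumed to already exist):

    import time

    # Illustration only: the "data/file_*.bin" names and offsets are made up.
    paths = ["data/file_%04d.bin" % i for i in range(1000)]
    handles = {p: open(p, "rb") for p in paths}   # keep every handle open

    def read_at(path, offset, length):
        """Seek to an absolute offset in an already-open file and read."""
        f = handles[path]
        f.seek(offset)            # absolute seek from the start of the file
        return f.read(length)

    chunk = read_at(paths[0], 4096, 128)
    time.sleep(5)                 # a few seconds later, another offset
    chunk = read_at(paths[0], 65536, 128)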

Is that going to be a problem?

A followup: Really, I"m asking which is better to leave these thousands file handles open, or to constantly close them and re-open them just when I instantaneously need them.

+4  A: 

Some systems may limit the number of file descriptors that a single process can have open simultaneously. 1024 is a common default, so if you need "thousands" open at once, you might want to err on the side of portability and design your application to use a smaller pool of open file descriptors.
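
One way to see where you stand is to query the per-process limit before deciding. For example, in Python (the resource module is Unix-only, and the 4096 target below is only an illustration):

    import resource

    # Inspect the current per-process limit on open file descriptors.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("soft limit:", soft, "hard limit:", hard)   # soft is often 1024

    # A process may raise its soft limit up to the hard limit without
    # special privileges; 4096 here is just an example target.
    if soft < 4096 <= hard:
        resource.setrlimit(resource.RLIMIT_NOFILE, (4096, hard))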

Jim Lewis
So it's better to open and re-close thousands of files, rather than leave them open? I thought I might gain some speed if I didn't close and re-open them.
ruffiko
@ruffiko: It's hard to say, without knowing more about your specific application. It may be that the way you're doing it works best for your situation. But keeping thousands of files open at once strikes me as a "code smell" that would be worth taking a closer look at. I'd suggest writing a simple benchmark program to see just how much overhead is incurred by repeatedly opening/closing a few thousand files. Why guess, when you can measure?
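
A throwaway benchmark along those lines could be as simple as the following Python sketch (file count, sizes, and offsets are placeholders; substitute your real access pattern):

    import os, tempfile, time

    # Create some disposable test files in a temporary directory.
    COUNT, READS = 1000, 10
    tmpdir = tempfile.mkdtemp()
    paths = [os.path.join(tmpdir, "bench_%04d.bin" % i) for i in range(COUNT)]
    for p in paths:
        with open(p, "wb") as f:
            f.write(os.urandom(64 * 1024))

    def reopen_each_time():
        """Open, seek, read, close on every access."""
        for _ in range(READS):
            for p in paths:
                with open(p, "rb") as f:
                    f.seek(4096)
                    f.read(128)

    def keep_open():
        """Open every file once up front and reuse the handles."""
        handles = [open(p, "rb") for p in paths]
        for _ in range(READS):
            for f in handles:
                f.seek(4096)
                f.read(128)
        for f in handles:
            f.close()

    for fn in (reopen_each_time, keep_open):
        start = time.time()
        fn()
        print(fn.__name__, "%.3f s" % (time.time() - start))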
Jim Lewis
+3  A: 

I recommend that you take a look at Storage.py in BitTorrent. It includes an implementation of a pool of file handles.
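
For anyone who doesn't want to dig through the BitTorrent source, the general idea is an LRU-style pool that caps how many handles stay open at once. A minimal sketch of that idea (not the actual Storage.py code; the class name, cap, and paths are made up):

    from collections import OrderedDict

    class FileHandlePool:
        """Keep at most max_open files open, closing the least recently
        used handle when the cap is reached. Sketch only, read-only mode."""

        def __init__(self, max_open=256):
            self.max_open = max_open
            self._open = OrderedDict()   # path -> file object, LRU order

        def get(self, path):
            f = self._open.pop(path, None)
            if f is None:
                if len(self._open) >= self.max_open:
                    _, oldest = self._open.popitem(last=False)  # evict LRU
                    oldest.close()
                f = open(path, "rb")
            self._open[path] = f         # mark as most recently used
            return f

        def close_all(self):
            for f in self._open.values():
                f.close()
            self._open.clear()

    # Usage: read from an offset without tracking descriptor limits yourself.
    pool = FileHandlePool(max_open=256)
    f = pool.get("data/file_0001.bin")   # hypothetical path
    f.seek(4096)
    data = f.read(128)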

MattH