ansaurus

Question

Most efficient way to update a flat file list

Answer 1

+2 A:

I would think that the easiest way to do this would be to delete and recreate the file. Depending on how difficult it is to create the "summary", this could well be the fastest method since you don't need to compare or edit anything.

If the summary creation is "hard" and the database fits in memory, the easiest way to go would probably be to load the database into a dict (keyed on the filename, with data indicating whether or not the file has been "seen") and do the os.walk again, updating the dict as necessary. Then iterate the dict, writing all entries that have been seen.

(BTW the last modified field isn't necessarily useful, you have to check the file's modified time anyway so might as well compare it to the database's timestamp.)

dash-tom-bang 2010-09-15 20:43:07

Answer 2

+1 A:

Probably the best way to handle this is to re-walk the tree again in its entirety. That way, instead of calling File.exist() all the time, you are only calling Directory.list() once per directory. This saves you on file IO calls, which is most likely the bottleneck in this situation.

Once you have a list of the currently-existing files, you can compare the two lists, and determine for each file:

New file
Deleted file
Altered file

and proceed accordingly.

lacqui 2010-09-15 20:46:03

ansaurus

tags:

views:

answers:

Most efficient way to update a flat file list

related questions