ansaurus

Question

How can I quickly read the contents of 25k small text files with Python?

Answer 1

+2 A:

If you've got 25,000 text files on disk, 'you're doing it wrong'. Depending on how you store them, much of the slowness could literally be the disk seeking to find each file.

If you've got 25,000 of anything, it'll be faster to put it in a database with an intelligent index. Even if you make the filename the index field, it'll be faster.

If you have multiple directories that descend N levels deep, a database would still be faster.
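A minimal sketch of the database idea, assuming SQLite via Python's standard sqlite3 module. The table name `docs`, the column names, and the helper names `load_files_into_db` and `fetch` are made up for illustration, and the files are assumed to be UTF-8 text:

```python
import os
import sqlite3


def load_files_into_db(src_dir, db_path):
    """Copy every regular file in src_dir into a SQLite table keyed by filename."""
    conn = sqlite3.connect(db_path)
    # PRIMARY KEY on the filename gives an index for lookups by name.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs (name TEXT PRIMARY KEY, body TEXT)"
    )
    rows = []
    with os.scandir(src_dir) as entries:
        for entry in entries:
            if entry.is_file():
                # Assumption: the files are UTF-8 encoded text.
                with open(entry.path, encoding="utf-8") as fh:
                    rows.append((entry.name, fh.read()))
    with conn:  # commit all inserts as a single transaction
        conn.executemany("INSERT OR REPLACE INTO docs VALUES (?, ?)", rows)
    return conn


def fetch(conn, name):
    """Return the stored text for one filename, or None if it is absent."""
    row = conn.execute(
        "SELECT body FROM docs WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None
```

After a one-time load, `fetch(conn, "somefile.txt")` is an indexed lookup rather than a directory walk plus a file open. Whether this is actually faster for a given workload would need measuring.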

synthesizerpatel 2010-10-07 06:05:12

I store the files in a single directory.

mlzboy 2010-10-07 06:24:50

25k files in one directory will take a long time to list no matter how you slice it. To give you an example, I wrote a script that generated N files, each with between 0 and 65 kbytes of data. Simply running 'ls -l' took 0.021 seconds for 1,000 files, 0.199 seconds for 10,000 files, and a whopping 0.487 seconds (half a second!) for 25,000 files. That's the worst-case scenario, of course, but randomly picking files out of this list still means having to traverse the btree _and_ compete with other applications that are using the filesystem for reads and writes.
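The original test script isn't shown, so here is a rough stand-in for anyone who wants to repeat the measurement. It assumes fixed-size files instead of random 0 to 65 kbyte ones, and it approximates 'ls -l' by listing and stat-ing every entry with os.scandir. The helper names `make_files` and `time_listing` are invented for this sketch:

```python
import os
import time


def make_files(directory, count, size=64):
    """Create `count` small files of `size` bytes each (fixed size is an assumption)."""
    for i in range(count):
        path = os.path.join(directory, "file_%05d.txt" % i)
        with open(path, "wb") as fh:
            fh.write(b"x" * size)


def time_listing(directory):
    """Return (entry_count, seconds) for listing and stat-ing every entry."""
    start = time.perf_counter()
    count = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            entry.stat()  # roughly the per-file work 'ls -l' has to do
            count += 1
    return count, time.perf_counter() - start
```

Calling `make_files` on an empty temporary directory with 1,000, 10,000 and 25,000 files and then `time_listing` on it should show how listing time grows. The numbers will depend heavily on the filesystem and on whether the directory is already cached.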

synthesizerpatel 2010-10-07 11:37:43

Whoops. I understand your problem a bit better now. Whatever is producing these files should write directly to the database rather than going through an intermediary file _before_ the data reaches the database. If you're parsing HTML, consider writing your spider code in Python so it can do everything at once. Alternatively, use a tiered directory system so you can break the files up into more manageable chunks, e.g. root/a/aa/aardvark.html, root/c/ch/chiapet.html.
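The tiered layout could be computed with something like the following. This is only a sketch: `tiered_path` is a hypothetical helper, and it assumes filenames start with ordinary letters or digits.

```python
import os


def tiered_path(root, filename):
    """Map 'aardvark.html' to root/a/aa/aardvark.html.

    Assumption: the name starts with letters or digits; names shorter than
    two characters just reuse what they have, and an empty stem falls back
    to '_'.
    """
    stem = os.path.splitext(filename)[0].lower()
    first = stem[:1] or "_"   # first-level directory: first character
    second = stem[:2] or "_"  # second-level directory: first two characters
    return os.path.join(root, first, second, filename)
```

Each directory then holds only the files sharing a two-character prefix, so no single directory has to list all 25k entries.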

synthesizerpatel 2010-10-07 11:43:29

Answer 2

A:

You can scan the files while downloading them in multiple threads if you use Scrapy.
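This is not Scrapy, but if the files are already on disk, the same idea of overlapping I/O with threads can be tried using only the standard library. A minimal sketch, assuming UTF-8 text files; `read_text` and `read_all` are names invented here, and 8 workers is an arbitrary choice:

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_text(path):
    """Read one file fully (assumption: UTF-8 text)."""
    with open(path, encoding="utf-8") as fh:
        return fh.read()


def read_all(directory, max_workers=8):
    """Return {filename: contents} for every regular file in directory."""
    with os.scandir(directory) as entries:
        paths = [e.path for e in entries if e.is_file()]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `paths`.
        contents = pool.map(read_text, paths)
        return {os.path.basename(p): c for p, c in zip(paths, contents)}
```

Whether threads help at all depends on the storage device and on how much of the data is already in the OS cache, so it is worth timing against a plain loop.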

DiggyF 2010-10-07 06:31:49

I keep all the steps separate; it keeps the solution clear.

mlzboy 2010-10-07 08:44:37

Answer 3

A:

If the algorithm is correct, using the psyco module can sometimes help quite a lot. However, it does not work with Python 2.7 or Python 3+.

Tony Veijalainen 2010-10-07 07:48:03
