Hello SO!

I am trying to improve the performance of elfinder, an AJAX-based file manager (elRTE.ru).

It calls os.listdir recursively to walk through all directories, and this causes a performance hit (for example, listing a directory with 3000+ files takes 7 seconds).

Here is its walking function:

        for d in os.listdir(path):
            pd = os.path.join(path, d)
            if os.path.isdir(pd) and not os.path.islink(pd) and self.__isAccepted(d):
                tree['dirs'].append(self.__tree(pd))

My questions are:

  1. If I switch from os.listdir to os.walk, would it improve performance?
  2. How about using dircache.listdir()? That is, cache the WHOLE directory/subdirectory contents at the initial request and return the cached results if no new files have been uploaded and no files have changed?
  3. Is there any other method of directory walking that is faster?
  4. Is there any other fast server-side file browser written in Python (although I would prefer to make this one fast)?
+1  A: 

In order:

  • I doubt you'll see much of a speed-up between os.walk and os.listdir, since both rely on the underlying filesystem. In fact, I suspect the underlying filesystem is going to have a big effect on the speed of the operation.

  • Any cache operation is going to be significantly faster than hitting the filesystem (at least for the second and subsequent checks).

  • You could always write some utility (or call a shell command) which generates the list of directories outside of Python, and call that through the subprocess module. But that's a little complicated, and I'd turn to that solution only if the cache turned out not to work for you.

  • If you haven't located a file browser on the Cheeseshop, you probably won't find one.
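
To make the caching point concrete, here is a minimal sketch (Python 3, standard library only). The names `_subdir_cache` and `cached_subdirs` are made up for illustration and are not part of elfinder. It assumes that the directory's modification time changes whenever an entry is added, removed, or renamed, which holds on most filesystems but is worth checking on yours:

```python
import os

# Hypothetical cache: path -> (directory mtime in ns, list of subdirectory names).
_subdir_cache = {}

def cached_subdirs(path):
    """Return the names of real (non-symlink) subdirectories of path.

    The stored listing is reused for as long as the directory's
    modification time is unchanged.
    """
    mtime = os.stat(path).st_mtime_ns
    hit = _subdir_cache.get(path)
    if hit is not None and hit[0] == mtime:
        return hit[1]  # cache hit: no listdir, no per-entry isdir calls
    subdirs = sorted(
        name for name in os.listdir(path)
        if os.path.isdir(os.path.join(path, name))
        and not os.path.islink(os.path.join(path, name))
    )
    _subdir_cache[path] = (mtime, subdirs)
    return subdirs
```

A cache hit still costs one `os.stat` on the directory, but it skips the `isdir`/`islink` checks on every entry, which is likely where most of the time goes in a directory with thousands of files.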

Chris B.
I will have to compare the performance of listdir against shell commands. I doubt there will be a difference.
V3ss0n
A: 

os.path.walk may increase your performance, for two reasons:

1) If you can stop walking before you've walked everything, then indeed it will be faster than listdir, although the gain is only noticeable when dealing with large trees.

2) If you're listing HUGE directories, then it can be expensive to make the list returned by listdir. (Not true, see Alex's comment below.)

However, it probably won't make a difference and may in fact be slower, due to the potentially extra overhead incurred by calling your visit function and doing all the extra argument packing and unpacking.

(Really the only way to answer this question is to test it yourself - it should only take a few minutes)
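
A rough way to run that test, as a sketch rather than a rigorous benchmark (Python 3; the helper names and the size of the sample tree are arbitrary choices of mine). Point it at your real directory instead of the throwaway tree to get numbers that mean something for your setup:

```python
import os
import tempfile
import timeit

def count_dirs_listdir(path):
    """Count subdirectories recursively with os.listdir, as in the question."""
    total = 0
    for name in os.listdir(path):
        full = os.path.join(path, name)
        if os.path.isdir(full) and not os.path.islink(full):
            total += 1 + count_dirs_listdir(full)
    return total

def count_dirs_walk(path):
    """Count the same directories with os.walk."""
    total = 0
    for dirpath, dirnames, _filenames in os.walk(path):
        # os.walk lists symlinks to directories in dirnames; drop them to match.
        dirnames[:] = [d for d in dirnames
                       if not os.path.islink(os.path.join(dirpath, d))]
        total += len(dirnames)
    return total

def make_sample_tree(root, n_dirs=20, n_files=50):
    """Create a small throwaway tree: n_dirs directories with n_files each."""
    for i in range(n_dirs):
        sub = os.path.join(root, "dir%03d" % i)
        os.mkdir(sub)
        for j in range(n_files):
            open(os.path.join(sub, "file%03d" % j), "w").close()

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_sample_tree(root)
        for fn in (count_dirs_listdir, count_dirs_walk):
            seconds = timeit.timeit(lambda: fn(root), number=20)
            print("%-20s %.4f s for 20 runs" % (fn.__name__, seconds))
```

Repeated runs mostly measure the OS cache rather than the disk, so the first cold run may look very different from the averages printed here.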

Nick Bastin
Both the relatively-new os.walk and the old-and-crusty os.path.walk necessarily read each directory entirely because they must present the names in it as one or two lists (os.path.walk is specified in the docs as using os.listdir, but how do you think os.walk does it?-). So (2) doesn't really apply.
Alex Martelli
Well, phooey. I still stand by my admonition that one should test these things.. :-)
Nick Bastin
So that means no performance difference. But at least with os.walk, I won't need to do: os.path.isdir(pd) and not os.path.islink(pd), as it will give out files and dirs separately, right? Alright, I am going to test it and let you know!
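
For reference, a sketch of what that could look like (Python 3). One caveat: os.walk does not descend into symlinked directories by default, but it still lists them in dirnames, so the islink check is still needed if they should be left out. `is_accepted` is a hypothetical stand-in for `self.__isAccepted`, whose real rules are not shown here:

```python
import os

def is_accepted(name):
    """Hypothetical stand-in for the question's self.__isAccepted(d)."""
    return not name.startswith(".")

def walk_dirs(root):
    """Yield every accepted, non-symlink directory below root."""
    for dirpath, dirnames, _filenames in os.walk(root):
        # Editing dirnames in place (default top-down mode) prunes the walk,
        # so rejected directories are never descended into.
        dirnames[:] = [
            d for d in dirnames
            if is_accepted(d) and not os.path.islink(os.path.join(dirpath, d))
        ]
        for d in dirnames:
            yield os.path.join(dirpath, d)
```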
V3ss0n
+3  A: 

You should measure directly on the machines (OSs, filesystems and caches thereof, etc) of your specific interest -- whether or not os.walk is faster than os.listdir on a specific and totally different machine / OS / FS will tell you very little about performance on yours.

Not sure what you mean by cachedir.listdir -- no standard library module / function by that name. listdir already reads the whole directory in at one gulp (as it must return the complete list of names), as does os.walk (as it must separate subdirectories from files). If, depending on your platform, you have a fast way of being notified about file/directory changes, then it's probably worth building the tree up once and editing it incrementally as change notifications come in... but it depends on the relative frequency of changes vs requests, which is, again, totally dependent on your specific application circumstances.
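
A minimal sketch of the "build once, edit incrementally" idea (Python 3). The notification source itself is platform-specific and not shown; `on_dir_created` and `on_dir_deleted` are hypothetical handlers that you would call from whatever mechanism your platform provides (or from your own upload/delete code paths):

```python
import os

def build_tree(root):
    """Scan root once and return nested dicts: {dirname: {subdirname: {...}}}."""
    tree = {}
    for name in os.listdir(root):
        full = os.path.join(root, name)
        if os.path.isdir(full) and not os.path.islink(full):
            tree[name] = build_tree(full)
    return tree

def _parent_node(tree, parts):
    """Return the dict that holds the last component of parts."""
    node = tree
    for part in parts[:-1]:
        node = node[part]
    return node

def on_dir_created(tree, relpath):
    """Record a new directory; relpath is relative to the scanned root."""
    parts = relpath.split(os.sep)
    _parent_node(tree, parts).setdefault(parts[-1], {})

def on_dir_deleted(tree, relpath):
    """Forget a directory (and everything below it) without rescanning."""
    parts = relpath.split(os.sep)
    _parent_node(tree, parts).pop(parts[-1], None)
```

Requests are then served from the in-memory tree, and the filesystem is only touched for the initial scan. Whether this pays off depends, as said above, on how often the tree changes compared to how often it is requested.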

Alex Martelli
Sorry, fixed. I meant dircache.listdir.
V3ss0n
@V3ss0n, `dircache` never worked particularly well; it has been deprecated since Python 2.6 and was removed in Python 3.0 -- I would definitely not suggest it.
Alex Martelli
OK, so I'm going to scrap it then :) Thanks!
V3ss0n