I am fairly new to Python and I am trying to figure out the most efficient way to count the number of .TIF files in a particular sub-directory.

Doing some searching, I found one example (which I have not tested) that claims to count all of the files in a directory:

file_count = sum((len(f) for _, _, f in os.walk(myPath)))

This is fine, but I need to count only TIF files. My directory will contain other file types, but I only want to count TIFs.

Currently I am using the following code:

tifCounter = 0
for root, dirs, files in os.walk(myPath):
    for file in files:    
        if file.endswith('.tif'):
            tifCounter += 1

It works fine, but the looping seems to be excessive/expensive to me. Any way to do this more efficiently?

Thanks.

+3  A: 

Your code is fine.

Yes, you're going to need to loop over those files to filter out the .tif files, but looping over a small in-memory list is negligible compared to the work of scanning the directory to find these files in the first place, which you have to do anyway.

I wouldn't worry about optimizing this code.

Triptych
+7  A: 

Something has to iterate over all files in the directory, and look at every single file name - whether that's your code or a library routine. So no matter what the specific solution, they will all have roughly the same cost.

If you think it's too much code, and if you don't actually need to search subdirectories recursively, you can use the glob module:

tifCounter = len(glob.glob1(myPath,"*.tif"))
Martin v. Löwis
Thanks. This worked equally well, and in 1/5 the number of lines! Even if it costs the same, it looks prettier! :)
Bryan Lewis
`glob1`? Why use an undocumented function? Why not use `glob.glob`, which gives exactly the same result?
SilentGhost
@SilentGhost: glob.glob only expects a single parameter, which is a path name. In the specific case, the directory is already available, so there is no need to join it first, just so glob can split it again. In addition, if myPath has a glob character in it, glob.glob would interpret it.
Martin v. Löwis
Actually, this solution also counts directories whose names end with '.tif'; you need additional filtering.
tonfa
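For readers following the comments above, here is a minimal sketch of a non-recursive count that uses only the documented glob.glob, escapes any glob characters in the directory name with glob.escape (Python 3.4+), and skips directories whose names happen to end in .tif. The function name count_tifs is made up for illustration; it is not from any of the answers.

```python
import glob
import os

def count_tifs(directory):
    """Count regular files matching *.tif directly inside `directory`."""
    # glob.escape keeps characters like '[' or '*' in the directory name
    # from being interpreted as part of the pattern.
    pattern = os.path.join(glob.escape(directory), "*.tif")
    # os.path.isfile filters out directories that are named like TIF files.
    return sum(1 for path in glob.glob(pattern) if os.path.isfile(path))
```

Note that on POSIX systems the "*.tif" pattern is matched case-sensitively, so files named .TIF would not be counted by this sketch.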
+1  A: 

If you do need to search recursively, or for some other reason don't want to use the glob module, you could use

file_count = sum(len([f for f in fs if f.lower().endswith('.tif')]) for _, _, fs in os.walk(myPath))

This is the "Pythonic" way to adapt the example you found for your purposes. But it's not going to be significantly faster or more efficient than the loop you've been using; it's just a really compact syntax for more or less the same thing.

David Zaslavsky
Since when does the term "pythonic" describe the routine of transforming perfectly readable 3 lines of code into one single line of nested for loops that takes at least 5 times as long to comprehend and violates PEP8 in the process?
piquadrat
Since people have been doing that sort of thing in Python (and that's been quite a while). But do note that I put "Pythonic" in quotes ("quote-Pythonic-unquote") because what actually gets done in Python and what is specified in PEP 8 are two different things.
David Zaslavsky
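As a middle ground between the explicit loop and the one-liner, the recursive count can be wrapped in a small function. This is only a sketch: the name count_tifs_recursive is hypothetical, and it assumes that files named .TIF, .Tif and so on should all be counted.

```python
import os

def count_tifs_recursive(root):
    """Count files ending in .tif (any case) under `root`, including subdirectories."""
    count = 0
    # os.walk reports directories separately from files, so a directory
    # named like 'scans.tif' is not counted here.
    for _dirpath, _dirnames, filenames in os.walk(root):
        count += sum(1 for name in filenames if name.lower().endswith(".tif"))
    return count
```

It does the same amount of work as the original loop; the only gain is that the case-insensitive check and the recursion live in one reusable place.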
+2  A: 

For this particular use case, if you don't want to recursively search in the subdirectory, you can use os.listdir:

len([f for f in os.listdir(myPath) 
     if f.endswith('.tif') and os.path.isfile(os.path.join(myPath, f))])
tonfa
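On Python 3.6 or later, a similar non-recursive count could be sketched with os.scandir, which yields directory entries that can often answer is_file() without an extra system call per file. The function name count_tifs_scandir is made up for this example, and the case-insensitive match is an assumption about what should be counted.

```python
import os

def count_tifs_scandir(directory):
    """Count regular files ending in .tif (any case) directly inside `directory`."""
    with os.scandir(directory) as entries:
        # Check the cheap name test first, then confirm it is a file
        # so that directories named like '*.tif' are excluded.
        return sum(
            1 for entry in entries
            if entry.name.lower().endswith(".tif") and entry.is_file()
        )
```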