How can I use Python to print the basenames of all files in a folder and its subfolders?

My attempt

#!/usr/bin/python    

import os    
import sys    

def dir_basename (dir_name):    
        for dirpath, dirnames, filenames in os.walk(dir_name):
                for fname in filenames:
                        print os.path.basename(fname)            # Problem here!

if len(sys.argv) != 1:
        u = "Usage: dir_basename <dir_name>\n"
        sys.stderr.write(u)                       
        sys.exit(1)            

dir_basename ( sys.argv[1] )


First problem: solved. It was the off-by-one error in the argument check.


Second problem: the code runs, but it prints the full file names instead of what I want:

man.aux
about_8php.tex
refman.pdf
successful_notice.php
...

I expect to get this output instead:

 aux
 tex
 pdf
 php
 ...
+3  A: 
if len(sys.argv) != 1:

I think you mean 2. argv[0] is the name of the script; argv[1] is the first argument, etc.

Jonathan Feinberg
+1  A: 

The length of sys.argv is 2 because you have an item at index 0 (the program name) and an item at index 1 (the first argument to the program).

Changing your program to compare against 2 appears to give the correct results, without making any other changes.
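
As a minimal sketch of that corrected check (the helper name `parse_dir_arg` is hypothetical and not part of the original script), the comparison can be isolated in a small function so it is easy to try with different argument lists. It uses single-argument `print(...)` calls so that it should run under both Python 2 and Python 3:

```python
def parse_dir_arg(argv):
    """Return the directory argument, or None if the argument count is wrong."""
    # argv[0] is the script name, so exactly one user-supplied argument
    # means len(argv) == 2, not 1.
    if len(argv) != 2:
        return None
    return argv[1]

# Simulated argument lists instead of the real sys.argv:
print(parse_dir_arg(["dir_basename.py"]))           # no argument given
print(parse_dir_arg(["dir_basename.py", "/tmp"]))   # exactly one argument
```

In the real script you would pass `sys.argv` and print the usage message when the function returns `None`.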

Mark Rushakoff
+1  A: 

argv typically includes the name of the program/script invoked as the first element, and thus the length when passing it a single argument is actually 2, not 1.

Amber
+8  A: 

Let me explain the debugging methodology a little bit.

As you've encountered the situation in which len(sys.argv) != 1, you should ask yourself: "What is the actual value of len(sys.argv)? Why is it so?". The answers are:

>>> len(sys.argv)
2
>>> sys.argv
['/tmp/basename.py', '/path/to/home/Desktop/pgCodes/']

Now the problem should be clearer.

Edit: To address your second question, the things you are interested in are called file extensions or suffixes, not basenames. Here is a complete solution:

import sys, os

def iflatten(xss):
    'Iterable(Iterable(a)) -> Iterable(a)'
    return (x for xs in xss for x in xs)

def allfiles(dir):
    'str -> Iterable(str)'
    return iflatten(files for path, dirs, files in os.walk(dir))

def ext(path):
    'str -> str'
    (root, ext) = os.path.splitext(path)
    return ext[1:]

def main():
    assert len(sys.argv) == 2, 'usage: progname DIR'
    dir = sys.argv[1]

    exts = (ext(f) for f in allfiles(dir))
    for e in exts:
        print e

if __name__ == '__main__':
    main()
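
A small aside on the `ext` helper above: `os.path.splitext` only splits off the last suffix, and it treats a leading dot as part of the name rather than as an extension. The sketch below repeats the helper and runs it on a few sample names. Apart from the first two, which come from the question, the names are made up for illustration:

```python
import os.path

def ext(path):
    """Return the extension of path without the leading dot ('' if none)."""
    root, suffix = os.path.splitext(path)
    return suffix[1:]

# "archive.tar.gz", "README" and ".bashrc" are hypothetical sample names.
samples = ["man.aux", "about_8php.tex", "archive.tar.gz", "README", ".bashrc"]
results = [ext(name) for name in samples]
print(results)  # ['aux', 'tex', 'gz', '', '']
```

So files without a suffix, and dotfiles such as `.bashrc`, yield an empty string. You may want to skip those when printing.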
Andrey Vlasovskikh
+1 for teaching how to think about the problem more constructively: "should ask yourself..."
Adam Bernier
+2  A: 

As others have noted, the first element of sys.argv is the name of the program::

# argv.py
import sys

for index, arg in enumerate(sys.argv):
    print '%(index)s: %(arg)s' % locals()

If I run this without parameters::

$ python argv.py 
0: argv.py

I see that the first and only item in argv is the name of the program/script. If I pass parameters::

$ python argv.py a b c
0: argv.py
1: a
2: b
3: c

And so on.

The other thing is that you really don't need to use os.path.basename on the items in the third element of the tuple yielded by os.walk::

import os
import sys

# Imagine some usage check here...

# Slice sys.argv to skip the first element...
for path in sys.argv[1:]:
    for root, dirs, files in os.walk(path):
        for name in files:
            # No need to use basename, since these are already base'd, so to speak...
            print name
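
To illustrate that point, here is a rough sketch that builds a throwaway directory tree and checks that the names yielded by `os.walk` carry no directory part. The helper `walk_names` and the file names are assumptions made up for the demo. If you need a path you can actually open, join the name with `root`:

```python
import os
import tempfile

def walk_names(top):
    """Collect (bare name, full path) pairs for every file under top."""
    pairs = []
    for root, dirs, files in os.walk(top):
        for name in files:
            # name has no directory part; join it with root for a usable path.
            pairs.append((name, os.path.join(root, name)))
    return pairs

# Build a temporary tree: top/a.txt and top/sub/b.php (hypothetical names).
top = tempfile.mkdtemp()
os.mkdir(os.path.join(top, "sub"))
for rel in ("a.txt", os.path.join("sub", "b.php")):
    open(os.path.join(top, rel), "w").close()

pairs = walk_names(top)
for name, full in pairs:
    # basename(name) == name, so calling basename on it changes nothing.
    print(name)
```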
Mark McEahern
I forgot to post the code for argv.py::

import sys

for index, arg in enumerate(sys.argv):
    print '%(index)s: %(arg)s' % locals()
Mark McEahern