I have a C++/Obj-C background and I am just discovering Python (been writing it for about an hour). I am writing a script to recursively read the contents of text files in a folder structure.

The problem is that the code I have written only works one folder deep. I can see why in the code (see #hardcoded path), but I don't know how to move forward since I am brand new to Python.

Here is what my directory structure looks like:

[image: directory structure]

Python Code:

import os
import sys

rootdir = sys.argv[1]

for root, subFolders, files in os.walk(rootdir):

    for folder in subFolders:
        outfileName = rootdir + "/" + folder + "/py-outfile.txt" # hardcoded path
        folderOut = open( outfileName, 'w' )
        print "outfileName is " + outfileName

        for file in files:
            filePath = rootdir + '/' + file
            f = open( filePath, 'r' )
            toWrite = f.read()
            print "Writing '" + toWrite + "' to" + filePath
            folderOut.write( toWrite )
            f.close()

        folderOut.close()
+1  A: 

I think the problem is that you're not processing the output of os.walk correctly.

Firstly, change:

filePath = rootdir + '/' + file

to:

filePath = root + '/' + file

rootdir is your fixed starting directory; root is a directory returned by os.walk.

Secondly, you don't need to nest your file-processing loop inside the subfolder loop, as it makes no sense to run it once per subdirectory. os.walk sets root to each subdirectory in turn, so you don't need to process the subdirectories by hand unless you want to do something with the directories themselves.
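As a rough sketch of both changes together (written for Python 3, not the asker's exact script): the function name collect_per_directory and the out_name parameter are made up for illustration, and skipping a leftover output file from an earlier run is an extra assumption, not something the question asks for.

```python
import os

def collect_per_directory(rootdir, out_name="py-outfile.txt"):
    """For every directory under rootdir, write the contents of its own
    files into out_name inside that same directory."""
    written = []
    # os.walk visits rootdir and every subdirectory; root is the one
    # currently being visited, so no separate loop over subfolders is needed.
    for root, sub_folders, files in os.walk(rootdir):
        chunks = []
        for name in sorted(files):
            if name == out_name:
                continue  # assumption: ignore output left by an earlier run
            with open(os.path.join(root, name), "r") as f:
                chunks.append(f.read())
        out_path = os.path.join(root, out_name)
        with open(out_path, "w") as out:
            out.write("".join(chunks))
        written.append(out_path)
    return written
```

It could be called as collect_per_directory(sys.argv[1]) from a script; it returns the list of output files it wrote.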

Dave Webb
I have data in each sub directory, so I need to have a separate text file for the contents of each directory.
Brock Woolf
@Brock: the files part is the list of files in the current directory. So the indentation is indeed wrong. You are writing to `filePath = rootdir + '/' + file`, that doesn't sound right: file is from the list of current files, so you are writing to a lot of existing files?
Alok
A: 

Use os.path.join() to construct your paths. It's neater.

import os
import sys
rootdir = sys.argv[1]
for root, subFolders, files in os.walk(rootdir):
    for folder in subFolders:
        outfileName = os.path.join(root,folder,"py-outfile.txt")
        folderOut = open( outfileName, 'w' )
        print "outfileName is " + outfileName
        for file in files:
            filePath = os.path.join(root,file)
            toWrite = open( filePath).read()
            print "Writing '" + toWrite + "' to" + filePath
            folderOut.write( toWrite )
        folderOut.close()
ghostdog74
It looks like this code works for folders 2 levels (or deeper) only. Still it does get me closer.
Brock Woolf
+5  A: 

Make sure you understand the three return values of os.walk:

for root, subFolders, files in os.walk(rootdir):

has the following meaning:

  • root: Path of the directory currently being "walked through"
  • subFolders: Names of the entries directly inside root that are directories
  • files: Names of the entries directly inside root (not inside subFolders) that are not directories
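To see the three values for yourself, something like the following may help. It is only a sketch in Python 3, and describe_walk is a made-up helper name; it records what os.walk yields at each step, with root shown relative to the starting directory.

```python
import os

def describe_walk(rootdir):
    """Return one (root, subFolders, files) row per directory visited,
    with root made relative to rootdir for readability."""
    rows = []
    for root, sub_folders, files in os.walk(rootdir):
        rows.append((os.path.relpath(root, rootdir),
                     sorted(sub_folders),
                     sorted(files)))
    return sorted(rows)
```

Printing each row of describe_walk(rootdir) shows that subFolders and files hold bare names, not full paths, which is why they have to be joined with root before opening anything.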

And please use os.path.join instead of concatenating with a slash! Your problem is filePath = rootdir + '/' + file: you must join the currently "walked" folder, not the topmost folder, so it must be filePath = os.path.join(root, file). BTW, "file" is a builtin, so you normally shouldn't use it as a variable name.
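A small illustration of why the topmost folder is the wrong thing to join (a Python 3 sketch; candidate_paths is a hypothetical helper, not part of the original script): it builds each file's path both ways so the two can be compared.

```python
import os

def candidate_paths(rootdir):
    """Build every file's path twice: joined with the directory being
    walked (root) and joined with the fixed starting directory (rootdir)."""
    from_root, from_rootdir = [], []
    for root, sub_folders, files in os.walk(rootdir):
        for filename in files:
            from_root.append(os.path.join(root, filename))
            from_rootdir.append(os.path.join(rootdir, filename))
    return from_root, from_rootdir
```

Every path in the first list exists; in the second list, any file that lives below the top level points at a location that normally does not exist, which matches the "only one folder deep" symptom.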

Another problem is your loops, which should look like this, for example:

import os
import sys

rootdir = sys.argv[1]

for root, subFolders, files in os.walk(rootdir):
    outfileName = os.path.join(root, "py-outfile.txt")
    print "outfileName is " + outfileName
    with open( outfileName, 'w' ) as folderOut:
        for folder in subFolders:
            print "%s has subdirectory %s" % (root, folder)

        for filename in files:
            filePath = os.path.join(root, filename)

            with open( filePath, 'r' ) as f:
                toWrite = f.read()
                folderOut.write("The file %s contains %s" % (filePath, toWrite))
                folderOut.write( toWrite )

If you didn't know, the "with" statement for files is a shorthand:

with open("filename", "r") as f:
    dosomething()

# is effectively the same as

f = open("filename", "r")
try:
    dosomething()
finally:
    f.close()
AndiDog
Superb, lots of prints to understand what's going on and it works perfectly. Thanks! +1
Brock Woolf
+1  A: 

I agree with Dave Webb: os.walk will yield an item for each directory in the tree. The fact is, you just don't have to care about subFolders.

Code like this should work:

import os
import sys

rootdir = sys.argv[1]

for folder, subs, files in os.walk(rootdir):
    with open(os.path.join(folder,'python-outfile.txt'), 'w') as dest:
        for filename in files:
            with open(os.path.join(folder, filename), 'r') as src:
                dest.write(src.read())
Clément
Nice one. This works as well. I do, however, prefer AndiDog's version even though it's longer, because it's clearer to understand as a beginner to Python. +1
Brock Woolf