Hey all, this is my first time working with the file and os parts of Python. I am trying to search a directory and find all of its subdirectories. If a directory contains no folders, its files should be added to a list, and everything should be organized in a nested dict.

So for instance a tree could look like this

  • Starting Path
    • Dir 1
      • Subdir 1
      • Subdir 2
      • Subdir 3
        • subsubdir
          • file.jpg
          • folder1
            • file1.jpg
            • file2.jpg
          • folder2
            • file3.jpg
            • file4.jpg

Even though subsubdir has a file in it, that file should be skipped because subsubdir also contains folders.

Normally I can do this with os.listdir and os.path.isdir if I know how many directory levels I am going to be looking at. To make it dynamic, though, it has to handle any number of folders and subfolders. I have tried os.walk, and it finds all the files easily. The only trouble I am having is creating the dicts for the path names that contain files. I need the folder names organized by dict, from the starting path down.

So in the end, using the example above, the dict should look like this with the files in it:

dict['dir1']['subdir3']['subsubdir']['folder1'] = ['file1.jpg', 'file2.jpg']

dict['dir1']['subdir3']['subsubdir']['folder2'] = ['file3.jpg', 'file4.jpg']

Would appreciate any help on this or better ideas on organizing the information. Thanks.

+1  A: 

There is a basic problem with the way you want to structure the data. If dir1/subdir1 contains subdirectories and files, should dict['dir1']['subdir1'] be a list or a dictionary? To access further subdirectories with ...['subdir2'] it needs to be a dictionary, but on the other hand dict['dir1']['subdir1'] should return a list of files.

Either you have to build the tree from custom objects that combine these two aspects in some way, or you have to change the tree structure to treat files differently.
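As a rough sketch of the second option (not necessarily what the asker needs), each directory could become a small node that keeps its files and its subdirectories under separate keys, so neither has to be dropped. The function name build_tree and the 'files'/'dirs' keys are made up for this illustration:

```python
import os

def build_tree(starting_path):
    """Return nested nodes shaped like {'files': [...], 'dirs': {name: node}}."""
    starting_path = os.path.normpath(starting_path)
    root = {'files': [], 'dirs': {}}
    nodes = {starting_path: root}  # maps each directory path to its node
    # os.walk is top-down by default, so a parent is always seen before its children.
    for dirpath, dirnames, filenames in os.walk(starting_path):
        node = nodes[dirpath]
        node['files'] = sorted(filenames)
        for name in dirnames:
            child = {'files': [], 'dirs': {}}
            node['dirs'][name] = child
            nodes[os.path.join(dirpath, name)] = child
    return root
```

With this layout, the files of a folder would be read as tree['dirs']['dir1']['dirs']['subdir3']['files'], and deeper folders stay reachable through the 'dirs' key.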

sth
Well that's why I want it so if it finds a folder it will just skip adding files.
Chuck
+1  A: 

I don't know why you would want to do this. You should be able to do your processing directly with os.walk, but if you really need such a structure, you can do (untested):

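import os

def dirfunc(directory_to_search):
    fdict = {}
    for dirname, dirnames, fnames in os.walk(directory_to_search):
        if dirnames:
            continue  # skip directories that contain subdirectories
        # key path relative to the starting directory
        keys = os.path.relpath(dirname, directory_to_search).split(os.sep)
        tmpdict = fdict
        for k in keys[:-1]:
            tmpdict = tmpdict.setdefault(k, {})
        tmpdict[keys[-1]] = fnames
    return fdict

mydict = dirfunc(directory_to_search)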

Also, you should not name your variable dict because it's a Python built-in. It is a very bad idea to rebind the name dict to something other than Python's dict type.
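To illustrate why, here is a minimal sketch; the function name shadowing_demo is invented for the example. Once the name is rebound to a dictionary instance, calling dict(...) in that scope no longer reaches the built-in type:

```python
def shadowing_demo():
    dict = {'dir1': {}}  # rebinding the name hides the built-in type in this scope
    try:
        dict(a=1)  # this now calls the dictionary instance, not the type
    except TypeError as exc:
        return str(exc)

message = shadowing_demo()
print(message)
```

A name such as dictvar or tree avoids the problem entirely.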

Edit: edited to fix the "double last key" error and to use os.walk.

Alok
Aye. The dict variable was just me being lazy. Anyway, this does work, except it creates a duplicate key if there is a file. In other words (using the above example): dictvar['dir1']['subdir3']['subsubdir']['folder2']['folder2'] = ['file3.jpg', 'file4.jpg']
Chuck
That's the danger in pasting untested code. You can fix it by doing: keys = dirname.split(os.sep)[:-1].
Alok
use os.walk(), not os.path.walk
Thanks for the comment. Should have used os.walk().
Alok
Also that's not the proper use of os.walk(), it is different from os.path.walk.
Chuck
That should *really* teach me to not post untested code!
Alok
+1  A: 

Maybe you want something like:

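import os

def explore(starting_path):
  alld = {'': {}}
  # drop any trailing separator so every relative path below starts with os.sep
  starting_path = starting_path.rstrip(os.sep)

  for dirpath, dirnames, filenames in os.walk(starting_path):
    d = alld
    dirpath = dirpath[len(starting_path):]
    # walk down to this directory's entry, remembering its parent dict
    for subd in dirpath.split(os.sep):
      based = d
      d = d[subd]
    if dirnames:
      for dn in dirnames:
        d[dn] = {}
    else:
      # leaf directory: replace its placeholder dict with the file list
      based[subd] = filenames
  return alld['']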

For example, given a /tmp/a such that:

$ ls -FR /tmp/a
b/  c/ d/

/tmp/a/b:
z/

/tmp/a/b/z:

/tmp/a/c:
za  zu

/tmp/a/d:

print(explore('/tmp/a')) emits (key order may vary): {'c': ['za', 'zu'], 'b': {'z': []}, 'd': []}.

If this isn't exactly what you're after, maybe you can show us specifically what the differences are supposed to be? I suspect they can probably be easily fixed, if need be.

Alex Martelli
Unfortunately this produces KeyErrors so I am unable to test this. However from the returned dict in your example, sounds about right.
Chuck