Suppose I have a list of filenames: [exia.gundam, dynames.gundam, kyrios.gundam, virtue.gundam], or [exia.frame, exia.head, exia.swords, exia.legs, exia.arms, exia.pilot, exia.gn_drive, lockon_stratos.data, tieria_erde.data, ribbons_almark.data, otherstuff.dada].

In one iteration, I'd like to have all the *.gundam or *.data files, whereas on the other I'd like to group the exia.* files. What's the easiest way of doing this, besides iterating through the list and putting each element in a dictionary?

Here's what I had in mind:

def matching_names(files):
    '''
    extracts files with repeated names from a list

    Keyword arguments:
    files - list of filenames

    Returns: Dictionary
    '''

    nameDict = {}
    for file in files:
        filename = file.partition('.')
        if filename[0] not in nameDict:
            nameDict[filename[0]] = []
        nameDict[filename[0]].append(filename[2])

    matchingDict = {}
    for key in nameDict.keys():
        if len(nameDict[key]) > 1:
            matchingDict[key] = nameDict[key] 
    return matchingDict

Well, assuming I have to use that, is there a simple way to invert it and have the file extension as key instead of the name?

+1  A: 

In my first version, it looks like I misinterpreted your question. So if I've got this correct, you're trying to process a list of files so that you can easily access all the filenames with a given extension, or all the filenames with a given base ("base" being the part before the period)?

If that's the case, I would recommend this way:

from itertools import groupby

def group_by_name(filenames):
    '''Puts the filenames in the given iterable into a dictionary where
    the key is the first component of the filename and the value is
    a list of the filenames with that component.'''
    keyfunc = lambda f: f.split('.', 1)[0]
    return dict( (k, list(g)) for k,g in groupby(
               sorted(filenames, key=keyfunc), key=keyfunc
           ) )

For instance, given the list

>>> test_data = [
...   'exia.frame', 'exia.head', 'exia.swords', 'exia.legs',
...   'exia.arms', 'exia.pilot', 'exia.gn_drive', 'lockon_stratos.data',
...   'tieria_erde.data', 'ribbons_almark.data', 'otherstuff.dada'
... ]

that function would produce

>>> group_by_name(test_data)
{'exia': ['exia.arms', 'exia.frame', 'exia.gn_drive', 'exia.head',
          'exia.legs', 'exia.pilot', 'exia.swords'],
 'lockon_stratos': ['lockon_stratos.data'],
 'otherstuff': ['otherstuff.dada'],
 'ribbons_almark': ['ribbons_almark.data'],
 'tieria_erde': ['tieria_erde.data']}

If you wanted to index the filenames by extension instead, a slight modification will do that for you:

def group_by_extension(filenames):
    '''Puts the filenames in the given iterable into a dictionary where
    the key is the last component of the filename and the value is
    a list of the filenames with that extension.'''
    keyfunc = lambda f: f.split('.', 1)[1]
    return dict( (k, list(g)) for k,g in groupby(
               sorted(filenames, key=keyfunc), key=keyfunc
           ) )

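The only difference is in the keyfunc = ... line, where I changed the index into the split result from 0 to 1, so the key is the part after the first period instead of the part before it. Example: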

>>> group_by_extension(test_data)
{'arms': ['exia.arms'],
 'dada': ['otherstuff.dada'],
 'data': ['lockon_stratos.data', 'ribbons_almark.data', 'tieria_erde.data'],
 'frame': ['exia.frame'],
 'gn_drive': ['exia.gn_drive'],
 'head': ['exia.head'],
 'legs': ['exia.legs'],
 'pilot': ['exia.pilot'],
 'swords': ['exia.swords']}
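
One caveat with the version above: f.split('.', 1)[1] returns everything after the first period, so a name such as exia.backup.frame would be keyed as 'backup.frame'. If you would rather key on the text after the last period, a variant along these lines might do. This is only a sketch; the function name group_by_last_extension and the sample list are made up for illustration.

```python
from itertools import groupby

def group_by_last_extension(filenames):
    """Hypothetical variant: group by the text after the LAST period."""
    # rsplit splits from the right, so 'exia.backup.frame' -> 'frame'.
    # A name with no period at all ends up keyed by the whole name.
    keyfunc = lambda f: f.rsplit('.', 1)[-1]
    return {k: list(g)
            for k, g in groupby(sorted(filenames, key=keyfunc), key=keyfunc)}

sample = ['exia.backup.frame', 'exia.frame', 'virtue.gundam']
print(group_by_last_extension(sample))
# {'frame': ['exia.backup.frame', 'exia.frame'], 'gundam': ['virtue.gundam']}
```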

If you want both groupings at the same time, though, I think it's better to avoid the groupby-based generator expression, because it can only group the files one way per pass; it can't build two different dictionaries at once. A plain loop handles both:

from collections import defaultdict
def group_by_both(filenames):
    '''Puts the filenames in the given iterable into two dictionaries,
    where in the first, the key is the first component of the filename,
    and in the second, the key is the last component of the filename.
    The values in each dictionary are lists of the filenames with that
    base or extension.'''
    by_name = defaultdict(list)
    by_ext = defaultdict(list)
    for f in filenames:
        name, ext = f.split('.', 1)
        by_name[name].append(f)
        by_ext[ext].append(f)
    return by_name, by_ext
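
For what it's worth, here is roughly how the two dictionaries could be used afterwards. The function is restated so the snippet runs on its own, and the shortened test_data list is just an illustrative subset of the names from the question.

```python
from collections import defaultdict

def group_by_both(filenames):
    # Restated from the answer so this snippet is self-contained.
    by_name = defaultdict(list)
    by_ext = defaultdict(list)
    for f in filenames:
        name, ext = f.split('.', 1)
        by_name[name].append(f)
        by_ext[ext].append(f)
    return by_name, by_ext

test_data = ['exia.frame', 'exia.head', 'lockon_stratos.data',
             'tieria_erde.data', 'otherstuff.dada']
by_name, by_ext = group_by_both(test_data)

print(by_name['exia'])  # ['exia.frame', 'exia.head']
print(by_ext['data'])   # ['lockon_stratos.data', 'tieria_erde.data']

# Indexing a defaultdict with a missing key inserts an empty list,
# so .get() is the safer way to probe for keys that may not exist.
print(by_ext.get('gundam', []))  # []
```

Changing the format from .gundam to .flag needs no code change here, since the keys come from whatever the filenames contain.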
David Zaslavsky
I'm fine with iterating through the list, but I was wondering if there was a more generic (and simple) solution. So that if I were to change the format from .gundam to .flag I could use the same code. I could iterate the list and manually add them to a map to see what matches according to the first or second part of the filename, but that would result in a lot more code.
Setsuna F. Seiei
OK, I think the last code sample in my edited answer may be closer to what you're looking for. If all your conditions specify either the start or the end of the filename, you could use the `startswith` and `endswith` string methods instead of regular expressions, which might save a bit of computation time, though the code would be longer. I can edit that in too, if you want.
David Zaslavsky
@Setsuna: Well, I think you can use os.listdir(path) to iterate over the directory and collect all the extensions available; then, with that list, you can group them like David said.
Oscar Carballal
@David Zaslavsky @Oscar Carballal I edited the OP with what I know/understand of Python to show what I intended to have, but shorter.
Setsuna F. Seiei
@Setsuna: Thanks, that helps. I'll edit my answer accordingly.
David Zaslavsky
A: 

I'm not sure if I entirely get what you're looking to do, but if I understand it correctly something like this might work:

from collections import defaultdict
files_by_extension = defaultdict(list)

for f in files:
    files_by_extension[ f.split('.')[1] ].append(f)

This is creating a hash keyed by file extension and filling it by iterating through the list in a single pass.
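
Note that f.split('.')[1] raises an IndexError for a name with no period in it. If that can happen in your list, something like the following might be safer. It is only a sketch: the wrapper function files_by_extension and the sample list (including the README entry) are assumptions added for illustration.

```python
import os.path
from collections import defaultdict

def files_by_extension(files):
    """Hypothetical helper: group names by extension in a single pass."""
    grouped = defaultdict(list)
    for f in files:
        # splitext returns (root, ext) where ext keeps its leading period,
        # or is '' when the name has no extension.
        ext = os.path.splitext(f)[1].lstrip('.')
        grouped[ext].append(f)
    return grouped

files = ['lockon_stratos.data', 'tieria_erde.data', 'otherstuff.dada', 'README']
result = files_by_extension(files)
print(result['data'])  # ['lockon_stratos.data', 'tieria_erde.data']
print(result[''])      # ['README']
```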

Parand
A: 

Suppose for example that you want as the result a list of lists of filenames, grouped by either extension or rootname:

import os.path
import itertools as it

def files_grouped_by(filenames, use_extension=True):
    def ky(fn): return os.path.splitext(fn)[use_extension]
    return [list(g) for _, g in it.groupby(sorted(filenames, key=ky), ky)]

Now files_grouped_by(filenames, False) will return the list of lists grouping by rootname, while if the second argument is True or absent the grouping will be by extension.
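
As a quick illustration, with the function restated so the snippet runs standalone and a short sample list taken from the names in the question:

```python
import os.path
import itertools as it

def files_grouped_by(filenames, use_extension=True):
    # os.path.splitext returns (root, ext); the bool indexes 0 or 1.
    def ky(fn): return os.path.splitext(fn)[use_extension]
    return [list(g) for _, g in it.groupby(sorted(filenames, key=ky), ky)]

filenames = ['exia.frame', 'exia.head',
             'lockon_stratos.data', 'tieria_erde.data']

print(files_grouped_by(filenames))
# [['lockon_stratos.data', 'tieria_erde.data'], ['exia.frame'], ['exia.head']]

print(files_grouped_by(filenames, False))
# [['exia.frame', 'exia.head'], ['lockon_stratos.data'], ['tieria_erde.data']]
```

Keep in mind that os.path.splitext leaves the leading period on the extension ('.data', not 'data'), which matters once the keys themselves are exposed.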

If you want instead a dict, the keys being either rootnames or extensions, and the values the corresponding lists of filenames, the approach is quite similar:

import os.path
import itertools as it

def dict_files_grouped_by(filenames, use_extension=True):
    def ky(fn): return os.path.splitext(fn)[use_extension]
    return dict((k, list(g)) 
                for k, g in it.groupby(sorted(filenames, key=ky), ky))
Alex Martelli