ansaurus

Question

Copying files to directories as specified in a file list with python

Answer 1

+1 A:

I think the cause is that you always reuse the same list.

del tmp[:] clears the list in place; it doesn't create a new instance, so everything you already appended to values still refers to that one list. In your case, you need to create a new list with tmp = []

The following fix should work (I didn't test it):
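
The difference between clearing in place and rebinding can be seen in a small sketch. The function names and the sample groups are hypothetical and exist only for illustration; they are not part of the original code.

```python
def collect_with_del(groups):
    """Reuse one list and clear it in place with del tmp[:]."""
    values = []
    tmp = []
    for group in groups:
        del tmp[:]          # empties the SAME list object
        tmp.extend(group)
        values.append(tmp)  # every entry is a reference to that one list
    return values


def collect_with_rebind(groups):
    """Bind tmp to a fresh list for every group."""
    values = []
    for group in groups:
        tmp = []            # new list object each time
        tmp.extend(group)
        values.append(tmp)
    return values
```

With the in-place version, every element of the result is the same object, so all of them show whatever was stored last. With rebinding, each group keeps its own list.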

import re

def get_values(file):
    values = []
    tmp = []
    pattern = re.compile(r'^-> (.+?)$')
    for line in file:
        if line.strip().startswith('->'):
            match = re.search(pattern, line.strip())
            if match:
                tmp.append(match.group(1))
        elif line.strip().startswith('Directory'):
            values.append(tmp)
            # Bind tmp to a new list instead of clearing the old one in place
            tmp = []
    return values

luc 2009-08-15 14:01:13

It works. Thanks.

2009-08-16 09:00:38

Answer 2

+1 A:

There is no need to use a regular expression:

d = {}
for line in open("file"):
    line=line.strip()
    if line.endswith("\\"):
        directory = line.split(":")[-1].strip().replace("\\","")
        d.setdefault(directory,[])
    if line.startswith("->"):
        song=line.split(" ")[-1]
        d[directory].append(song)
print d

Output:

# python python.py
{'Images': ['01-some_image1.jpg', '02-some_image2.jpg'], 'Music': ['01-some_song1.mp3', '02-some_song2.mp3', '03-some_song3.mp3']}

ghostdog74 2009-08-15 14:41:01

I like your solution. It's simpler. I hadn't thought of doing it this way. The only problem is that the file names in my file contain spaces, so I can't split on spaces. I'll split on ">" instead and then use strip() to remove the remaining space. Thanks.

2009-08-16 08:55:47
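
The change described in the comment above could look roughly like this. It is a sketch only: the helper name file_name_from_line is hypothetical, and the maxsplit argument of 1 is an added assumption so that a ">" inside a file name is left alone.

```python
def file_name_from_line(line):
    """Return the text after the first '>' with surrounding whitespace removed."""
    # Split once on '>' so spaces (and any later '>') stay in the file name.
    return line.split(">", 1)[1].strip()
```

In Answer 2's loop this would replace line.split(" ")[-1].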

Answer 3

A:

If you use collections.defaultdict(list), you get a dictionary whose values are lists. If a key is not found, it is added with an empty list as its value, so you can start appending to the list immediately. That's what this line does:

d[dir].append(match.group(1))

It creates the directory name as a key if it does not exist and appends the file name found to the list.
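
A minimal sketch of that behaviour on its own, separate from the file parsing. The function name group_by_directory and the (directory, name) pairs are made up for illustration.

```python
import collections


def group_by_directory(pairs):
    """Group (directory, file name) pairs into a dict of lists."""
    d = collections.defaultdict(list)
    for directory, name in pairs:
        # A missing key is created with an empty list before append runs.
        d[directory].append(name)
    return d
```

Note that merely reading a missing key, as in d['Videos'], also creates it with an empty list; use d.get('Videos') or an "in" check if that side effect is not wanted.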

BTW, if you are having problems getting your regexes to work, try compiling them with the debug flag. Its symbolic name is re.DEBUG, and its numeric value is 128. So if you do this:

file_regex = re.compile(r'^-> (.+?)$', re.DEBUG)

You get this additional output:

at at_beginning
literal 45
literal 62
literal 32
subpattern 1
  min_repeat 1 65535
    any None
at at_end

You can see a start-of-line match, then the literal '-> ' (character codes 45, 62 and 32), then a group containing a lazily repeated match-any pattern, and finally an end-of-line match. Very useful for debugging.
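
To check the reading of that dump, the literal codes can be compared with ord(), as in the sketch below. The sample line is taken from the output shown elsewhere in this thread; the exact layout of the debug dump varies between Python versions, so none is reproduced here.

```python
import re

# re.DEBUG makes re.compile print the parsed pattern to standard output.
file_regex = re.compile(r'^-> (.+?)$', re.DEBUG)

# The three "literal" entries in the dump are the character codes of '-> '.
codes = [ord(ch) for ch in '-> ']

# The compiled pattern behaves the same as one compiled without the flag.
match = file_regex.search('-> 01-some_song1.mp3')
file_name = match.group(1)
```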

Code:

from __future__ import with_statement

import re
import collections

def get_values(file):
    d = collections.defaultdict(list)
    dir = ""
    dir_regex = re.compile(r'^Directory: (.+?)\\$')
    file_regex = re.compile(r'\-\> (.+?)$')
    with open(file) as f:
        for line in f:
            line = line.strip()
            match = dir_regex.search(line)
            if match:
                dir = match.group(1)
            else:
                match = file_regex.search(line)
                if match:
                    d[dir].append(match.group(1))
    return d

if __name__ == '__main__':
    d = get_values('test_file')
    for k, v in d.items():
        print k, v

Result:

Images ['01-some_image1.jpg', '02-some_image2.jpg']
Music ['01-some_song1.mp3', '02-some_song2.mp3', '03-some_song3.mp3']
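
None of the answers shows the copy step from the question title. Given a mapping like the one above, it might be sketched as follows. This is an assumption-laden illustration: the function name copy_files and the parameters source_dir and dest_root are hypothetical, and it assumes all listed files sit in a single source directory.

```python
import os
import shutil


def copy_files(mapping, source_dir, dest_root):
    """Copy each listed file from source_dir into dest_root/<directory>."""
    copied = []
    for directory, names in mapping.items():
        target_dir = os.path.join(dest_root, directory)
        if not os.path.isdir(target_dir):
            os.makedirs(target_dir)
        for name in names:
            source = os.path.join(source_dir, name)
            # Skip names that do not exist instead of raising an error.
            if os.path.isfile(source):
                shutil.copy(source, target_dir)
                copied.append(os.path.join(target_dir, name))
    return copied
```

It returns the paths that were actually copied, so missing files can be found by comparing against the input lists.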

hughdbrown 2009-08-15 15:02:56

Thanks for the detailed answer. While I find ghostdog's solution simpler, your answer was equally informative. Thank you.

2009-08-16 08:55:16
