How would I write a function in Python to determine if a list of filenames matches a given pattern and which files are missing from that pattern? For example:

Input --> KUMAR.3.txt KUMAR.4.txt KUMAR.6.txt KUMAR.7.txt KUMAR.9.txt KUMAR.10.txt KUMAR.11.txt KUMAR.13.txt KUMAR.15.txt KUMAR.16.txt

Desired Output --> KUMAR.5.txt KUMAR.8.txt KUMAR.12.txt KUMAR.14.txt


Input --> KUMAR3.txt KUMAR4.txt KUMAR6.txt KUMAR7.txt KUMAR9.txt KUMAR10.txt KUMAR11.txt KUMAR13.txt KUMAR15.txt KUMAR16.txt

Desired Output --> KUMAR5.txt KUMAR8.txt KUMAR12.txt KUMAR14.txt

+1  A: 

Assuming the patterns are relatively static, this is easy enough with a regex:

import re

inlist = "KUMAR.3.txt KUMAR.4.txt KUMAR.6.txt KUMAR.7.txt KUMAR.9.txt KUMAR.10.txt KUMAR.11.txt KUMAR.13.txt KUMAR.15.txt KUMAR.16.txt".split()

def get_count(s):
    # Pull out the number that sits between two dots, e.g. 'KUMAR.3.txt' -> 3.
    return int(re.match(r'.*\.(\d+)\..*', s).group(1))

# Take min/max from the parsed values so the input does not have to be sorted.
values = set(map(get_count, inlist))
for ii in range(min(values), max(values)):
    if ii not in values:
        print('KUMAR.%d.txt' % ii)
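
The regex above expects a dot before the number, so it would not match the second input form (KUMAR3.txt). Here is a minimal sketch that handles both forms, assuming every name is a shared prefix, then an integer, then a shared extension. The names `NAME_RE` and `missing_files` are hypothetical, made up for this illustration:

```python
import re

# Hypothetical helper: assumes each name looks like <prefix><integer><.ext>
# and that all names in the list share the same prefix and extension.
NAME_RE = re.compile(r'^(.*?)(\d+)(\.[^.]*)$')

def missing_files(names):
    parsed = [NAME_RE.match(n).groups() for n in names]
    # Reuse the prefix and extension of the first name as the output template.
    prefix, _, suffix = parsed[0]
    seen = {int(num) for _, num, _ in parsed}
    return ['%s%d%s' % (prefix, n, suffix)
            for n in range(min(seen), max(seen) + 1)
            if n not in seen]

dotted = "KUMAR.3.txt KUMAR.4.txt KUMAR.6.txt KUMAR.7.txt KUMAR.9.txt".split()
plain = "KUMAR3.txt KUMAR4.txt KUMAR6.txt KUMAR7.txt KUMAR9.txt".split()
print(' '.join(missing_files(dotted)))
print(' '.join(missing_files(plain)))
```

Because the template is taken from the first filename, the output keeps whichever form the input used. A name that does not fit the assumed shape makes `NAME_RE.match` return `None`, so real code would want to check for that.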
John Millikin
+1  A: 

You can approach this as:

  1. Convert the filenames to appropriate integers.
  2. Find the missing numbers.
  3. Combine the missing numbers with the filename template as output.

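For (1), if the filename structure is predictable (a fixed-length prefix, then the number, then '.txt'), this is easy: slice out the characters between the prefix and the extension.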

def to_num(s, start=6):
    return int(s[start:s.index('.txt')])
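
The `start` parameter is what lets the same function cover the second input form, which has no dot before the number: its prefix is five characters long rather than six. A quick check, with `to_num` repeated so the snippet runs on its own:

```python
def to_num(s, start=6):
    # Slice from the end of the prefix up to the '.txt' extension.
    return int(s[start:s.index('.txt')])

print(to_num('KUMAR.12.txt'))          # dotted form, default start=6
print(to_num('KUMAR12.txt', start=5))  # undotted form, prefix is 5 characters
```

Both calls yield the integer 12. Passing the wrong `start` for a given form raises `ValueError`, because the slice is then either empty or includes the dot.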

Given:

lst = ['KUMAR.3.txt', 'KUMAR.4.txt', 'KUMAR.6.txt', 'KUMAR.7.txt',
       'KUMAR.9.txt', 'KUMAR.10.txt', 'KUMAR.11.txt', 'KUMAR.13.txt',
       'KUMAR.15.txt', 'KUMAR.16.txt']

you can get the known numbers with map(to_num, lst) (on Python 3, map returns an iterator, so wrap it in list() if you need an actual list). To look for gaps, you only need the minimum and maximum: the range function gives every number you should see between them, and removing the numbers you already have leaves the missing ones. Sets make that subtraction easy.

def find_gaps(int_list):
    # Build the set once so any iterable (including a Python 3 map object) works.
    known = set(int_list)
    return sorted(set(range(min(known), max(known))) - known)

Putting it all together:

missing = find_gaps([to_num(s) for s in lst])
for i in missing:
    print('KUMAR.%d.txt' % i)
John Fouhy