views:

249

answers:

2

I have a bunch of files (TV episodes, although that is fairly arbitrary) that I want to check match a specific naming/organisation scheme..

Currently: I have three arrays of regex, one for valid filenames, one for files missing an episode name, and one for valid paths.

Then, I loop though each valid-filename regex, if it matches, append it to a "valid" dict, if not, do the same with the missing-ep-name regexs, if it matches this I append it to an "invalid" dict with an error code (2:'missing epsiode name'), if it matches neither, it gets added to invalid with the 'malformed name' error code.

The current code can be found here

I want to add a rule that checks for the presence of a folder.jpg file in each directory, but to add this would make the code substantially more messy in it's current state..

How could I write this system in a more expandable way?

The rules it needs to check would be..

  • File is in the format Show Name - [01x23] - Episode Name.avi or Show Name - [01xSpecial02] - Special Name.avi or Show Name - [01xExtra01] - Extra Name.avi
  • If filename is in the format Show Name - [01x23].avi display it a 'missing episode name' section of the output
  • The path should be in the format Show Name/season 2/the_file.avi (where season 2 should be the correct season number in the filename)
  • each Show Name/season 1/ folder should contain "folder.jpg"

.any ideas? While I'm trying to check TV episodes, this concept/code should be able to apply to many things..

The only thought I had was a list of dicts in the format:

checker = [
{
    'name':'valid files',
    'type':'file',
    'function':check_valid(), # runs check_valid() on all files
    'status':0 # if it returns True, this is the status the file gets
}
A: 

maybe you should take the approach of defaulting to: "the filename is correct" and work from there to disprove that statement:

with the fact that you only allow filenames with: 'show name', 'season number x episode number' and 'episode name', you know for certain that these items should be separated by a "-" (dash) so you have to have 2 of those for a filename to be correct.
if that checks out, you can use your code to check that the show name matches the show name as seen in the parent's parent folder (case insensitive i assume), the season number matches the parents folder numeric value (with or without an extra 0 prepended).

if however you don't see the correct amount of dashes you instantly know that there is something wrong and stop before the rest of the tests etc.

and separately you can check if the file "folder.jpg" exists and take the necessary actions. or do that first and filter that file from the rest of the files in that folder.

Sven
+2  A: 

I want to add a rule that checks for the presence of a folder.jpg file in each directory, but to add this would make the code substantially more messy in it's current state..

This doesn't look bad. In fact your current code does it very nicely, and Sven mentioned a good way to do it as well:

  1. Get a list of all the files
  2. Check for "required" files

You would just have have add to your dictionary a list of required files:

checker = {
  ...
  'required': ['file', 'list', 'for_required']
}

As far as there being a better/extensible way to do this? I am not exactly sure. I could only really think of a way to possibly drop the "multiple" regular expressions and build off of Sven's idea for using a delimiter. So my strategy would be defining a dictionary as follows (and I'm sorry I don't know Python syntax and I'm a tad to lazy to look it up but it should make sense. The /regex/ is shorthand for a regex):

check_dict = {
  'delim'    : /\-/,
  'parts'    : [ 'Show Name', 'Episode Name', 'Episode Number' ],
  'patterns' : [/valid name/, /valid episode name/, /valid number/ ],
  'required' : ['list', 'of', 'files'],
  'ignored'  : ['.*', 'hidden.txt'],
  'start_dir': '/path/to/dir/to/test/'
}
  1. Split the filename based on the delimiter.
  2. Check each of the parts.

Because its an ordered list you can determine what parts are missing and if a section doesn't match any pattern it is malformed. Here the parts and patterns have a 1 to 1 ratio. Two arrays instead of a dictionary enforces the order.

Ignored and required files can be listed. The . and .. files should probably be ignored automatically. The user should be allowed to input "globs" which can be shell expanded. I'm thinking here of svn:ignore properties, but globbing is natural for listing files.

Here start_dir would be default to the current directory but if you wanted a single file to run automated testing of a bunch of directories this would be useful.

The real loose end here is the path template and along the same lines what path is required for "valid files". I really couldn't come up with a solid idea without writing one large regular expression and taking groups from it... to build a template. It felt a lot like writing a TextMate language grammar. But that starts to stray on the ease of use. The real problem was that the path template was not composed of parts, which makes sense but adds complexity.

Is this strategy in tune with what you were thinking of?

Joseph Pecoraro