Inspired by "How to make a valid Windows filename from an arbitrary string?", I've written a function that takes an arbitrary string and makes it a valid filename.

My function should technically be an answer to that question, but I want to make sure I've not done anything stupid, or overlooked anything, before posting it as an answer.

I wrote this as part of tvnamer, a utility which takes TV episode filenames and renames them nicely and consistently, with an episode name pulled from http://www.thetvdb.com. While the source filename must be a valid filename, both the series name and the episode name are corrected, so either could theoretically contain any characters. I'm not so much concerned about security as usability: it's mainly to prevent files being renamed to .some.series - [01x01].avi and "disappearing" (rather than to thwart evil people).

It makes a few assumptions:

  • The filesystem supports Unicode filenames. HFS+ and NTFS both do, which will cover a majority of users. There is also a normalize_unicode argument to strip out Unicode characters (in tvnamer, this is set via the config XML file)
  • The platform is either Darwin or Linux; everything else is treated as Windows
  • The filename is intended to be visible (not a dotfile like .bashrc) - it would be simple enough to modify the code to allow .abc format filenames, if desired

Things I've (hopefully) handled:

  • Prepend an underscore if the filename starts with . (prevents the filenames . and .., and stops files from disappearing)
  • Remove directory separators: / on Linux, and / and : on OS X
  • Remove invalid Windows filename characters \/:*?"<>| (when on Windows, or forced with windows_safe=True)
  • Prefix reserved filenames with an underscore (COM2 becomes _COM2, NUL becomes _NUL etc)
  • Optional normalisation of Unicode data, so å becomes a and non-convertible characters are removed
  • Truncation of filenames over 255 characters on Linux/Darwin, and 32 characters on Windows
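
For anyone who doesn't want to open the gist, the steps above can be sketched roughly as follows. This is a minimal illustration rather than the actual tvnamer code: the function name make_valid_filename and the system parameter are hypothetical (the latter exists only so each platform branch can be exercised from one machine), the full set of reserved names is an assumption beyond the two listed above, and matching on the part before the first dot (so NUL.txt is also prefixed) is likewise an assumption.

```python
import platform
import unicodedata

# Characters Windows rejects in filenames (taken from the list above).
WINDOWS_INVALID = '\\/:*?"<>|'

# Reserved device names. COM2 and NUL come from the post; the rest of the
# set is an assumption based on the usual Windows reserved names.
WINDOWS_RESERVED = {"CON", "PRN", "AUX", "NUL"} | {
    f"{prefix}{n}" for prefix in ("COM", "LPT") for n in range(1, 10)
}


def make_valid_filename(value, normalize_unicode=False, windows_safe=False,
                        system=None):
    """Return a filename-safe version of value (illustrative sketch only)."""
    # "system" is a hypothetical override; by default the platform is detected.
    system = system or platform.system()
    unix_like = system in ("Darwin", "Linux")
    if not unix_like:
        # Anything that is not Darwin or Linux is treated as Windows.
        windows_safe = True

    if normalize_unicode:
        # Decompose accented characters (å -> a + combining ring), then
        # drop everything that cannot be encoded as ASCII.
        value = unicodedata.normalize("NFKD", value)
        value = value.encode("ascii", "ignore").decode("ascii")

    # Pick the characters to remove for this platform.
    if windows_safe:
        blacklist = WINDOWS_INVALID
    elif system == "Darwin":
        blacklist = "/:"
    else:
        blacklist = "/"
    value = "".join(ch for ch in value if ch not in blacklist)

    # Reserved names get an underscore prefix (COM2 -> _COM2). Comparing the
    # part before the first dot is an assumption, see the note above.
    if windows_safe and value.split(".")[0].upper() in WINDOWS_RESERVED:
        value = "_" + value

    # A leading dot would hide the file; this also covers "." and "..".
    if value.startswith("."):
        value = "_" + value

    # Truncate to the limits given in the list above.
    max_len = 255 if unix_like else 32
    return value[:max_len]


print(make_valid_filename(".some.series - [01x01].avi", system="Linux"))
# _.some.series - [01x01].avi
print(make_valid_filename("COM2", system="Windows"))
# _COM2
print(make_valid_filename("å", normalize_unicode=True, system="Linux"))
# a
```

Note that this sketch truncates the whole string, extension included, and returns an empty string if every character is stripped; both cases would need separate handling in real use.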

The code and a bunch of test-cases can be found and fiddled with at http://gist.github.com/256270. The "production" code can be found in tvnamer/utils.py

Are there any errors in this function? Any conditions I've missed?

+1  A: 

One point I've noticed: under NTFS, some files cannot be created in specific directories, e.g. $Boot in the root directory.

Dominik Weber