Inspired by "How to make a valid Windows filename from an arbitrary string?", I've written a function that takes an arbitrary string and makes it a valid filename.
My function should technically be an answer to this question, but I want to make sure I've not done anything stupid, or overlooked anything, before posting it as an answer.
I wrote this as part of tvnamer - a utility which takes TV episode filenames and renames them nicely and consistently, with an episode name pulled from http://www.thetvdb.com. While the source filename must be a valid file, the series name is corrected, as is the episode name, so both could theoretically contain any characters. I'm not so much concerned about security as usability - it's mainly to prevent files being renamed `.some.series - [01x01].avi` and the file "disappearing" (rather than to thwart evil people).
It makes a few assumptions:
- The filesystem supports Unicode filenames. HFS+ and NTFS both do, which will cover a majority of users. There is also a `normalize_unicode` argument to strip out Unicode characters (in tvnamer, this is set via the config XML file)
- The platform is either Darwin or Linux; everything else is treated as Windows
- The filename is intended to be visible (not a dotfile like `.bashrc`) - it would be simple enough to modify the code to allow `.abc` format filenames, if desired
Things I've (hopefully) handled:
- Prepend underscore if filename starts with `.` (prevents the filenames `.` and `..`, and files from disappearing)
- Remove directory separators: `/` on Linux, and `/` and `:` on OS X
- Remove invalid Windows filename characters `\/:*?"<>|` (when on Windows, or forced with `windows_safe=True`)
- Prepend reserved filenames with underscore (`COM2` becomes `_COM2`, `NUL` becomes `_NUL` etc)
- Optional normalisation of Unicode data, so `å` becomes `a` and non-convertible characters are removed
- Truncation of filenames over 255 characters on Linux/Darwin, and 32 characters on Windows
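
To make the list above concrete, here is a minimal sketch of those steps in Python 3. It is not the code from the gist or from tvnamer: the function name `make_valid_filename_sketch`, the `system` parameter (there only so each platform branch can be exercised without changing machines) and the exact order of the steps are assumptions made for illustration. The reserved-name set is also limited to the common device names, and truncation is a plain slice that does not try to preserve the file extension.

```python
import platform
import unicodedata

# Common reserved device names on Windows (matched case-insensitively).
WINDOWS_RESERVED = {"CON", "PRN", "AUX", "NUL"} | {
    f"{dev}{n}" for dev in ("COM", "LPT") for n in range(1, 10)
}


def make_valid_filename_sketch(value, normalize_unicode=False,
                               windows_safe=False, system=None):
    """Illustrative only: mirrors the steps listed above, not tvnamer's code."""
    # 'system' is a hypothetical override; by default the platform is detected.
    system = system or platform.system()

    # Anything that is not Darwin or Linux is treated as Windows.
    if system not in ("Darwin", "Linux"):
        windows_safe = True

    if normalize_unicode:
        # Decompose accented characters (å -> a + combining ring), then
        # drop everything that cannot be represented in ASCII.
        value = unicodedata.normalize("NFKD", value)
        value = value.encode("ascii", "ignore").decode("ascii")

    # Pick the characters to strip for the target platform.
    if windows_safe:
        blacklist = '\\/:*?"<>|'
    elif system == "Darwin":
        blacklist = "/:"
    else:
        blacklist = "/"
    value = "".join(ch for ch in value if ch not in blacklist)

    # Avoid hidden files and the special names "." and "..".
    if value.startswith("."):
        value = "_" + value

    # Prefix reserved device names, with or without an extension.
    if windows_safe and value.split(".", 1)[0].upper() in WINDOWS_RESERVED:
        value = "_" + value

    # Simplification: a plain slice, which may cut off the extension.
    max_len = 32 if windows_safe else 255
    return value[:max_len]
```

Under these assumptions, `make_valid_filename_sketch("COM2", system="Windows")` returns `_COM2`, and `make_valid_filename_sketch(".some.series - [01x01].avi", system="Linux")` returns `_.some.series - [01x01].avi`.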
The code and a bunch of test-cases can be found and fiddled with at http://gist.github.com/256270. The "production" code can be found in tvnamer/utils.py
Are there any errors with this function? Any conditions I've missed?