What is the most cross-platform way of removing bad path characters (e.g. "\" or ":" on Windows) in Python?

Solution

Because there seems to be no ideal solution, I decided to be relatively restrictive and used the following code:

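def remove(value, deletechars):
    # Strip every character listed in deletechars from value.
    for c in deletechars:
        value = value.replace(c, '')
    return value

# filename is the candidate name to sanitize; the backslash must be escaped.
print(remove(filename, '\\/:*?"<>|'))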
A: 

That character is in os.sep: it'll be "\" on Windows, "/" on POSIX systems (and ":" on classic Mac OS), depending on which system you're on.

eduffy
That doesn't include :"%/<>^|?, which are also illegal file characters in Windows.
ephemient
+2  A: 

If you are using Python, try os.path to avoid cross-platform issues with paths.

Macarse
Which part of `os.path` helps with determining legal filenames? `.supports_unicode_filenames` maybe a little, but that's not enough.
ephemient
+2  A: 

Unfortunately, the set of acceptable characters varies by OS and by filesystem.

  • Windows:

    • Use almost any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255), except for the following:
      • The following reserved characters are not allowed:
        < > : " / \ | ? *
      • Characters whose integer representations are in the range from zero through 31 are not allowed.
      • Any other character that the target file system does not allow.

    The list of accepted characters can vary depending on the OS and locale of the machine that first formatted the filesystem.

    .NET has GetInvalidFileNameChars and GetInvalidPathChars, but I don't know how to call those from Python.

  • Mac OS: NUL is always excluded, "/" is excluded by the POSIX layer, ":" is excluded by Apple APIs
    • HFS+: any sequence of non-excluded characters that is representable by UTF-16 in the Unicode 2.0 spec
    • HFS: any sequence of non-excluded characters representable in MacRoman (default) or other encodings, depending on the machine that created the filesystem
    • UFS: same as HFS+
  • Linux:
    • native (UNIX-like) filesystems: any byte sequence excluding NUL and "/"
    • FAT, NTFS, other non-native filesystems: varies
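
The Windows rules quoted in the list above can be checked mechanically. What follows is only a minimal sketch: the helper name find_illegal_windows_chars is hypothetical, and it covers just the reserved characters and the 0 through 31 control range, not whatever else the target file system may reject.

```python
# Reserved characters from the Windows list above.
WINDOWS_RESERVED = set('<>:"/\\|?*')


def find_illegal_windows_chars(name):
    """Return the characters in name that the quoted Windows rules forbid.

    Hypothetical helper: it checks only the reserved characters and the
    control characters 0-31, not file-system-specific restrictions.
    """
    return [c for c in name if c in WINDOWS_RESERVED or ord(c) < 32]


print(find_illegal_windows_chars('report: draft?.txt'))  # [':', '?']
print(find_illegal_windows_chars('notes.txt'))           # []
```

An empty result means only that the name passes these two rules; it says nothing about the other platforms in the list.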

Your best bet is probably to either be overly conservative on all platforms, or to just try creating the file and handle errors.
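
Both strategies can be sketched in a few lines. This is an illustration under assumptions, not a definitive implementation: the names SAFE_CHARS, conservative_name and can_create are hypothetical, the whitelist is one arbitrary conservative choice, and a successful creation only shows that this particular OS and file system accepted the name.

```python
import os
import string
import tempfile

# Conservative whitelist (an assumption): ASCII letters, digits, and "-_.".
SAFE_CHARS = set(string.ascii_letters + string.digits + '-_.')


def conservative_name(name, replacement='_'):
    """Replace every character outside SAFE_CHARS with replacement."""
    return ''.join(c if c in SAFE_CHARS else replacement for c in name)


def can_create(directory, name):
    """Try to create name inside directory and report whether it worked."""
    path = os.path.join(directory, name)
    try:
        # Mode 'x' fails if the file already exists, so nothing is overwritten.
        with open(path, 'x'):
            pass
    except (OSError, ValueError):
        # OSError: rejected by the OS or file system, or the file exists.
        # ValueError: for example an embedded NUL character.
        return False
    os.remove(path)  # Clean up the probe file.
    return True


print(conservative_name('a/b:c?.txt'))  # a_b_c_.txt
with tempfile.TemporaryDirectory() as tmp:
    print(can_create(tmp, conservative_name('a/b:c?.txt')))
```

The whitelist approach gives the same result on every platform at the cost of discarding many legal characters; the probing approach accepts whatever the current system accepts, so its answer can differ between machines.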

ephemient