views:

299

answers:

4

Is there any standardized / libraried / tested way in .NET to take an arbitrary string and mangle it in such a way that it represents a valid file name?

Rolling my own char-replace function is easy enough, but I'd like something a little more robust and reused.

+1  A: 

Have you had a look at Path.GetInvalidFileNameChars?

Found at Really Useful .NET Classes Part 1 - System.IO.Path
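As a minimal sketch of how that character list can drive a replacement, the following builds a regex character class from Path.GetInvalidFileNameChars and swaps each match for a placeholder. The class name FileNameSanitizer, the method name ReplaceInvalidChars, and the default "_" replacement are illustrative assumptions, not part of the framework. Note that the set of invalid characters depends on the platform the code runs on.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; the names here are for illustration only.
public static class FileNameSanitizer
{
    // Each invalid character is written as a \uXXXX escape so that characters
    // with special meaning inside a character class cannot break the pattern.
    private static readonly Regex InvalidChars = new Regex(
        "[" + string.Concat(Path.GetInvalidFileNameChars()
                                .Select(c => "\\u" + ((int)c).ToString("X4"))) + "]");

    public static string ReplaceInvalidChars(string name, string replacement = "_")
    {
        // A MatchEvaluator is used so the replacement is taken literally
        // (no "$1"-style substitution patterns are interpreted).
        return InvalidChars.Replace(name, m => replacement);
    }
}
```

This only deals with invalid characters; it does not address reserved names or length limits.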

astander
That plus a regex is my fallback; ideally I'd like something that does the replacement as well.
Craig Walker
+11  A: 

You can use Path.GetInvalidFileNameChars to check which characters of the string are invalid, and either convert them to a valid character such as a hyphen or, if you need bidirectional conversion, replace each one with an escape token such as %, followed by the hexadecimal representation of its Unicode code. (I have actually used this technique once but don't have the code at hand right now.)

EDIT: Just in case someone is interested, here is the code.

/// <summary>
/// Escapes an object name so that it is a valid filename.
/// </summary>
/// <param name="fileName">Original object name.</param>
/// <returns>Escaped name.</returns>
/// <remarks>
/// All characters that are not valid for a filename, plus "%" and ".", are converted into "%uuuu", where uuuu is the hexadecimal
/// unicode representation of the character.
/// </remarks>
private string EscapeFilename(string fileName)
{
    char[] invalidChars=Path.GetInvalidFileNameChars();

    // Replace "%", then replace all other characters, then replace "."

    fileName=fileName.Replace("%", "%0025");
    foreach(char invalidChar in invalidChars)
    {
        fileName=fileName.Replace(invalidChar.ToString(), string.Format("%{0:X4}", (int)invalidChar));
    }
    return fileName.Replace(".", "%002E");
}

/// <summary>
/// Unescapes an escaped file name so that the original object name is obtained.
/// </summary>
/// <param name="escapedName">Escaped object name (see the EscapeFilename method).</param>
/// <returns>Unescaped (original) object name.</returns>
public string UnescapeFilename(string escapedName)
{
    //We need to temporarily replace %0025 with %! to prevent a name that
    //originally contained escape-like sequences from being unescaped incorrectly
    //(for example: ".%002E" once escaped is "%002E%0025002E").
    //Without this temporary replacement, it would be unescaped to "..".

    string unescapedName=escapedName.Replace("%0025", "%!");
    Regex regex=new Regex("%(?<esc>[0-9A-Fa-f]{4})");
    Match m=regex.Match(escapedName);
    while(m.Success)
    {
        foreach(Capture cap in m.Groups["esc"].Captures)
            unescapedName=unescapedName.Replace("%"+cap.Value, Convert.ToChar(int.Parse(cap.Value, NumberStyles.HexNumber)).ToString());
        m=m.NextMatch();
    }
    return unescapedName.Replace("%!", "%");
}
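For comparison, here is a single-pass sketch of the same "%uuuu" scheme. It is a variant, not the code above: it walks the string once with a StringBuilder when escaping and uses Regex.Replace with a match evaluator when unescaping, so no temporary "%!" placeholder is needed because replaced text is never rescanned. The class name FileNameEscaper and its method names are assumptions made for this example.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical variant of the escaping scheme described above.
public static class FileNameEscaper
{
    private static readonly char[] Invalid = Path.GetInvalidFileNameChars();

    // Converts '%', '.', and every invalid character into "%uuuu" (4 hex digits).
    public static string Escape(string name)
    {
        var sb = new StringBuilder(name.Length);
        foreach (char c in name)
        {
            if (c == '%' || c == '.' || Array.IndexOf(Invalid, c) >= 0)
                sb.Append('%').Append(((int)c).ToString("X4", CultureInfo.InvariantCulture));
            else
                sb.Append(c);
        }
        return sb.ToString();
    }

    // Reverses Escape: each "%uuuu" token is decoded exactly once, left to right.
    public static string Unescape(string escaped)
    {
        return Regex.Replace(escaped, "%([0-9A-Fa-f]{4})",
            m => ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber,
                                  CultureInfo.InvariantCulture)).ToString());
    }
}
```

With this sketch, FileNameEscaper.Escape(".%002E") yields "%002E%0025002E", and Unescape returns the original ".%002E".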
Konamiman
+2  A: 

Can you provide more detail on what you mean by "generate from an arbitrary string"? Based on what you're saying, it sounds like you're asking:

Is there any way to take an arbitrary string and mangle it in such a way that it represents a valid file name?

If that's the case, then no, there is no standard function available that I am aware of. However, you could use the following, which should do the trick:

// Requires: using System.IO; using System.Linq; using System.Text;
public static string MakeValidFileName(string name) {
  var invalid = Path.GetInvalidFileNameChars();
  var builder = new StringBuilder();
  foreach ( var cur in name ) {
    builder.Append(invalid.Contains(cur) ? '_' : cur);
  }
  return builder.ToString();
}
JaredPar
Edited the question to use your phrasing... thanks!
Craig Walker
+5  A: 

This problem is not as simple as you may think. Not only are the characters in Path.GetInvalidFileNameChars illegal, there are several filenames, such as "PRN" and "CON", that are reserved by Windows and cannot be created. Any name that ends in "." is also illegal in Windows. Moreover, there are various length limitations. Read the full list here.

If that's not enough, different filesystems have different limitations. For example, ISO 9660 filenames cannot start with "-" but can contain it.
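To illustrate the Windows rules mentioned above, here is a rough sketch of a check that goes beyond invalid characters. The class name WindowsNameRules, the method IsSafeOnWindows, and the 255-character default are assumptions for this example; the real length limits vary by filesystem and API. It is not an exhaustive validator.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical checker; covers only the rules named in the answer above.
public static class WindowsNameRules
{
    // Device names reserved by Windows, matched case-insensitively.
    private static readonly HashSet<string> Reserved =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "CON", "PRN", "AUX", "NUL",
            "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
            "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
        };

    public static bool IsSafeOnWindows(string name, int maxLength = 255)
    {
        // maxLength is an assumed per-component limit, not a guaranteed value.
        if (string.IsNullOrEmpty(name) || name.Length > maxLength) return false;

        // Names may not end in a period or a space.
        char last = name[name.Length - 1];
        if (last == '.' || last == ' ') return false;

        // Platform-dependent: on non-Windows systems this list is much shorter.
        if (name.IndexOfAny(Path.GetInvalidFileNameChars()) >= 0) return false;

        // A reserved name stays reserved with an extension, e.g. "CON.txt".
        int dot = name.IndexOf('.');
        string stem = dot < 0 ? name : name.Substring(0, dot);
        return !Reserved.Contains(stem);
    }
}
```

A name that fails this check would still need a renaming strategy, such as prefixing an underscore, which is left to the caller.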

Dour High Arch
This is *exactly* why I didn't want to try rolling my own with a simple regex replacement. Thanks.
Craig Walker