I need a robust and simple way to remove illegal path and file characters from a simple string. I've used the code below, but it doesn't seem to do anything. What am I missing?

using System;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string illegal = "\"M<>\"\\a/ry/ h**ad:>> a\\/:*?\"<>| li*tt|le|| la\"mb.?";

            illegal = illegal.Trim(Path.GetInvalidFileNameChars());
            illegal = illegal.Trim(Path.GetInvalidPathChars());

            Console.WriteLine(illegal);
            Console.ReadLine();
        }
    }
}
+9  A: 

For starters, Trim only removes characters from the beginning or end of the string. Secondly, you should consider whether you really want to remove the offending characters, or fail fast and let the user know their filename is invalid. My choice is the latter, but my answer should at least show you how to do things the right AND wrong way:

StackOverflow question showing how to check if a given string is a valid file name. Note you can use the regex from this question to remove characters with a regular expression replacement (if you really need to do this).

sixlettervariables
I especially agree with the second advice.
OregonGhost
+1  A: 

String.Trim() only removes chars from the beginning and end of the string.
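
To illustrate the difference, here is a small sketch (the class and method names are made up for this example, not taken from the question). `Trim(char[])` strips matching characters only from the two ends, while filtering each character also cleans the interior of the string.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the original question.
public static class TrimDemo
{
    // Strips invalid file name characters from the two ends only,
    // which is all that String.Trim(char[]) does.
    public static string TrimEnds(string input)
    {
        return input.Trim(Path.GetInvalidFileNameChars());
    }

    // Drops every invalid file name character, wherever it occurs.
    public static string RemoveEverywhere(string input)
    {
        char[] invalid = Path.GetInvalidFileNameChars();
        StringBuilder result = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (Array.IndexOf(invalid, c) < 0)
            {
                result.Append(c);
            }
        }
        return result.ToString();
    }
}
```

For an input such as "/a/b/", TrimEnds leaves the interior slash alone and returns "a/b", whereas RemoveEverywhere returns "ab". The exact set of invalid characters depends on the platform.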

jmatthias
+41  A: 

Try something like this instead:

string illegal = "\"M<>\"\\a/ry/ h**ad:>> a\\/:*?\"<>| li*tt|le|| la\"mb.?";

foreach (char c in Path.GetInvalidFileNameChars())
{
    illegal = illegal.Replace(c.ToString(), ""); 
}
foreach (char c in Path.GetInvalidPathChars())
{
    illegal = illegal.Replace(c.ToString(), ""); 
}

But I have to agree with the comments: I'd probably try to deal with the source of the illegal paths rather than mangle an illegal path into a legitimate but probably unintended one.

Edit: Or a potentially 'better' solution, using a regex.

string illegal = "\"M<>\"\\a/ry/ h**ad:>> a\\/:*?\"<>| li*tt|le|| la\"mb.?";
string regexSearch = string.Format("{0}{1}",
                     new string(Path.GetInvalidFileNameChars()), 
                     new string(Path.GetInvalidPathChars()));
Regex r = new Regex(string.Format("[{0}]", Regex.Escape(regexSearch)));
illegal = r.Replace(illegal, "");

Still, it begs the question of why you're doing this in the first place.

Matthew Scharley
I don't know if I should +1 your answer for having such an ill-performing solution that will push the user away from that path, or if I should +1 your answer for it actually answering his question! :)
sixlettervariables
I wonder if regex-replace is more performant here.
Michael Stum
@Michael Stum: they get 'compiled' and should be some sort of state machine, but it would be naive to assume they are guaranteed to be any more efficient under the hood than a loop.
sixlettervariables
On something the length of a path, it probably wouldn't make that much of a difference. On a longer string, I imagine the regex would be faster though.
Matthew Scharley
+1 for being number one on my google search
WOPR
I'd stick to the non-regex solution: it's likely to be more efficient most of the time. If using the regex solution, change string.Format() to just "["+"...". If you're going to treat `illegal` as a file name without a path after replacing special chars, then you only need Path.GetInvalidFileNameChars().
Rory
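
A minimal non-regex sketch along the lines of the comment above, assuming the result is used as a bare file name so that only Path.GetInvalidFileNameChars() is needed. It swaps the per-character Replace loop for a Split/Join idiom; the class and method names are hypothetical.

```csharp
using System.IO;

// Hypothetical helper; the names are assumptions for this sketch.
public static class FileNameCleaner
{
    public static string RemoveInvalidFileNameChars(string name)
    {
        // Split on every invalid character, then glue the remaining
        // pieces back together with nothing in between.
        string[] parts = name.Split(Path.GetInvalidFileNameChars());
        return string.Join("", parts);
    }
}
```

This walks the input once instead of once per invalid character, though for strings as short as a file name the difference is unlikely to matter.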
+1  A: 

I think it is much easier to validate with a regex that specifies which characters are allowed, instead of trying to check for all the bad characters. See these links:

http://www.c-sharpcorner.com/UploadFile/prasad_1/RegExpPSD12062005021717AM/RegExpPSD.aspx
http://www.windowsdevcenter.com/pub/a/oreilly/windows/news/csharp_0101.html

Also, search for a "regular expression editor"; they help a lot. Some even generate the C# code for you.
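
As a rough sketch of the allow-list idea: the permitted set below (ASCII letters, digits, underscore, dot, space and hyphen) is purely an assumption for illustration, as are the class and method names.

```csharp
using System.Text.RegularExpressions;

// Hypothetical validator; the allowed character set is an assumed policy.
public static class FileNameWhitelist
{
    // One or more allowed characters and nothing else.
    // \z anchors at the very end of the input, so a trailing
    // newline is rejected as well.
    private static readonly Regex Allowed =
        new Regex(@"^[A-Za-z0-9_. \-]+\z", RegexOptions.CultureInvariant);

    public static bool IsAllowed(string fileName)
    {
        return fileName != null && Allowed.IsMatch(fileName);
    }
}
```

Adjust the character class to whatever your application actually permits.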

Sandor Davidhazi
+3  A: 

I use regular expressions to achieve this. First, I dynamically build the regex.

string regex = "[" + Path.GetInvalidFileNameChars() + "]";
Regex removeInvalidChars = new Regex(regex, RegexOptions.Singleline | RegexOptions.Compiled | RegexOptions.CultureInvariant);

Then I just call removeInvalidChars.Replace to do the find and replace. This can obviously be extended to cover path chars as well.
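
Note that concatenating a char[] with a string calls ToString() on the array, which yields the type name rather than the characters, and several of the invalid characters are regex metacharacters. A sketch that avoids both issues, using new string(...) and Regex.Escape, might look like this (the wrapper class and its names are assumptions):

```csharp
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical wrapper class; the name is an assumption for this sketch.
public static class FileNameSanitizer
{
    // new string(char[]) turns the array into its actual characters,
    // and Regex.Escape neutralises metacharacters such as \ * ? and |.
    private static readonly Regex RemoveInvalidChars = new Regex(
        "[" + Regex.Escape(new string(Path.GetInvalidFileNameChars())) + "]",
        RegexOptions.Compiled | RegexOptions.CultureInvariant);

    public static string Clean(string fileName)
    {
        return RemoveInvalidChars.Replace(fileName, "");
    }
}
```

Keeping the Regex in a static readonly field means it is built once rather than on every call.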

Jeff Yates
That code doesn't work for some reason.
Lone Coder
Strange, it has been working for me. I'll double-check it when I get a chance. Can you be more specific and explain what exactly isn't working for you?
Jeff Yates
@Jeff: It won't work (properly at the very least) because you aren't escaping the path characters properly, and some of them have a special meaning. Refer to my answer for how to do that.
Matthew Scharley
@Matthew: Good point. I didn't think of that.
Jeff Yates
+2  A: 

Throw an exception.

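if (fileName.IndexOfAny(Path.GetInvalidFileNameChars()) > -1)
{
    // Fail fast rather than silently altering the name.
    throw new ArgumentException("File name contains invalid characters.", "fileName");
}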
mirezus
+1  A: 

Here's a code snippet that should help for .NET 3 and higher.

using System;
using System.IO;
using System.Text.RegularExpressions;

public static class PathValidation
{
    // \z anchors at the true end of the input; $ would also match just before a trailing newline.
    private static string pathValidatorExpression = "^[^" + string.Join("", Array.ConvertAll(Path.GetInvalidPathChars(), x => Regex.Escape(x.ToString()))) + "]+\\z";
    private static Regex pathValidator = new Regex(pathValidatorExpression, RegexOptions.Compiled);

    private static string fileNameValidatorExpression = "^[^" + string.Join("", Array.ConvertAll(Path.GetInvalidFileNameChars(), x => Regex.Escape(x.ToString()))) + "]+\\z";
    private static Regex fileNameValidator = new Regex(fileNameValidatorExpression, RegexOptions.Compiled);

    private static string pathCleanerExpression = "[" + string.Join("", Array.ConvertAll(Path.GetInvalidPathChars(), x => Regex.Escape(x.ToString()))) + "]";
    private static Regex pathCleaner = new Regex(pathCleanerExpression, RegexOptions.Compiled);

    private static string fileNameCleanerExpression = "[" + string.Join("", Array.ConvertAll(Path.GetInvalidFileNameChars(), x => Regex.Escape(x.ToString()))) + "]";
    private static Regex fileNameCleaner = new Regex(fileNameCleanerExpression, RegexOptions.Compiled);

    public static bool ValidatePath(string path)
    {
        return pathValidator.IsMatch(path);
    }

    public static bool ValidateFileName(string fileName)
    {
        return fileNameValidator.IsMatch(fileName);
    }

    public static string CleanPath(string path)
    {
        return pathCleaner.Replace(path, "");
    }

    public static string CleanFileName(string fileName)
    {
        return fileNameCleaner.Replace(fileName, "");
    }
}
James