I have data coming from an nvarchar field of a SQL Server database via EF 3.5. The string is used to create a file name, so I need to remove invalid characters from it. I tried the following options, but none of them works. Why is this such an incomprehensible mystery? Am I doing anything wrong?

I went through almost all of the related questions on this site, and am now posting a consolidated question built from the suggestions and answers to those similar questions.

UPD: The issue was unrelated. All of these options do work, so I am posting this as community wiki.

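public static string CleanFileName1(string filename)
{
    string file = filename;
    // Split on every invalid character and glue the remaining pieces back together.
    file = string.Concat(file.Split(System.IO.Path.GetInvalidFileNameChars(), StringSplitOptions.RemoveEmptyEntries));

    // Keep the name comfortably below the usual path length limits.
    if (file.Length > 250)
    {
        file = file.Substring(0, 250);
    }
    return file;
}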

public static string CleanFileName2(string filename)
{
    var builder = new StringBuilder();
    var invalid = System.IO.Path.GetInvalidFileNameChars();
    foreach (var cur in filename)
    {
        if (!invalid.Contains(cur))
        {
            builder.Append(cur);
        }
    }
    return builder.ToString();
}

public static string CleanFileName3(string filename)
{                                    
    string regexSearch = string.Format("{0}{1}",
        new string(System.IO.Path.GetInvalidFileNameChars()),
        new string(System.IO.Path.GetInvalidPathChars()));
    Regex r = new Regex(string.Format("[{0}]", Regex.Escape(regexSearch)));
    string file = r.Replace(filename, "");

    return file;
}       

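public static string CleanFileName4(string filename)
{
    // Enumerable.Except is a set operation and would also drop repeated valid
    // characters ("hello" becomes "helo"), so filter with Where instead.
    var invalid = System.IO.Path.GetInvalidFileNameChars();
    return new string(filename.Where(c => !invalid.Contains(c)).ToArray());
}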

public static string CleanFileName5(string filename)
{            
    string file = filename;

    foreach (char c in System.IO.Path.GetInvalidFileNameChars())
    {
        file = file.Replace(c, '_');
    }                                 
    return file;
}   
+1  A: 

Try this

filename = Regex.Replace(filename, @"[\\/?:*""><|]+", "", RegexOptions.Compiled);

DJ Quimby
@DJ: Same issue with this one too. It works for regular strings, but not for strings coming from the nvarchar field of the database.
Bhuvan
+1  A: 

no invalid chars returned by System.IO.Path.GetInvalidFileNameChars() being removed. – Bhuvan 5 mins ago

The first method you posted works OK for the characters in Path.GetInvalidFileNameChars(), here it is at work:

static void Main(string[] args)
{
    string input = "abc<def>ghi\\1234/5678|?9:*0";

    string output = CleanFileName1(input);

    Console.WriteLine(output); // this prints: abcdefghi1234567890

    Console.Read();
}

I suspect, though, that your problem is with some language-specific special characters. You can troubleshoot this by printing out the numeric codes of the characters in your string:

string stringFromDatabase = "/5678|?9:*0"; // here you get it from the database

foreach (char c in stringFromDatabase.ToCharArray())
    Console.WriteLine((int)c);

and consulting the ASCII table: http://www.asciitable.com/

I again suspect that you'll see characters with codes larger than 127 (that is, outside the ASCII range), and you should exclude those from your string.
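
To make hidden or lookalike characters easier to spot, here is a minimal sketch that formats each character as a hexadecimal Unicode code point and collects the ones outside the ASCII range. The names CharInspector, Describe and NonAscii are hypothetical, made up for this illustration, and U+2215 (DIVISION SLASH) in the usage note is only an assumed example of a slash lookalike, not something confirmed from your data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of any library.
public static class CharInspector
{
    // Returns one entry per UTF-16 code unit, formatted like "U+002F".
    public static IEnumerable<string> Describe(string value)
    {
        foreach (char c in value)
        {
            yield return string.Format("U+{0:X4}", (int)c);
        }
    }

    // Returns the distinct characters whose code is above 127,
    // i.e. everything that is not plain ASCII.
    public static char[] NonAscii(string value)
    {
        return value.Where(c => c > 127).Distinct().ToArray();
    }
}
```

For example, CharInspector.NonAscii(stringFromDatabase) comes back empty for a plain "12/30/92", but would contain '\u2215' if the slashes in the stored value were actually DIVISION SLASH characters, which Path.GetInvalidFileNameChars() does not include.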

Dan Dumitru
This works for normal strings like that, but not when the string is coming from an nvarchar field of the database.
Bhuvan
Can you copy and paste the string you're receiving from your database as a comment?
DJ Quimby
"fbo test investor 12/30/92". In this string I am trying to remove the / characters, and they are not removed. But when I try the same thing from the Immediate window, by pasting just the string, those characters are removed.
Bhuvan
@Bhuvan - It's possible that, in the string from the DB, you have other characters that you don't see... Try printing the ASCII codes of every character in the string, as I showed in my answer, and see what you get.
Dan Dumitru
How are you pulling the data out of the database? Any chance you can format the date differently so that the '/'s aren't in the date? Something like CONVERT(VarChar(50), GETDATE(), 102)
DJ Quimby