I'm creating a class to store a filename. To do so, I need to know exactly which characters are invalid and exactly which characters are invalid as leading/trailing characters.

Windows Explorer trims leading and trailing white-space characters automatically when naming a file, so I need to trim the same characters when constructing a filename instance.

I thought about using string.Trim(), but it would be naive to assume the default set of characters it trims coincides exactly with the invalid leading/trailing filename characters of the OS.

Documentation for string.Trim() says that it trims the following characters by default: U+0009, U+000A, U+000B, U+000C, U+000D, U+0020, U+0085, U+00A0, U+1680, U+2000, U+2001, U+2002, U+2003, U+2004, U+2005, U+2006, U+2007, U+2008, U+2009, U+200A, U+200B, U+2028, U+2029, U+3000, U+FEFF

Unfortunately, some of the above characters are NOT invalid in a filename, because they aren't in the character set returned by System.IO.Path.GetInvalidFileNameChars.
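One way to check that claim is to run every default Trim() character through Path.GetInvalidFileNameChars. The sketch below is an illustration, not part of the original question. The code points are copied from the documentation list quoted above, and the module name is arbitrary. On Windows, only the control characters U+0009 through U+000D should be reported as invalid, because the returned set covers U+0000 through U+001F. Every other character in the list is one Trim() would strip even though it is legal in a filename.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module TrimVsInvalidChars

    Sub Main()
        ' Default Trim() characters, copied from the documentation quoted in the question.
        Dim trimCodePoints As Integer() = {
            &H9, &HA, &HB, &HC, &HD, &H20, &H85, &HA0, &H1680,
            &H2000, &H2001, &H2002, &H2003, &H2004, &H2005, &H2006, &H2007,
            &H2008, &H2009, &H200A, &H200B, &H2028, &H2029, &H3000, &HFEFF}

        ' Characters the current platform rejects anywhere in a file name.
        Dim invalid As New HashSet(Of Char)(Path.GetInvalidFileNameChars())

        For Each cp As Integer In trimCodePoints
            Dim status As String = If(invalid.Contains(ChrW(cp)),
                                      "invalid",
                                      "VALID (Trim would still remove it)")
            Console.WriteLine("U+{0:X4}  {1}", cp, status)
        Next
    End Sub

End Module
```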

Am I then correct that string.Trim() could potentially remove VALID leading/trailing characters from a filename, therefore corrupting the filename?

What exactly are the invalid leading/trailing characters for a filename in the Windows Vista OS? I understand that they are not necessarily the same as those of the file system itself, since the OS can run on different file systems.

+2  A: 

Filenames can start and end with spaces. Trim will eliminate them.

File names cannot contain

/ \ : * ? " | < >
ojblass
This is exactly the answer I've seen a million times, and it upsets me. If valid filenames start and end with spaces, why does Explorer strip them off? That would make it very difficult to edit the names of such files. Documentation FAIL: http://msdn.microsoft.com/en-us/library/aa365247.asp
Triynko
I think the designers of Explorer overstepped their bounds on this. Sorry if I neglected the why, but I try to answer questions and not go off on tangents. "Why does Explorer strip off spaces?" is a perfectly good question to ask.
ojblass
I was just saying that those aren't a complete list of invalid characters. Path.GetInvalidFileNameChars returns more than those posted here. Explorer's rules aren't consistent with the filesystem (and probably not with the OS either). With such poor documentation, other programs are destined to fail.
Triynko
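To see what Path.GetInvalidFileNameChars actually returns on a given machine, one can simply dump it. This is a minimal sketch with an arbitrary module name. Control characters are printed as code points because they have no visible glyph. On Windows, the output should list the nine printable characters from the answer above plus the control characters U+0000 through U+001F; other platforms may return a different, smaller set.

```vbnet
Imports System
Imports System.IO

Module ListInvalidFileNameChars

    Sub Main()
        Dim invalid As Char() = Path.GetInvalidFileNameChars()
        Console.WriteLine("{0} characters returned", invalid.Length)

        For Each c As Char In invalid
            If Char.IsControl(c) Then
                ' No visible glyph: print only the code point.
                Console.WriteLine("U+{0:X4}", AscW(c))
            Else
                Console.WriteLine("U+{0:X4}  {1}", AscW(c), c)
            End If
        Next
    End Sub

End Module
```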
Explorer strips spaces because users probably don't intend to have spaces at the beginning or end of their filenames (it's likely to lead to confusion). This isn't an OS or filesystem limitation, though.
kvb
But the fact that the API returns more than the UI shows you is absolute crap.
ojblass
And judging by the behaviors described in the next answer, it's not clear whether Explorer actually strips the spaces or just appears to. If a file is created with leading spaces, Explorer won't show them, but doesn't strip them either. If I try to use Explorer to add spaces, they are removed.
Triynko
I wish Explorer were consistent with the OS and allowed actual valid filenames and editing thereof. If it doesn't like a filename, it should say "Are you sure you want to pad that baby with leading spaces?", instead of just assuming we're all stupid, and failing miserably on edge cases.
Triynko
Interestingly, the maximum filename length is 255 characters in a drive root. In subfolders, Explorer preemptively computes the maximum filename length when renaming by subtracting the lengths of the folder names and the backslashes between them, so that the path to your file will not exceed the maximum path length.
Triynko
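The length arithmetic described in this comment can be sketched as follows. This is an approximation built on assumptions, not Explorer's documented algorithm: it assumes a MAX_PATH of 260 characters including the terminating null, a 255-character limit per path component, and a hypothetical helper named MaxFileNameLength.

```vbnet
Imports System

Module FileNameLengthBudget

    ' Assumption: MAX_PATH is 260 characters, including the terminating null.
    Const MaxPath As Integer = 260
    ' Assumption: a single path component may be at most 255 characters.
    Const MaxComponent As Integer = 255

    ' Hypothetical helper: how long a new file name inside `folder` may be.
    Function MaxFileNameLength(folder As String) As Integer
        ' Count the backslash that separates the folder from the file name.
        Dim prefix As String = If(folder.EndsWith("\"), folder, folder & "\")
        ' Reserve one character for the terminating null.
        Dim remaining As Integer = MaxPath - 1 - prefix.Length
        Return Math.Max(0, Math.Min(MaxComponent, remaining))
    End Function

    Sub Main()
        Console.WriteLine(MaxFileNameLength("C:\"))
        Console.WriteLine(MaxFileNameLength("C:\Users\Public"))
    End Sub

End Module
```

Under these assumptions a drive root such as "C:\" yields 255, matching the comment, while "C:\Users\Public" leaves 243 characters.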
You are fighting a horde of poor decisions by a large team of developers and Microsoft.
ojblass
+2  A: 

Am I then correct that string.Trim() could potentially remove VALID leading/trailing characters from a filename, therefore corrupting the filename?

Yes. Even more so on a UNIX-like system, where ' X' is a valid filename and distinct from ' x '.

Charlie Martin
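The corruption risk is easy to demonstrate: Trim() maps names that a case-sensitive, space-preserving file system keeps apart onto a single string. A minimal sketch with arbitrary sample names:

```vbnet
Imports System

Module TrimCollapsesNames

    Sub Main()
        ' Three distinct names on a file system that preserves leading/trailing spaces.
        Dim names As String() = {" X", "X", " X "}
        For Each name As String In names
            ' Brackets make leading and trailing spaces visible in the output.
            Console.WriteLine("[{0}] -> [{1}]", name, name.Trim())
        Next
    End Sub

End Module
```

All three inputs come out as [X], so a class that trims on construction could no longer tell these files apart.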
The only safe thing to do, then, is to ignore Explorer's behavior and treat all filenames as valid except those that are explicitly restricted (COM1, LPT1, etc.) or contain characters in Path.GetInvalidFileNameChars. I can't find solid documentation for what the OS actually enforces, but lame hints abound. :(
Triynko
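The rule in this comment can be sketched as a validator that trims nothing, rejects any character from Path.GetInvalidFileNameChars, and rejects reserved device names with or without an extension. The function name IsAcceptableFileName is hypothetical, and the reserved-name list (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9) follows Microsoft's file-naming guidance; treat both as assumptions rather than a complete statement of what Windows enforces.

```vbnet
Imports System
Imports System.IO

Module FileNameCheck

    ' Reserved device names on Windows, compared case-insensitively.
    Private ReadOnly ReservedNames As String() = {
        "CON", "PRN", "AUX", "NUL",
        "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
        "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"}

    ' Hypothetical helper: deliberately does NOT trim; the name is checked as given.
    Function IsAcceptableFileName(name As String) As Boolean
        If String.IsNullOrEmpty(name) Then Return False
        If name.IndexOfAny(Path.GetInvalidFileNameChars()) >= 0 Then Return False

        ' "COM1.txt" is treated like "COM1", so compare only the part before the first dot.
        Dim dot As Integer = name.IndexOf("."c)
        Dim baseName As String = If(dot >= 0, name.Substring(0, dot), name)
        For Each reserved As String In ReservedNames
            If String.Equals(baseName, reserved, StringComparison.OrdinalIgnoreCase) Then
                Return False
            End If
        Next
        Return True
    End Function

    Sub Main()
        For Each candidate As String In {" foo.txt ", "a:b.txt", "COM1.txt", "com10.txt"}
            Console.WriteLine("[{0}] {1}", candidate, IsAcceptableFileName(candidate))
        Next
    End Sub

End Module
```

On Windows, " foo.txt " and "com10.txt" pass, while "a:b.txt" (colon) and "COM1.txt" (reserved name) are rejected.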
Pretty much. Most GUI-based tools have more, and different, filename conventions than the underlying file system. OS X can be a little maddening that way too.
Charlie Martin
+1  A: 

This code runs and creates the file:

```vbnet
Imports System.IO

Module Module1

    Sub Main()
        ' Create (or overwrite) a file whose name has leading, embedded and trailing spaces.
        Using fs As New FileStream("d:\temp\   file . foo ", FileMode.Create, FileAccess.Write)
            ' Wrap the stream in a StreamWriter so text can be written to it.
            ' FileMode.Create truncates any existing file, so no Seek is needed.
            Using s As New StreamWriter(fs)
                s.WriteLine("This is an example of using file handling concepts in VB .NET.")
                s.WriteLine("This concept is interesting.")
            End Using ' Disposing the writer flushes the text and closes the stream.
        End Using
    End Sub

End Module
```

NOTE: the actual name of the file created by the code above appears to be " file . foo". If I edit the filename in Explorer, the space isn't there, but when I rerun the code above, it replaces the file.

NOTE: I took the code from http://www.startvbdotnet.com/files/default.aspx and added the spaces

NOTE: I notice that Vista's Explorer rename won't let you add the spaces before or after filename, so you can make "foo . txt" but not " foo.txt " using that method.

jrcs3
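The rename behavior in the last note can be approximated by trimming only the ordinary space U+0020 from both ends while leaving interior spaces alone. TrimLikeExplorer is a hypothetical helper inferred from the observations above, not Explorer's actual implementation.

```vbnet
Imports System

Module ExplorerStyleRename

    ' Hypothetical approximation of the observed rename behavior: strip only
    ' leading and trailing U+0020 spaces; interior spaces are kept.
    Function TrimLikeExplorer(typedName As String) As String
        Return typedName.Trim(" "c)
    End Function

    Sub Main()
        Console.WriteLine("[{0}]", TrimLikeExplorer("foo . txt"))
        Console.WriteLine("[{0}]", TrimLikeExplorer(" foo.txt "))
    End Sub

End Module
```

This prints [foo . txt] and [foo.txt], matching the note that "foo . txt" can be created by renaming but " foo.txt " cannot.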
I suspected such a call would succeed. I wonder how that name shows up in Explorer, and what would happen if one tried to edit it. These edge cases are just not documented; it's no wonder strange behaviors arise there, since the intended behavior is never clear to the programmer to begin with.
Triynko
It shows up as " file . foo". If I try to rename it to "file . foo" it tells me that source and destination file names can't be the same.
jrcs3
It's very odd. This is the kind of behavior that creeps into software when the documentation sucks, and all I can find is answers like "/\:*?"|<> are invalid".
Triynko
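To check which name was actually stored for the file created by the VB.NET example above, independent of how Explorer displays or edits it, one can list the folder and wrap each name in brackets so leading and trailing spaces become visible. A minimal sketch; the folder d:\temp is taken from that example and may need adjusting.

```vbnet
Imports System
Imports System.IO

Module ShowStoredNames

    Sub Main()
        ' Folder used by the FileStream example above; change it to suit your machine.
        For Each fullPath As String In Directory.GetFiles("d:\temp")
            ' Brackets make any leading or trailing spaces in the stored name visible.
            Console.WriteLine("[{0}]", Path.GetFileName(fullPath))
        Next
    End Sub

End Module
```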