I'm trying to change the way we handle uploaded files before storing them to disk. We have had some problems when users on non-Windows operating systems upload files whose names contain characters that are illegal in Windows file names.

  1. The idealist in me tells me that file names should be made legal as close to the web layer as possible, so that we use the same correct file name throughout the business logic and data layer. In practice this requires us to actively sanitise file names in several places and then trust the result later on. This is a problem because it is much more prone to programmer mistakes unless you have only one entry point for files from the web.

  2. The other option I see is wrapping the file IO in methods that sanitise file names. This cannot be done invisibly, as we sometimes need to store file names in the DB: if the file name is not changed until it is stored to disk, the DB will contain the wrong file name. This in turn wouldn't matter if all calls to the file system went through the same sanitisation methods, except that in practice your operations department will sometimes want to run scripted jobs that move files by reading file names from the DB.

A way around the problem with option 2 is to return the new file name if it was changed by the sanitisation. This requires the caller of the method to be aware of this and handle it correctly. Like this:

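// Creates the file under a sanitised name and reports the name actually used,
// so the caller can store that name (not the original one) in the DB.
public static FileStream CreateFile(string filename, out string newFileName)
{
    newFileName = FileNameSanitiser.GetSanitisedFullPath(filename);
    return System.IO.File.Create(newFileName);
}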
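
FileNameSanitiser itself is not shown in the question. As a rough illustration only, a sanitiser for a bare file name (no directory part) might look something like the sketch below. The class name, method name and the choice of '_' as the replacement are assumptions made for this example, and it does not handle reserved device names such as CON or NUL. The character set is hard-coded because Path.GetInvalidFileNameChars() depends on the platform the code runs on.

```csharp
using System.Text;

// Hypothetical helper, not the FileNameSanitiser from the question.
public static class FileNameSanitiserSketch
{
    // Printable characters that Windows does not allow in file names.
    private const string WindowsIllegal = "<>:\"/\\|?*";

    // Replaces illegal characters in a bare file name with '_'.
    public static string SanitiseFileName(string fileName)
    {
        var sb = new StringBuilder(fileName.Length);
        foreach (char c in fileName)
        {
            // Control characters (0-31) are illegal as well.
            bool illegal = c < 32 || WindowsIllegal.IndexOf(c) >= 0;
            sb.Append(illegal ? '_' : c);
        }

        // Windows does not preserve trailing dots or spaces, so drop them.
        string result = sb.ToString().TrimEnd('.', ' ');

        // Never return an empty name.
        return result.Length == 0 ? "_" : result;
    }
}
```

Wherever a helper like this ends up being called, the point made above still holds: the caller has to keep the returned name, because it may differ from the name the user uploaded.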

Regarding option 1, we should have had only one or two file upload endpoints, which would have made this option more feasible. I think it might be worth investing time in this, but I'm not sure my manager agrees...

A: 

If the uploading of files (as well as, presumably, the re-downloading of them) is done only through your website interface, you could rename each file with a GUID (or some other unique entity) and then store the new name and the old name in your database.

Alternatively, you could store the file contents in the database itself, which would completely avoid the Windows file-naming restrictions. Note that this isn't necessarily something you want to do - there are pros and cons to both methods of file storage (i.e. disk vs. database).

MusiGenesis
A: 

What we do is pretty simple - any uploaded file gets saved on the file system with a system-generated name (we use a GUID), and a database table stores the generated name and the actual name. The UI displays the actual name for the download link, and the download handler sends the actual name in the response headers.
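
A minimal sketch of that idea, assuming a hypothetical StoredFile type standing in for the database row and a hypothetical UploadStore.Save helper (neither is from our actual code base):

```csharp
using System;
using System.IO;

// Hypothetical stand-in for the database row: generated name plus actual name.
public sealed class StoredFile
{
    public string StoredName { get; }
    public string OriginalName { get; }

    public StoredFile(string storedName, string originalName)
    {
        StoredName = storedName;
        OriginalName = originalName;
    }
}

public static class UploadStore
{
    // Writes the upload to disk under a GUID-based name and returns both names.
    // The original name never touches the file system, so it needs no sanitising.
    public static StoredFile Save(Stream upload, string originalName, string storeDirectory)
    {
        // "N" formats the GUID as 32 hex digits without dashes or braces.
        string storedName = Guid.NewGuid().ToString("N");
        string path = Path.Combine(storeDirectory, storedName);

        // CreateNew fails rather than overwriting if the name already exists.
        using (var target = new FileStream(path, FileMode.CreateNew, FileAccess.Write))
        {
            upload.CopyTo(target);
        }

        return new StoredFile(storedName, originalName);
    }
}
```

The caller would persist both values, and the download handler would read the file by StoredName while sending OriginalName to the browser, for example in a Content-Disposition header.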

VinayC
That was going to be my exact answer.
Thomas James
That is not a bad solution. The only problem is that we have an existing system which is rather big with plenty of files already present. Switching to this solution would be a major change.
Polymorphix
Also I'm thinking that it might be useful to keep the file names "human readable" in case anything goes wrong somewhere and the connection between the file on disk and the file name in the DB is lost or corrupted.
Polymorphix
For human-readable names, a simple but handy algorithm is to replace any non-alpha characters with, say, an underscore. You still need to append some salt (a random number) to guarantee the uniqueness of file names, though.
VinayC
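
The replace-and-salt idea in the comment above could be sketched roughly as follows. The names are hypothetical, digits are kept as well as letters (an assumption for this example), and the "salt" here is the first 8 hex digits of a GUID rather than a plain random number. The extension is split off by hand because the Path helpers can throw on illegal characters on some .NET versions.

```csharp
using System;
using System.Text;

// Hypothetical helper illustrating "readable name plus unique suffix".
public static class ReadableNameSketch
{
    public static string MakeStorageName(string originalName)
    {
        // Split at the last dot; a leading dot is treated as part of the stem.
        int dot = originalName.LastIndexOf('.');
        string stem = dot > 0 ? originalName.Substring(0, dot) : originalName;
        string extension = dot > 0
            ? "." + ReplaceNonAlphanumeric(originalName.Substring(dot + 1))
            : "";

        // Short random suffix so two uploads with the same name do not collide.
        string suffix = Guid.NewGuid().ToString("N").Substring(0, 8);

        return ReplaceNonAlphanumeric(stem) + "_" + suffix + extension;
    }

    // Replaces every character that is not a letter or digit with '_'.
    private static string ReplaceNonAlphanumeric(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            sb.Append(char.IsLetterOrDigit(c) ? c : '_');
        }
        return sb.ToString();
    }
}
```

An 8-digit suffix makes collisions unlikely rather than impossible, so creating the file in a mode that fails on an existing name and retrying would still be prudent.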
@VinayC yes, this is what the GetSanitisedFullPath method above does. The question is on an architectural level: where should this be done, and what other solutions are possible, such as those provided by the two answers.
Polymorphix
@Polymorphix, I would use a utility function to do the sanitization. In my case, the UI layer saves the uploaded file to the file system at a temp location using a generated name - the business case allows multiple files to be uploaded. The UI finally calls the business layer with the set of uploaded file paths and actual names, which in turn passes the info to the data access layer, and this layer copies the files to the actual file store (from the temp location) with freshly generated names. So I would do the name sanitization at this stage.
VinayC
@VinayC Thanks. This sounds like my option 2 from the question. It is what I started doing and as there are time limitations (disregarding technical debt as always :P) I'll probably end up with this as well.
Polymorphix