
My website deals with pictures that users upload, and I'm torn about what the picture filenames should consist of. I'm mainly worried about scalability, and possibly security. Does anyone out there deal with the same thing and can tell me what naming scheme they use on their site?

Currently, my filename convention is

{pictureId}_{userId}_{salt}_{variant}.{fileExt}

where salt is a token generated server-side (I'm not sure why I decided to put it there; maybe for security purposes) and variant is something like t, signifying a thumbnail. So a filename would look something like

12332_22_hb8324jk_t.jpg
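A minimal Python sketch of the convention above. It assumes the salt is a random lowercase hex token from `secrets.token_hex`. The question doesn't say how the salt is generated, and the example salt `hb8324jk` is not hex, so this is only one option. The names `make_salt` and `build_filename` are hypothetical.

```python
import secrets
from typing import Optional


def make_salt(nbytes: int = 4) -> str:
    """Return a random lowercase hex token (two hex characters per byte)."""
    return secrets.token_hex(nbytes)


def build_filename(picture_id: int, user_id: int, variant: str,
                   ext: str, salt: Optional[str] = None) -> str:
    """Assemble {pictureId}_{userId}_{salt}_{variant}.{fileExt}."""
    if salt is None:
        salt = make_salt()  # assumption: a fresh salt per picture
    return f"{picture_id}_{user_id}_{salt}_{variant}.{ext.lower()}"


print(build_filename(12332, 22, "t", "jpg", salt="hb8324jk"))
# prints 12332_22_hb8324jk_t.jpg
```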

Please advise, thanks.

+1  A: 

You might like to consider replacing the underscores with (e.g.) hyphens. Underscores are single-character wildcards in SQL LIKE patterns, so you could potentially run into trouble one day in a LIKE comparison. (And of course, underscores are just plain evil :-)
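To illustrate the LIKE problem, here is a hedged Python sketch using an in-memory SQLite database. An unescaped `_` in a pattern matches any single character, so a prefix search can also return a different, hypothetical file. The `escape_like` helper is a hypothetical name. It escapes `\`, `%` and `_` for use with an `ESCAPE '\'` clause.

```python
import sqlite3


def escape_like(value: str, escape: str = "\\") -> str:
    """Escape LIKE wildcards so they match literally (pair with ESCAPE)."""
    return (value.replace(escape, escape * 2)
                 .replace("%", escape + "%")
                 .replace("_", escape + "_"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pictures (filename TEXT)")
conn.executemany(
    "INSERT INTO pictures VALUES (?)",
    [("12332_22_hb8324jk_t.jpg",), ("12332x22xhb8324jk_t.jpg",)],
)

prefix = "12332_22"

# Unescaped: each '_' matches any character, so both rows match.
naive = conn.execute(
    "SELECT COUNT(*) FROM pictures WHERE filename LIKE ?", (prefix + "%",)
).fetchone()[0]

# Escaped: '\_' matches only a literal underscore, so one row matches.
safe = conn.execute(
    "SELECT COUNT(*) FROM pictures WHERE filename LIKE ? ESCAPE '\\'",
    (escape_like(prefix) + "%",),
).fetchone()[0]

print(naive, safe)  # prints: 2 1
```

Switching the delimiter to hyphens sidesteps this for the separators, although escaping user-supplied patterns is still worthwhile.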

It looks from your example like you're avoiding spaces and upper-case characters, which is a good move. I'd keep everything lowercase and use case-insensitive comparisons to eliminate any potential case-sensitivity issues across different file systems.
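A small sketch of that advice in Python. Names are normalized to lowercase when they are created and compared case-insensitively with `str.casefold` when they are looked up. Both function names are hypothetical.

```python
def normalize_filename(name: str) -> str:
    """Lower-case a filename and reject embedded whitespace."""
    normalized = name.strip().lower()
    if any(ch.isspace() for ch in normalized):
        raise ValueError(f"whitespace not allowed in filename: {name!r}")
    return normalized


def same_filename(a: str, b: str) -> bool:
    """Compare two filenames without regard to case."""
    return a.casefold() == b.casefold()


print(normalize_filename("12332_22_HB8324JK_T.JPG"))  # 12332_22_hb8324jk_t.jpg
print(same_filename("12332_22_hb8324jk_t.jpg", "12332_22_HB8324JK_T.jpg"))  # True
```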

Scalability should be fine as long as you can cope with any number of digits in your user, picture and type IDs. You're very unlikely to hit any filename length limits with this scheme.
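To make "any number of digits" concrete, here is a sketch parser for the question's scheme that assumes no fixed width for any field. The regex assumes the salt and variant are lowercase alphanumerics. `FILENAME_RE` and `parse_filename` are hypothetical names.

```python
import re

# [0-9]+ accepts IDs of any length, so larger IDs need no scheme change.
FILENAME_RE = re.compile(
    r"(?P<picture_id>[0-9]+)_(?P<user_id>[0-9]+)_"
    r"(?P<salt>[a-z0-9]+)_(?P<variant>[a-z]+)\.(?P<ext>[a-z0-9]+)"
)


def parse_filename(name: str) -> dict:
    """Split a {pictureId}_{userId}_{salt}_{variant}.{ext} name into fields."""
    match = FILENAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected filename: {name!r}")
    parts = match.groupdict()
    parts["picture_id"] = int(parts["picture_id"])
    parts["user_id"] = int(parts["user_id"])
    return parts
```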

Security could be an issue if you use sequential IDs, as someone could tweak the numbers and request a picture they shouldn't be able to access. The salt, however, should make it virtually impossible for someone to guess the correct filename for another picture. If users can't see or access the internal filename in any way, though, that may be an unnecessary measure.
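Here is a sketch of how the salt might be checked on the server before a picture is served. It assumes each picture's salt is stored alongside its ID; a hypothetical in-memory dict stands in for the database. Tweaking the picture ID alone then fails because the salt no longer matches.

```python
import hmac
from typing import Dict, Tuple

# Hypothetical stand-in for a database table: picture_id -> (user_id, salt).
PICTURES: Dict[int, Tuple[int, str]] = {12332: (22, "hb8324jk")}


def is_authorized_name(filename: str) -> bool:
    """Accept a requested name only if its IDs and salt match a stored record."""
    stem, _, _ext = filename.rpartition(".")
    parts = stem.split("_")
    if len(parts) != 4:
        return False
    picture_id, user_id, salt, _variant = parts
    try:
        record = PICTURES.get(int(picture_id))
        requested_user = int(user_id)
    except ValueError:
        return False
    if record is None:
        return False
    stored_user, stored_salt = record
    # compare_digest avoids leaking how many leading characters matched.
    return stored_user == requested_user and hmac.compare_digest(
        stored_salt.encode(), salt.encode()
    )
```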

Jason Williams
Thanks for the response. I may consider using dashes; however, my script files (user_profile.php) use underscores. I guess that wouldn't be related? I'll accept an answer once the discussion has ceased.
Axsuul
+2  A: 

In addition to the previous comments, you may want to consider creating a directory hierarchy for your files. Depending on volume and the particular OS hosting the files, you can easily reach a point where you have an unreasonably large number of files in a single directory. There may be limits on the number of files allowed per folder. If you ever need to do any manual QA or maintenance on your files, this may be problematic (especially if such maintenance is not scripted).

I once worked on a project with a high volume of images. We decided to record a subpath in our database in addition to the filename of each file. Our folder names looked like this:

a/e/2/f/9
3/3/2/b/7

Essentially, we created folders 5 levels deep, each named with a single hex digit. The depth was probably excessive, but effective. I suppose this could have led to us reaching a limit on the number of folders on a volume (I'm not sure whether such a limit exists).
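For a sense of scale, here is a quick Python calculation of how many directories a fully populated tree of this shape would contain, with 16 possible names per level and 5 levels. When directories are created only as needed, far fewer exist in practice. The 10 million image count below is an arbitrary illustration.

```python
FANOUT = 16  # one hex digit per directory name: 0-9, a-f
DEPTH = 5    # a/e/2/f/9 is five levels deep

leaf_dirs = FANOUT ** DEPTH
all_dirs = sum(FANOUT ** level for level in range(1, DEPTH + 1))


def files_per_leaf(total_files: int) -> float:
    """Average files per leaf directory if uploads are spread evenly."""
    return total_files / leaf_dirs


print(leaf_dirs)  # 1048576
print(all_dirs)   # 1118480
print(round(files_per_leaf(10_000_000), 1))  # 9.5
```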

I would also consider storing a drive in addition to a path (assuming you have several disks for storage). That way you can move images around and update your database (assuming you have one) as part of the move.
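Here is a hedged sketch of recording a drive alongside the subpath, using Python's built-in `sqlite3`. The table layout (`images` with `drive`, `subpath` and `filename` columns) and the helper names are hypothetical. The move happens first, and the database row is updated only after it succeeds.

```python
import shutil
import sqlite3
from pathlib import Path


def create_schema(conn: sqlite3.Connection) -> None:
    # drive: root of a storage volume; subpath: e.g. "a/e/2/f/9".
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        "id INTEGER PRIMARY KEY, drive TEXT, subpath TEXT, filename TEXT)"
    )


def image_location(conn: sqlite3.Connection, image_id: int) -> Path:
    row = conn.execute(
        "SELECT drive, subpath, filename FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    if row is None:
        raise KeyError(image_id)
    drive, subpath, filename = row
    return Path(drive) / subpath / filename


def move_image(conn: sqlite3.Connection, image_id: int, new_drive: str) -> Path:
    """Move an image to another drive, keeping its subpath, then update the row."""
    src = image_location(conn, image_id)
    subpath = conn.execute(
        "SELECT subpath FROM images WHERE id = ?", (image_id,)
    ).fetchone()[0]
    dst = Path(new_drive) / subpath / src.name
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dst))
    with conn:  # commit the new drive only after the file move succeeded
        conn.execute("UPDATE images SET drive = ? WHERE id = ?",
                     (new_drive, image_id))
    return dst
```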

mcliedtk
Ah cool, very insightful. I might consider doing this, but it might be premature until my site gets big enough (if ever).
Axsuul
Thanks! Just to be clear, we would only generate the directories as needed. So we had a bit of code that would generate a random subpath when a file was uploaded, check to see if the directories representing the subpath existed (the 5 directories in the tree: e.g. 4/3/f/a/a), and create the directories if they did not exist. Alternatively, you could create directories off of the UserID, or by some combination of these two approaches.
mcliedtk
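Here is a minimal sketch of the on-demand scheme described in that comment, assuming Python's `pathlib`. It picks a random 5-level hex subpath, creates any missing directories with `mkdir(parents=True, exist_ok=True)`, and returns the subpath so it can be stored in the database. `random_subpath` and `ensure_subpath` are hypothetical names.

```python
import secrets
from pathlib import Path

HEX_DIGITS = "0123456789abcdef"


def random_subpath(depth: int = 5) -> Path:
    """Return a relative path such as a/e/2/f/9 with one hex digit per level."""
    return Path(*(secrets.choice(HEX_DIGITS) for _ in range(depth)))


def ensure_subpath(root: Path, depth: int = 5) -> str:
    """Create the directories for a fresh random subpath under root if missing."""
    subpath = random_subpath(depth)
    (root / subpath).mkdir(parents=True, exist_ok=True)
    return subpath.as_posix()  # e.g. "4/3/f/a/a", suitable for a DB column
```

Keying the subpath off the user ID instead, as the comment suggests, would only change how `random_subpath` chooses its digits.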
+1  A: 

My 2 pence worth: I would say there is a bit of a conflict between scalability and security in this problem.

  1. If you have real security concerns, then you should not rely on the filename of the target image at all. That is just security by obscurity: somebody could eventually guess the name (even with your salt idea, which makes it harder).

Instead, you should at least have a login mechanism that creates a session between client and server, so that content can only be fetched after authenticating. Even then, traffic can be sniffed: if security really is a concern, I would say you have to use SSL.

  2. Regarding scalability: I would suggest you actually do give your images sequential numbers and store them in 'bins' of (say) 500 images each. As you fill up a bin, create a new one. Store bin information (min image ID, max image ID) in one DB table and image numbers in another; you can then comparatively cheaply find which bin a particular image lives in from its ID. This is a fairly common solution for storing lots of documents/images.
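Here is a hedged sketch of the bin scheme using Python's built-in `sqlite3`. It assumes image IDs are assigned in increasing order and uses a bin size of 500, as in the answer. The table and column names (`bins`, `min_image_id`, `max_image_id`) and the helper functions are hypothetical. Finding an image's bin is then a single range query on the much smaller bins table.

```python
import sqlite3
from typing import Optional

BIN_SIZE = 500  # images per bin, as suggested above


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE bins (bin_id INTEGER PRIMARY KEY, "
                 "min_image_id INTEGER, max_image_id INTEGER)")
    conn.execute("CREATE TABLE images (image_id INTEGER PRIMARY KEY, "
                 "bin_id INTEGER REFERENCES bins(bin_id))")


def add_image(conn: sqlite3.Connection, image_id: int) -> int:
    """Put image_id in the newest bin, opening a new bin when it is full."""
    row = conn.execute("SELECT bin_id FROM bins "
                       "ORDER BY bin_id DESC LIMIT 1").fetchone()
    count = 0
    if row is not None:
        count = conn.execute("SELECT COUNT(*) FROM images WHERE bin_id = ?",
                             (row[0],)).fetchone()[0]
    if row is None or count >= BIN_SIZE:
        cur = conn.execute("INSERT INTO bins (min_image_id, max_image_id) "
                           "VALUES (?, ?)", (image_id, image_id))
        bin_id = cur.lastrowid
    else:
        bin_id = row[0]
        conn.execute("UPDATE bins SET max_image_id = ? WHERE bin_id = ?",
                     (image_id, bin_id))
    conn.execute("INSERT INTO images VALUES (?, ?)", (image_id, bin_id))
    return bin_id


def find_bin(conn: sqlite3.Connection, image_id: int) -> Optional[int]:
    """Look up the bin whose ID range contains image_id."""
    row = conn.execute("SELECT bin_id FROM bins WHERE ? BETWEEN "
                       "min_image_id AND max_image_id", (image_id,)).fetchone()
    return None if row is None else row[0]
```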

You could then map your URLs to the bin + image ID. But to avoid the problem noted by Jason Williams (sequential numbering makes it easy to probe), you really should address security separately, as in point 1.

monojohnny
Thanks for all the responses, everyone. I got a lot of insight into my problem, but this answer gave me the most to pursue a solution.
Axsuul
A: 

The first thing to do is to set up a directory structure that models your use case. In your case, you have a user who uploads a picture. You would probably have a directory structure like this (probably on a network share somewhere):

-Pictures
  -UserID1
    -PictureID1~^~Variant.jpg
    -PictureID2~^~Variant.jpg
  -UserID2
    -PictureID1~^~Variant.jpg
    -PictureID2~^~Variant.jpg
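Here is a sketch of building a path in that layout with Python's `pathlib`, assuming numeric IDs and the `~^~` delimiter from the listing. `picture_path` is a hypothetical helper. `PurePosixPath` is used so the result has forward slashes regardless of OS.

```python
from pathlib import PurePosixPath

DELIMITER = "~^~"


def picture_path(user_id: int, picture_id: int, variant: str,
                 ext: str = "jpg", root: str = "Pictures") -> PurePosixPath:
    """Build Pictures/<UserID>/<PictureID>~^~<Variant>.<ext>."""
    filename = f"{picture_id}{DELIMITER}{variant}.{ext}"
    return PurePosixPath(root, str(user_id), filename)


print(picture_path(22, 12332, "thumb"))  # Pictures/22/12332~^~thumb.jpg
```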

Pictures - simply the root directory for the following.

UserID - is the database user ID.

PictureID is simply the picture ID from the database (assuming you record the filename of each uploaded picture in a database.)

~^~ - This is simply a delimiter. You can use a single character or a longer sequence. I like three characters, as it is easily handled with the split function and is readily distinguishable in the file name.

Sometimes I like to add the size of the picture to the file name, e.g. .256.jpg or .1024.jpg.
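Here is a sketch of parsing such names back with `split`, handling both the `~^~` delimiter and the optional size suffix. It assumes the picture ID is numeric and the variant contains neither `.` nor the delimiter. `PictureName` and `parse_picture_name` are hypothetical names.

```python
from typing import NamedTuple, Optional

DELIMITER = "~^~"


class PictureName(NamedTuple):
    picture_id: int
    variant: str
    size: Optional[int]  # e.g. 256 for "12332~^~thumb.256.jpg"
    ext: str


def parse_picture_name(filename: str) -> PictureName:
    """Parse <PictureID>~^~<Variant>[.<size>].<ext>."""
    parts = filename.split(".")
    if len(parts) == 2:
        stem, ext = parts
        size = None
    elif len(parts) == 3:
        stem, size_text, ext = parts
        size = int(size_text)
    else:
        raise ValueError(f"unexpected filename: {filename!r}")
    picture_id, variant = stem.split(DELIMITER)  # ValueError if not exactly one
    return PictureName(int(picture_id), variant, size, ext)
```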

At any rate, all of this depends on your use case. The most important thing is setting up the directory structure properly. That will make it easier to access/serve and manage the pictures.

You can add any other information you need to the filename, as long as it doesn't exceed the maximum filename length on your system.
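A hedged way to check that limit from Python on POSIX systems is `os.pathconf(path, "PC_NAME_MAX")`. `os.pathconf` isn't available on Windows, so the sketch falls back to 255, an assumed value that matches many common file systems. The helper names are hypothetical.

```python
import os


def max_name_length(directory: str = ".", fallback: int = 255) -> int:
    """Maximum filename length in bytes for directory, if the OS reports it."""
    try:
        limit = os.pathconf(directory, "PC_NAME_MAX")
    except (AttributeError, OSError, ValueError):
        return fallback  # e.g. os.pathconf does not exist on Windows
    return limit if limit > 0 else fallback


def name_fits(name: str, directory: str = ".") -> bool:
    """Check the UTF-8 byte length of name against the directory's limit."""
    return len(name.encode("utf-8")) <= max_name_length(directory)
```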