I am struggling to decide if I should be using the MySQL blob field type in an upcoming project I have.

My basic requirements: certain database records can be viewed and can have multiple files uploaded and "attached" to them. Viewing those records can be limited to certain people on a case-by-case basis. Any type of file can be uploaded with virtually no restriction.

So looking at it one way, if I go the MySQL route, I don't have to worry about viruses creeping in or random PHP files getting uploaded and somehow executed. I also have a much easier path for permissions and for keeping data tied closely to a record.

The other obvious route is storing the data in a specific folder structure outside of the webroot. In this case I'd have to come up with a special naming convention for folders/files to keep track of what they reference inside the database.

Is there a performance hit with using the MySQL blob field type? I'm concerned about choosing a solution that will hinder future growth of the website, as well as choosing a solution that won't be easy to maintain.

+2  A: 

Large volumes of data will eventually take their toll on performance. MS SQL 2008 has a specialized way of storing binary data in the file system:

http://msdn.microsoft.com/en-us/library/cc949109.aspx

I would use a similar approach for your project.

You can create a FILES table that keeps information about the files, such as their original names. To store files safely on disk, rename them, for example to GUIDs. Store the new file names in your FILES table; when a user needs to download a file, you can easily locate it on disk and stream it to the user.
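
A minimal sketch of that FILES-table idea, in Python with SQLite standing in for MySQL (both are my assumptions, as are the hypothetical names `create_schema`, `store_upload`, `load_upload` and the column names). The point is only that the name on disk is a random GUID and the user-supplied name lives in the database:

```python
import sqlite3
import uuid
from pathlib import Path


def create_schema(conn):
    # Hypothetical FILES table: the original name is kept only as data.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "id INTEGER PRIMARY KEY, "
        "original_name TEXT NOT NULL, "
        "stored_name TEXT NOT NULL UNIQUE)"
    )


def store_upload(conn, storage_dir, original_name, data):
    """Write data under a random GUID name and record the mapping."""
    stored_name = uuid.uuid4().hex  # never derived from user input
    Path(storage_dir).mkdir(parents=True, exist_ok=True)
    (Path(storage_dir) / stored_name).write_bytes(data)
    cur = conn.execute(
        "INSERT INTO files (original_name, stored_name) VALUES (?, ?)",
        (original_name, stored_name),
    )
    conn.commit()
    return cur.lastrowid


def load_upload(conn, storage_dir, file_id):
    """Return (original_name, file bytes) for a stored file id."""
    row = conn.execute(
        "SELECT original_name, stored_name FROM files WHERE id = ?",
        (file_id,),
    ).fetchone()
    if row is None:
        raise KeyError(file_id)
    original_name, stored_name = row
    return original_name, (Path(storage_dir) / stored_name).read_bytes()
```

Because nothing on disk ever carries the uploaded name or extension, an uploaded `evil.php` is just an opaque blob of bytes as far as the filesystem is concerned.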

see me no more
+5  A: 

If your web server will be serving these uploaded files over the web, the performance will almost certainly be better if they are stored on the filesystem. The web server will then be able to apply HTTP caching hints such as Last-Modified and ETag which will help performance for users accessing the same file multiple times. Additionally, the web server will automatically set the correct Content-Type for the file when serving. If you store blobs in the database, you'll end up implementing the above mentioned features and more when you should be getting them for free from your web server.

Additionally, pulling large blob data out of your database may end up being a performance bottleneck on your database. Also, your database backups will probably be slower because they'll be backing up more data. If you're doing ad-hoc queries during development, it'll be inconvenient seeing large blobs in result sets for select statements. If you want to simply inspect an uploaded file, it will be inconvenient and roundabout to do so because it'll be awkwardly stored in a database column.

I would stick with the common practice of storing the files on the filesystem and the path to the file in the database.

Asaph
A: 

In my opinion, storing files in the database is a bad idea. What you can store there is the id, name, type, possibly an MD5 hash of the file, and the date inserted. Files can be uploaded into a folder outside the public location. You should also be aware that keeping more than 1000 files in one folder is not advised, so you should create a new folder each time the file id increases by 1000.
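
The "new folder every 1000 ids" rule could be sketched like this in Python. The function name `bucket_path`, the constant, and the example root are hypothetical; the 1000 figure is just the one suggested above:

```python
from pathlib import PurePosixPath

FILES_PER_DIR = 1000  # the limit suggested above; tune for your filesystem


def bucket_path(root, file_id):
    """Map an integer file id to root/<bucket>/<id>, 1000 ids per bucket."""
    if not isinstance(file_id, int) or file_id < 0:
        raise ValueError("file_id must be a non-negative integer")
    bucket = file_id // FILES_PER_DIR  # ids 0-999 -> 0, 1000-1999 -> 1, ...
    return PurePosixPath(root) / str(bucket) / str(file_id)
```

For example, `bucket_path("/srv/uploads", 2345)` gives `/srv/uploads/2/2345`, so the path can always be recomputed from the id alone and does not need to be stored.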

Nazariy
+2  A: 

Is there a performance hit with using MySQL blob field type?

Not inherently, but if you have big BLOBs clogging up your tables and memory cache that will certainly result in a performance loss.

The other obvious route is storing the data in a specific folder structure outside of the webroot. In this case I'd have to come up with a special naming convention for folders/files to keep track of what they reference inside the database.

Yes, this is a common approach. You'd usually do something like have folders named after each table they're associated with, containing filenames based only on the primary key (ideally an integer; certainly never anything user-submitted).

The question is then, do you get the web server to serve it by pointing an Alias at the folder?

  • + is super fast
  • + caches well
  • - needs appropriate file extension to return desired Content-Type (server deployment issue)
  • - needs Header set Content-Disposition attachment (Apache) or other method to stop IE sniffing for HTML

or do you serve the file manually by having a server-side script spit it out, as you would have to serving from a MySQL blob?

  • - is slow
  • - needs a fair bit of manual If-Modified-Since and ETag handling to cache properly
  • + can use application's own access control methods
  • + easy to add correct Content-Type and Content-Disposition headers from the serving script

This is a trade-off there's not one globally-accepted answer for.
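
To give a feel for the "manual If-Modified-Since and ETag handling" in the script-served option, here is a rough Python sketch. The function names and the shape of `request_headers` (a plain dict) are assumptions for illustration, and it deliberately ignores details such as weak validators and `If-None-Match: *`:

```python
import hashlib
from email.utils import formatdate, parsedate_to_datetime


def make_etag(data):
    # Strong ETag derived from the content; quoted as HTTP requires.
    return '"' + hashlib.sha256(data).hexdigest() + '"'


def build_response(data, mtime, content_type, download_name, request_headers):
    """Return (status, headers, body) for a manually served file."""
    etag = make_etag(data)
    headers = {
        "ETag": etag,
        "Last-Modified": formatdate(mtime, usegmt=True),
    }
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")
    not_modified = False
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        not_modified = etag in [t.strip() for t in if_none_match.split(",")]
    elif if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since).timestamp()
            not_modified = int(mtime) <= since
        except (TypeError, ValueError):
            not_modified = False  # unparseable date: just send the file
    if not_modified:
        return 304, headers, b""
    # Strip characters that could break out of the quoted filename.
    safe = download_name.replace('"', "").replace("\r", "").replace("\n", "")
    headers["Content-Type"] = content_type
    headers["Content-Disposition"] = 'attachment; filename="%s"' % safe
    headers["Content-Length"] = str(len(data))
    return 200, headers, data
```

The application's own access check would run before calling something like this, which is the main thing you gain over letting the web server serve the folder directly.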

bobince
A: 

Many people recommend against storing file attachments (usually this applies to images) in blobs in the database. Instead they prefer to store a pathname as a string in the database, and store the file somewhere safe on the filesystem. There are some merits to this:

  • Database and database backups are smaller.
  • It's easier to edit files on the filesystem if you need to work with them ad hoc.
  • Filesystems are good at storing files. Databases are good at storing tuples. Let each one do what it's good at.

There are counter-arguments too, that support putting attachments in a blob:

  • Deleting a row in a database automatically deletes the associated attachment.
  • Rollback and transaction isolation work as expected when data is in a row, but not when some part of the data is on the filesystem.
  • Backups are simpler if all data is in the database. No need to worry about making consistent backups of data that's changing concurrently during the backup procedure.

So the best solution depends on how you're going to be using the data in your application. There's no one-size-fits-all answer.
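
To illustrate the rollback point: with files on disk, the application has to clean up after itself when the database insert fails. A minimal Python sketch, with SQLite standing in for MySQL and a hypothetical `attachments` table and `attach_file` helper (all assumptions on my part):

```python
import os
import sqlite3
import uuid


def attach_file(conn, storage_dir, record_id, data):
    """Write the file, then insert the row; undo the file if the insert fails."""
    stored_name = uuid.uuid4().hex
    path = os.path.join(storage_dir, stored_name)
    with open(path, "wb") as fh:
        fh.write(data)
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO attachments (record_id, stored_name) VALUES (?, ?)",
                (record_id, stored_name),
            )
    except Exception:
        # The rollback only covers the row; the file must be removed by hand.
        os.remove(path)
        raise
    return stored_name
```

Even this leaves gaps (a crash between the write and the insert leaves an orphaned file), which is exactly the kind of problem a blob column avoids.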

I know you tagged your question with MySQL, but if folks reading this question use other brands of RDBMS, they might want to look into BFILE when using Oracle, or FILESTREAM when using Microsoft SQL Server 2008. These give you the ability to store files outside the database but access them like they're part of a row in a database table (more or less).

Bill Karwin
A: 

Regarding the performance issue: say I want to create a service for selling ebooks or music. Wouldn't it be better to store the files in a BLOB field, in order to prevent users from getting access to the files via a direct URL? Or is there a better, more elegant solution? With a database of 100,000 MP3s at an average of 4 MB each, there should definitely be a performance hit. Kindly advise.

generalSpecific