What is the best way storing binary or image files?
- Database System
- File System
Would you please explain, why?
What is the best way storing binary or image files?
Would you please explain, why?
I like storing images in a database. It makes it easy to switch from development to production just by changing databases (no copying files). And the database can keep track of properties like created/modified dates just as well as the File System.
I personally never store images IN the database for performance purposes. In all of my sites I have a "/files" folder where I can put sub-folders based on what kind of images i'm going to store. Then I name them on convention.
For example if i'm storing a profile picture, I'll store it in "/files/profile/" as profile_2.jpg (if 2 is the ID of the account). I always make it a rule to resize the image on the server to the largest size I'll need, and then smaller ones if I need them. So I'd save "profile_2_thumb.jpg" and "profile_2_full.jpg".
By creating rules for yourself you can simply in the code call img src="/files/profile__thumb.jpg"
Thats how I do it anyway!
Pros of Storing binary files in a DB:
Cons of Storing binary files in a DB:
Pros of Storing binary files in file system:
Cons of Storing binary files in file system:
On balance I would use the file system. In the past, using SQL Server 2005 I would simply store a 'pointer' in db tables to the binary file. The pointer would typically be a GUID.
Here's the good news if you are using SQL Server 2008 (and maybe others - I don't know): there is built in support for a hybrid solution with the new VARBINARY(MAX) FILESTREAM data type. These behave logically like VARBINARY(MAX) columns but behind the scenes, SQL Sever 2008 will store the data in the file system.
There is no best way.
What? You need more info?
There are three ways I know of... One, as byte arrays in the database. Two, as a file with the path stored in the database. Three, as a hybrid (only if DB allows, such as with the FileStream type).
The first is pretty cool because you can query and get your data in the same step. Which is always nice. But what happens when you have LOTS of files? Your database gets big. Now you have to deal with big database maintenance issues, such as the trials of backing up databases that are over a terabyte. And what happens if you need outside access to the files? Such as type conversions, mass manipulation (resize all images, appy watermarks, etc)? Its much harder to do than when you have files.
The second is great for somewhat large numbers of files. You can store them on NAS devices, back them up incrementally, keep your database small, etc etc. But then, when you have LOTS of files, you start running into limitations in the file system. And if you spread them over the network, you get latency issues, user rights issues, etc. Also, I take pity on you if your network gets rearranged. Now you have to run massive updates on the database to change your file locations, and I pity you if something screws up.
Then there's the hybrid option. Its almost perfect--you can get your files via your query, yet your database isn't massive. Does this solve all your problems? Probably not. Your database isn't portable anymore; you're locked to a particular DBMS. And this stuff isn't mature yet, so you get to enjoy the teething process. And who says this solves all the different issues?
Fact is, there is no "best" way. You just have to determine your requirements, make the best choice depending on them, and then suck it up when you figure out you did the wrong thing.
There is no real best way, just a bunch of trade offs.
Database Pros:
1. Much easier to deal with in a clustering environment.
2. No reliance on additional resources like a file server.
3. No need to set up "sync" operations in load balanced environment.
4. Backups automatically include the files.
Database Cons:
1. Size / Growth of the database.
2. Depending on DB Server and your language, it might be difficult to put in and retrieve.
3. Speed / Performance.
4. Depending on DB server, you have to virus scan the files at the time of upload and export.
File Pros:
1. For single web/single db server installations, it's fast.
2. Well understood ability to manipulate files. In other words, it's easy to move the files to a different location if you run out of disk space.
3. Can virus scan when the files are "at rest". This allows you to take advantage of scanner updates.
File Cons:
1. In multi web server environments, requires an accessible share. Which should also be clustered for failover.
2. Additional security requirements to handle file access. You have to be careful that the web server and/or share does not allow file execution.
3. Transactional Backups have to take the file system into account.
The above said, SQL 2008 has a thing called FILESTREAM which combines both worlds. You upload to the database and it transparently stores the files in a directory on disk. When retrieving you can either pull from the database; or you can go direct to where it lives on the file system.