views:

179

answers:

4

I am working on a content management application in which the data being stored on the database is extremely generic. In this particular instance a container has many resources and those resources map to some kind of digital asset, whether that be a picture, a movie, an uploaded file or even plain text.

I have been arguing with a colleague for a week now because in addition to storing the pictures, etc - they would like to store the text assets on the file system and have the application look up the file location(from the database) and read in the text file(from the file system) before serving to the client application.

Common sense seemed to scream at me that this was ridiculous and if we are bothering to look up something from the database, we might as well store the text in a database column and have it served along up with the row lookup. Database lookup + File IO seemed sounds uncontrollably slower then just Database Lookup. After going back and forth for some time, I decided to run some benchmarks and found the results a little surprising. There seems to be very little consistency when it comes to benchmark times. The only clear winner in the benchmarks was pulling a large dataset from the database and iterating over the results to display the text asset, however pulling objects one at a time from the database and displaying their text content seems to be neck and neck.

Now I know the limitations of running benchmarks, and I am not sure I am even running the correct idea of "tests" (for example, File system writes are ridiculously faster then database writes, didn't know that!). I guess my question is for confirmation. Is File I/O comparable to database text storage/lookup? Am I missing a part of the argument here? Thanks ahead of time for your opinions/advice!

A quick work about what I am using: This is a Ruby on Rails application, using Ruby 1.8.6 and Sqlite3. I plan on moving the same codebase to MySQL tomorrow and see if the benchmarks are the same.

+1  A: 

I think your benchmark results will depend on how you store the text data in your database. If you store it as LOB then behind the scenes it is stored in an ordinary file. With any kind of LOB you pay the Database lookup + File IO anyway.

VARCHAR is stored in the tablespace

Ordinary text data types (VARCHAR et al) are very limited in size in typical relational database systems. Something like 2000 or 4000 (Oracle) sometimes 8000 or even 65536 characters. Some databases support long text but these have serious drawbacks and are not recommended.

LOBs are references to file system objects

If your text is larger you have to use a LOB data type (e.g. CLOB in Oracle).

LOBs usually work like this: The database stores only a reference to a file system object. The file system object contains the data (e.g. the text data). This is very similar to what your colleague proposes except the DBMS lifts the heavy work of managing references and files.

The bottom line is: If you can store your text in a VARCHAR then go for it. If you can't you have two options: Use a LOB or store the data in a file referenced from the database. Both are technically similar and slower than using VARCHAR.

Ludwig Weinzierl
Since the database we will be using is mysql5 that is the documentation I am pouring over. This page: http://dev.mysql.com/doc/refman/5.1/en/char.html seems to indicate the maximum character length could theoretically be set at 65,535 ... which sounds well beyond the lengths you specify as dangerous ... although those 65,535 are bytes and I am not sure what the conversion is when it comes to unicode, which I need to look into. Do you think strings this long in a non-LOB (text field in MySQL) could be dangerous?
BushyMark
I wondered how unicode is treated myself. If it is stored as utf-8 then one character is 4 bytes at maximum, 1 byte per character is typical for English text. One danger I see with really large VARCHARS in MySQL is that it can fill up the maximum row size (65,535 bytes, which is shared among all columns).For a performace comparison between TEXT and VARCHAR in MySQL see http://forums.mysql.com/read.php?24,105964,105964
Ludwig Weinzierl
+1  A: 

The major advantage you'll get from not using the filesystem is that the database will manage concurrent access properly. Let's say 2 processes need to modify the same text as the same time, synchronisation with the filesystem may lead to race conditions, whereas you will have no problem at all with everyhing in database.

Arnaud
This is great information, thanks!
BushyMark
A: 

I did this before. Its a mess, you need to keep the filesystem and the database synchronized all the time, so that makes the programming more complicated, as you would guess. My advice is either go for an all filesystem solution, or all database solution, depending on the data. Notably, if you require lots of searches, conditional data retrieval, then go for database, otherwise fs. Note that database may not be optimized for storage of large binary files. Still, remember, if you use both, youre gonna have to keep them synchronized, and it doesnt make for an elegant nor enjoyble (to program) solution. Good luck!

rolfen
A: 

At least, if your problems come from the "performance side", you could use a "no SQL" storage solution like Redis (via Ohm, for example), or CouchDB...

Carlo Pecchia