views:

424

answers:

4

I have a web application that stores a lot of user generated files. Currently these are all stored on the server filesystem, which has several downsides for me.

  • When we move "folders" (as defined by our application) we also have to move the files on disk (although this is more due to strange design decisions on the part of the original developers than a requirement of storing things on the filesystem).
  • It's hard to write tests for file system actions; I have a mock filesystem class that logs actions like move, delete etc, without performing them, which more or less does the job, but I don't have 100% confidence in the tests.
  • I will be adding some other jobs which need to access the files from other service to perform additional tasks (e.g. indexing in Solr, generating thumbnails, movie format conversion), so I need to get at the files remotely. Doing this over network shares seems dodgy...
  • Dealing with permissions on the filesystem as sometimes given us problems in the past, although now that we've moved to a pure Linux environment this should be less of an issue.

So, my main questions are

  • What are the downsides of storing files as BLOBs in MySQL?
  • Do the same problems exist with NoSQL systems like Cassandra?
  • Does anyone have any other suggestions that might be appropriate, e.g. MogileFS, etc?
+1  A: 

maybe a hybrid solution.

Use a database to store metadata about each file - and use the file system to actually store the file.

any restructuring of 'folders' could be modelled in the DB and dereferenced from the actual OS location.

Randy
That is currently what we do; restructuring of folders should, ideally, be completely dereferenced from the actual filesystem location, but the old developers went out of their way to link it instead...So I'm faced with a rewrite to some extend anyway, and I'm wondering if there is a suitable approach which will completely avoid the filesystem.
El Yobo
A: 

If the OS or application doesn't need access to the files, then there's no real need to store the files on the file system. If you want to backup the files at the same time you backup the database, then there's less benefit to storing them outside the database. Therefore, it might be a valid solution to store the files in the database.

An additional downside is that processing files in the db has more overhead than processing files at the file system level. However, as long as the advantages outweigh the downsides, and it seems that it might in your case, you might give it a try.

My main concern would be managing disk storage. As your database files get large, managing your entire database gets more complicated. You don't want to move out of the frying pan and into the fire.

Marcus Adams
I'm not so concerned about disk space; it's crazy cheap these days, I can just add more drives and RAID them if necessary.My concern with mysql is mainly related to caching; if I run a query that returns BLOBS, it seems that this would take up a large amount of the cache, clearing out other more useful data. I suspect that there must be other problems as well, otherwise more people would do it that way, but I'm not sure what they are.
El Yobo
I've read a lot on this topic, and no one has stated query cache issues as a reason not to store files in the database. With MySQL, you can set the query_cache_limit value, which indicates the maximum result set size to cache. The default is 1 MB. As an alternative solution that might resolve the issues you are having with the file system, you might also look at an NFS (a file server). You could store references to the files in the db.
Marcus Adams
True, limiting the size of the things to be stored in the query cache would probably reduce my concern here.Storing file system references is still a pain, but looks like it's the best way.
El Yobo
+1  A: 

Not a direct answer but some pointers to very interesting and somehow similar questions (yeah, they are about blobs and images but this is IMO comparable).

What are the downsides of storing files as BLOBs in MySQL?

Do the same problems exist with NoSQL systems like Cassandra?

PS: I don't want to be the killjoy but I don't think that any NoSQL solution is going to solve your problem (NoSQL is just irrelevant for most businesses).

Pascal Thivent
Thanks, looks like a very useful set of links. Storing images/blobs of any sort is what I'm after (we're storing all sorts of stuff).
El Yobo
Thanks, your suggested links are great. Clearly I need to search harder before asking questions :)In conclusion, avoiding the DB looks like the way to go. I just need to decouple the application from the filesystem somewhat so that it's less painful...
El Yobo
Glad you found them useful. And I share this conclusion.
Pascal Thivent
A: 

You can store files up to 2GB easily in Cassandra by splitting them into 1MB columns or so. This is pretty common.

You could store it as one big column too, but then you'd have to read the whole thing into memory when accessing it.

jbellis