I know this is something of a "classic question", but does the MySQL/Grails stack (deployed on Tomcat) put a new spin on how to approach storing users' uploaded files?

I like using the database for everything (simpler architecture, and scaling is just scaling the database). But using the filesystem means we don't lard up MySQL with binary files. Some might also argue that Apache (httpd) is faster than Tomcat for serving binary files, although I've seen numbers suggesting that putting Tomcat directly at the front of your site can be faster than going through an Apache (httpd) proxy.

How should I choose where to place users' uploaded files?

Thanks for your consideration, time and thought.

+4  A: 

I don't know if one can make general observations about this kind of decision, since it really comes down to what you are trying to do and how high non-functional requirements (NFRs) like performance and response time sit on your application's priority list.

If you have lots of users uploading lots of binary files, and the system serves large numbers of those uploaded files, then the costs of storing files in the database include:

  • Large size binary files
  • Costly queries

The benefits are:

  • Atomic commits
  • Scaling comes with the database (though with MySQL there are some issues with multi-node setups etc.)
  • Less fiddly and complicated code to manage file systems etc

Given the same user situation, if you store to the filesystem you will need to address:

  • Scaling
  • File name management (e.g. a user uploads a file with the same name twice)
  • Creating corresponding records in the DB to map to the files on disk (and the code surrounding all that)
  • Looking after your Apache configs so they serve from the filesystem
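To illustrate the file name management point, here is a minimal sketch of one common approach: store each upload under a generated unique name and keep the user's original name only in the DB record. This is not the code we used; `StoredNames`, `uniqueName` and `extensionOf` are hypothetical names invented for the example, and the extension whitelist is an assumption.

```java
import java.util.Locale;
import java.util.UUID;

/** Hypothetical helper: derives a collision-free on-disk name for an upload. */
final class StoredNames {
    private StoredNames() {}

    /** Returns a random UUID followed by the original extension, if that extension looks safe. */
    static String uniqueName(String originalName) {
        return UUID.randomUUID().toString() + extensionOf(originalName);
    }

    /** Extracts ".ext" (lower-cased) from the last path segment, or "" if none or unsafe. */
    static String extensionOf(String originalName) {
        // Some browsers send a full client-side path; keep only the last segment.
        int lastSeparator = Math.max(originalName.lastIndexOf('/'), originalName.lastIndexOf('\\'));
        String base = originalName.substring(lastSeparator + 1);
        int dot = base.lastIndexOf('.');
        if (dot <= 0 || dot == base.length() - 1) {
            return ""; // no extension, a dot-file, or a trailing dot
        }
        String ext = base.substring(dot + 1).toLowerCase(Locale.ROOT);
        // Assumption: only short alphanumeric extensions are allowed onto the filesystem.
        return ext.matches("[a-z0-9]{1,10}") ? "." + ext : "";
    }
}
```

With this, two uploads of `holiday.JPG` get two different stored names that both end in `.jpg`, and the original name lives only in the metadata record that maps to the file on disk.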

We had a similar problem to solve for our Grails site, where the content editors upload hundreds of pictures a day. We knew that driving all that demand through the application was wasteful when it could be better used doing other processing (given that the expected demand for pages was going to be in the millions per week, we definitely didn't want images to cripple us).

We ended up creating an upload -> file system solution. For each uploaded file, a DB meta-data record was created and managed in tandem with the upload process (and, conversely, that record was read when generating the GSP content link to the image). We served requests off disk through Apache directly, based on the link requested by the browser. But, and there is always a but, remember that with things like filesystems the content only exists on the machine it was written to.

We had the headache of making sure images got re-synchronised onto every server, since unlike a DB, which sits behind the cluster and enables the cluster to behave uniformly, files are bound to physical locations on a server.

Another problem you might run up against with filesystems is folder content size. When you start having folders where there are literally tens of thousands of files in them, the folder scan at the OS level starts to really drag. To avert this problem we had to write code which managed image uploads into yyyy/MM/dd/image.name.jpg folder structures, so that no one folder accumulated hundreds of thousands of images.
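A minimal sketch of that kind of date-based layout, assuming the upload date and an already-chosen file name are known. `DatedPaths` and `relativePath` are hypothetical names for illustration, not our actual code.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: spreads uploads over yyyy/MM/dd folders. */
final class DatedPaths {
    private static final DateTimeFormatter FOLDER = DateTimeFormatter.ofPattern("yyyy/MM/dd");

    private DatedPaths() {}

    /** Builds a path such as 2009/03/07/image.name.jpg, relative to the upload root. */
    static String relativePath(LocalDate uploadDate, String fileName) {
        // Reject path separators so the file name cannot escape its dated folder.
        if (fileName.isEmpty() || fileName.contains("/") || fileName.contains("\\")) {
            throw new IllegalArgumentException("file name must be a single path segment");
        }
        return FOLDER.format(uploadDate) + "/" + fileName;
    }
}
```

The returned relative path is what you would keep in the DB record; before writing the file, the dated folder has to exist, which `Files.createDirectories` on the parent path takes care of.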

What I'm implying is that while we got the performance we wanted by not using the DB for BLOB storage, that comes at the cost of development overhead and systems management.

j pimmel
+1  A: 

Just as an additional suggestion: JCR (e.g. Jackrabbit) - a Java Content Repository. It has several benefits when you deal with a lot of binary content. The Grails plugin isn't stable yet, but you can use Jackrabbit through the plain API.

Siegfried Puchbauer
Ooh, that sounds nice... :)
j pimmel
A: 

Another thing to keep in mind is that if your site ever grows beyond one application server, you need to access the same files from all app servers. All app servers already have access to the database, either because it is a single server or because you have a cluster. If you store things in the file system, you have to share that too, maybe via NFS.

Karsten Silz