I am working on a project where I need to store many user-uploaded files and provide redundancy. For file uploads, I was first considering placing user-uploaded files in an NFS "uploads" directory and using Gearman to grab each file, move it to its permanent storage locations, then update MySQL with the file's info (file size, date, etc.).

Now that I have learned of MongoDB's GridFS, I am thinking this may be a better approach than NFS and Gearman. With GridFS, I don't need NFS and the redundancy is built in, but I would be putting a lot of trust in MongoDB.
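When weighing GridFS against NFS plus a MySQL metadata table, it may help to see roughly what GridFS stores. The sketch below is a simplified, in-memory model with no MongoDB server involved. It assumes the GridFS convention of one `fs.files` document per file plus numbered `fs.chunks` documents of at most 255 KiB each (the default chunk size in current drivers). The file size and upload date the question planned to keep in MySQL end up in the `fs.files` document. In practice a driver, such as PyMongo's `gridfs` module, does this splitting for you.

```python
"""Simplified, in-memory model of how GridFS lays out one file.

Assumption: mirrors the GridFS convention of two collections (fs.files and
fs.chunks) and the 255 KiB default chunk size; it does not talk to MongoDB.
"""
import datetime
import math
import uuid

DEFAULT_CHUNK_SIZE = 255 * 1024  # 261120 bytes, the current driver default


def split_into_gridfs_docs(filename, data, chunk_size=DEFAULT_CHUNK_SIZE):
    """Return (files_doc, chunk_docs) shaped like GridFS documents."""
    file_id = uuid.uuid4().hex  # stand-in; real GridFS uses an ObjectId by default
    files_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(data),  # file size, i.e. what the MySQL row would have held
        "chunkSize": chunk_size,
        "uploadDate": datetime.datetime.now(datetime.timezone.utc),
    }
    n_chunks = math.ceil(len(data) / chunk_size)
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": data[n * chunk_size:(n + 1) * chunk_size]}
        for n in range(n_chunks)
    ]
    return files_doc, chunk_docs


def reassemble(chunk_docs):
    """Rebuild the file by concatenating chunks in order of their n field."""
    return b"".join(c["data"] for c in sorted(chunk_docs, key=lambda c: c["n"]))


if __name__ == "__main__":
    payload = b"x" * 600_000  # a ~586 KiB upload
    meta, chunks = split_into_gridfs_docs("report.pdf", payload)
    print(meta["length"], len(chunks))  # 600000 3
    assert reassemble(chunks) == payload
```

Because each chunk is an ordinary document, it is replicated and sharded like any other MongoDB data, which is where the "built-in redundancy" comes from.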

With all that said, does anyone have any thoughts? Does anyone know of a large deployment of GridFS? I know many people are using MongoDB as a NoSQL database, but I have not found many references to GridFS deployments.

Thanks!

A:

We're using GridFS for one of our projects. I'm also using Gearman to distribute the job queue so that clients don't have to wait for each file to be stored. Once the user hits submit, PHP sends the job to Gearman, and then I have "workers" that do the inserting into MongoDB.
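The "accept now, store later" flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the answerer's code: Python's `queue.Queue` stands in for the Gearman job server, a thread stands in for a Gearman worker, and `store_in_gridfs()` is a hypothetical placeholder for the real insert into MongoDB.

```python
"""Minimal sketch of the enqueue-then-store pattern.

Assumptions: queue.Queue plays the Gearman job server, a thread plays a
Gearman worker, and store_in_gridfs() is a hypothetical placeholder.
"""
import queue
import threading

jobs = queue.Queue()
stored = {}  # stand-in for GridFS: filename -> bytes
stored_lock = threading.Lock()


def store_in_gridfs(filename, data):
    """Hypothetical storage step; a real worker would write to GridFS here."""
    with stored_lock:
        stored[filename] = data


def handle_upload(filename, data):
    """Web-request side: enqueue the job and return without waiting."""
    jobs.put((filename, data))
    return "accepted"


def worker():
    """Background worker: pull jobs off the queue and do the slow insert."""
    while True:
        job = jobs.get()
        try:
            if job is None:  # sentinel value tells the worker to stop
                return
            store_in_gridfs(*job)
        finally:
            jobs.task_done()


if __name__ == "__main__":
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    print(handle_upload("a.txt", b"hello"))  # accepted
    jobs.put(None)
    jobs.join()  # wait until every queued job, including the sentinel, is done
    print(sorted(stored))  # ['a.txt']
```

The request handler only pays the cost of enqueuing; the slow write happens in the worker, so adding workers (or worker machines, in Gearman's case) increases storage throughput without changing the upload path.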

luckytaxi
If you do not mind me asking, how much data are you storing in GridFS and across how many machines?
Ethan
For the project without Gearman (I'm not sure whether the developers will implement it), which is a file-sharing application, we anticipate at least 1 TB of data. I'm in the process of getting quotes for physical hardware to build out the sharding environment. You'll want to shard once your data gets large. Right now we're using one machine, but I plan on having at least 4 machines to handle the sharding.
luckytaxi
We are in the early development stage and are trying to select an approach. One item on my list is to better understand how MongoDB shards GridFS across machines. My main question is: if I have 4 machines, can I instruct MongoDB to always keep the data on 2 of the 4 machines for redundancy? Also, if I later want to increase this to 3 of the 4, can I do that easily?
Ethan
Each shard consists of a replica set (minimum of 3 servers). I believe you also need additional servers for the config servers and the routing process (mongos). I'm thinking a total of 5 servers. To answer your question, you add more shards to your environment to handle the load. http://www.mongodb.org/display/DOCS/Sharding+Introduction
luckytaxi