We've got a file-based program that we want to convert to use a document database, specifically MongoDB. The problem is that MongoDB is limited to 2GB of data on 32-bit machines (according to http://www.mongodb.org/display/DOCS/FAQ#FAQ-Whatarethe32bitlimitations%3F), and a lot of our users will have over 2GB of data. Is there a way to have MongoDB spread the data over more than one file somehow?

I thought perhaps I could implement sharding on a single machine, meaning I'd run more than one mongod on the same machine and they'd somehow communicate. Could that work?

+3  A: 

The only way to have more than 2GB on a single 32-bit node is to run multiple mongod processes. Sharding is one option (like you said); doing some manual partitioning across processes is another.

mdirolf
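To illustrate what "manual partitioning across processes" could look like, here is a minimal sketch that picks one of several local mongod processes for each document key using a stable hash. The port list and the `port_for_key` helper are hypothetical names made up for this example, not anything from MongoDB itself, and the sketch assumes each mongod was started on its own port with its own data directory.

```python
import hashlib

# Hypothetical ports, one per local mongod process (an assumption for
# illustration; each process would also need its own data directory).
MONGOD_PORTS = [27017, 27018, 27019]


def port_for_key(key, ports=MONGOD_PORTS):
    """Return the port of the mongod process responsible for `key`.

    A stable hash (MD5 here) is used instead of Python's built-in hash()
    so that the same key maps to the same process across program runs.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(ports)
    return ports[index]


if __name__ == "__main__":
    for doc_id in ["user:1", "user:2", "user:3"]:
        print(doc_id, "->", port_for_key(doc_id))
```

The application would then open a connection to `localhost` on the returned port (for example with pymongo's `MongoClient("localhost", port)`) for every read or write of that key. Note that simple modulo hashing like this remaps most keys if the number of processes changes, which is part of why this approach gets complex quickly.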
Would sharding by running multiple processes on a single machine even work though?
configurator
@mdirolf How could a larger number of mongod processes (on one physical server) change the picture if a 32-bit OS can still address only a limited amount of memory? Sharding may help, but only if the shards are located on different hosts (the total storage size per server can't exceed the 2GB limit anyway).
Vasil Remeniuk
I think the problem with using memory-mapped files is that a process with 32-bit pointers can't point to data beyond that range, not that the OS can't open the files.
configurator
Yep, that's what I meant. It's about the RAM that can be addressed by the OS (on 32-bit systems it's limited to 4GB, AFAIK).
Vasil Remeniuk
Using memory-mapped files doesn't mean the entire file needs to be in RAM - it just means the mapped data has to fit inside the process's 32-bit address space.
configurator
@Vasil yup, exactly what configurator said - each process has its own address space, so if you have multiple processes, each will be able to address ~2.5GB.
mdirolf
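The arithmetic behind this exchange can be sketched as follows. This is only a back-of-the-envelope estimate: it takes the 2GB per-process figure from the question as the default limit and ignores index size, file preallocation, and any other overhead. The function names are made up for this example.

```python
GIB = 1024 ** 3


def addressable_bytes(pointer_bits):
    """Size of the virtual address space reachable with pointers of this width."""
    return 2 ** pointer_bits


def processes_needed(data_bytes, per_process_limit=2 * GIB):
    """Minimum number of mongod processes if each one can map at most
    `per_process_limit` bytes of data (ceiling division)."""
    return -(-data_bytes // per_process_limit)


if __name__ == "__main__":
    print(addressable_bytes(32) // GIB, "GiB of address space per 32-bit process")
    print(processes_needed(5 * GIB), "processes for 5 GiB of data")
```

The full 4 GiB of a 32-bit address space is never available for data, since the OS and the process's own code, heap, and stacks take part of it, which is why the usable figure quoted in this thread is lower.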
Thank you both, guys. I've glanced over some articles about memory-mapped files and virtual memory, and now it's much clearer. I agree that, theoretically, it should be possible to start several instances of MongoDB on one machine, each with its own virtual address space. What confuses me is that the MongoDB wiki doesn't say a word about sharding as a way to work around the 32-bit limitations, which to me means that it is, at the very least, not a setup recommended for production use. The other question is how many instances of MongoDB can work effectively while sharing the resources of one machine. 2GB is a very low threshold...
Vasil Remeniuk
Yeah, we don't recommend it because it's probably overly complex (and sharding isn't in a production release yet). The recommendation is really just to find a 64-bit machine to deploy on (I know this isn't an option for some people, though).
mdirolf