views: 73 · answers: 2

I have been playing with MongoDB since yesterday and absolutely love it. I am trying to import a large amount of data (2 billion rows) and index it, but the import doesn't seem to be using the 8 cores my system has and is going at ordinary rates (60,000 records/sec). I can only imagine how long it might take to index two columns in this collection. Are there any MongoDB-style databases that exploit the multicore nature of CPUs?

+4  A: 

If MongoDB has an Achilles' heel, it's the fact that it only supports single-threaded writes and single-threaded map-reduces.

As always, there are trade-offs here. Single-threaded writes are the simplest way to avoid lock problems and minimize overhead. In the same fashion, multi-threaded map-reduces would be a great way to lock up your data, so single-threaded map-reduces on a production system are probably easier and safer.

However, you're not without tools here. MongoDB provides one write thread per instance, so if you shard MongoDB, you get one write thread for each shard.
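As a rough sketch of how that plays out in practice, here's one way to drive parallel bulk inserts through a mongos router once the collection is sharded. This is a minimal sketch assuming a recent pymongo; the host, the mydb.records names, and the pre-split input files are all hypothetical:

    # Hypothetical parallel bulk import against an already-sharded collection.
    # Assumes a mongos router at localhost:27017; all names are illustrative.
    from multiprocessing import Pool
    from pymongo import MongoClient

    BATCH = 1000

    def import_chunk(path):
        # Each worker opens its own connection; pymongo clients are not
        # safe to share across fork boundaries.
        client = MongoClient("mongodb://localhost:27017")
        coll = client.mydb.records
        batch = []
        with open(path) as f:
            for line in f:
                batch.append({"raw": line.rstrip("\n")})
                if len(batch) >= BATCH:
                    # ordered=False lets the server apply inserts in parallel
                    coll.insert_many(batch, ordered=False)
                    batch = []
        if batch:
            coll.insert_many(batch, ordered=False)

    if __name__ == "__main__":
        files = ["part-%02d.txt" % i for i in range(8)]  # pre-split input
        with Pool(processes=8) as pool:
            pool.map(import_chunk, files)

One process per core on the client side, plus one write thread per shard on the server side, is what actually gets you past the single-threaded bottleneck the question is hitting.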

If you want multiple indexes on 2 billion rows, you'll want to look at sharding anyway. Some quick math: MongoDB's ObjectId is 12 bytes, so an index on _id alone will be 2B * 12 bytes = 22GB+. If you then add two more indexes (even on just two 4-byte integers), that's roughly 7.5GB for each.
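To make the arithmetic concrete, here is the same back-of-the-envelope calculation (key bytes only; a real B-tree index adds per-entry overhead, so treat these numbers as floors):

    ROWS = 2_000_000_000

    objectid_index = ROWS * 12   # _id: 12-byte ObjectId per entry
    int_index      = ROWS * 4    # one 4-byte integer per entry

    GB = 1024 ** 3
    print("_id index:      %.1f GB" % (objectid_index / GB))                   # ~22.4 GB
    print("each int index: %.1f GB" % (int_index / GB))                        # ~7.5 GB
    print("total (3 idx):  %.1f GB" % ((objectid_index + 2 * int_index) / GB)) # ~37.3 GB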

So at 2B rows, you're talking about 37GB+ in indexes (minimum). That's more RAM than most 8-core servers carry, which means you won't even be able to keep your indexes in memory, let alone any of the data.

So if you want serious performance here, you'll want to start looking at sharding, just based on these general numbers. FWIW, MySQL would be no more adept at handling 2B rows; with that much data, you really want multiple servers to keep up with the load.
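If you do go that route, the sharding setup itself is only a couple of admin commands. Below is a minimal sketch via pymongo against the same hypothetical mydb.records collection on a recent MongoDB; picking a good shard key for 2B rows deserves more thought than a one-liner:

    from pymongo import MongoClient

    # Connect to the mongos router (hypothetical address).
    client = MongoClient("mongodb://localhost:27017")

    # Enable sharding on the database, then shard the collection.
    client.admin.command("enableSharding", "mydb")

    # A hashed shard key spreads inserts evenly across shards,
    # which is what gives you one write thread per shard.
    client.admin.command("shardCollection", "mydb.records",
                         key={"_id": "hashed"})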

Gates VP
@Gates: I guess sharding is the only way to go this time. Thank you for your suggestions. I am indexing in both MySQL and MongoDB. Will use whichever completes first :) Also a good benchmarking exercise...
Legend