Howdy, Stack Overflow people!

So I've been doing some digging into these NoSQL databases (MongoDB, CouchDB, etc.). I'm still not sure how they handle real-time-ish stuff, though, so I thought I'd ask around and see if anyone has practical experience.

Let's think about web stuff. Say we've got a very dynamic, super-Ajaxified webapp that asks for various types of data every 5-20 seconds, and our backend is Python or PHP or anything other than Java, really. With lots of users, a MySQL or similar DB would obviously be under heavy pressure in a case like this. Would MongoDB / CouchDB run it without breaking a sweat, and without the need to build some super-complex cluster/caching solution?

That's basically my question. If you think the answer is no, then yes, I know there are other kinds of solutions for this (Node.js, WebSockets, antigravity, wormhole super-tech), but right now I'm only interested in these NoSQL things, and more specifically in whether they can handle this type of load.

Let's say we have 5,000 users at the same time, each firing an Ajax request every 5, 10 or 20 seconds to update various interfaces.

Shoot ;]

A: 

It depends heavily on the server running said NoSQL solution, the amount of data, etc. I have played around with Mongo a bit, and it is very easy to set up multiple servers to run simultaneously; you could most likely achieve high concurrency by starting multiple instances on the same box and having them act like a cluster. Luckily Mongo, at least, handles all the specifics, so servers can be killed and introduced without skipping a beat (depending on version). By default I believe the max number of connections is 1000, so starting 5 servers with that configuration would suffice (if your hardware can handle it, obviously), but realistically you would most likely never have all 5000 users hitting it at the exact same time.
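For what it's worth, here is a minimal sketch of what that looks like from PHP with the legacy Mongo driver; the set name "rs0", the ports, and the database name are assumptions for illustration, not from the thread:

    <?php
    // Hedged sketch: connect to two mongod instances running as a replica
    // set. "rs0" and the two local ports are made-up example values.
    $m = new Mongo(
        "mongodb://localhost:27017,localhost:27018",
        array("replicaSet" => "rs0")
    );
    $db = $m->selectDB("app");
    // The driver keeps track of the set's members, so a killed server can
    // be replaced without the application noticing - the "without skipping
    // a beat" behaviour described above.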

I hope, for your hardware's sake, that you at least come up with a solution that checks whether new data is available before doing a full-on fetch, either via timestamps or Memcache etc.
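For example, a rough sketch of the timestamp variant, assuming the client sends its last-seen Unix timestamp (the "updates" collection and "updated_at" field are made-up names):

    <?php
    // Hedged sketch: answer the poll cheaply unless something changed.
    $since = (int) $_GET['since'];  // client's last-seen Unix timestamp
    $fresh = $db->updates->find(array(
        'updated_at' => array('$gt' => new MongoDate($since)),
    ));
    if ($fresh->count() === 0) {
        header('HTTP/1.0 304 Not Modified');  // nothing new, no payload
        exit;
    }
    echo json_encode(iterator_to_array($fresh));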

Overall, I would tend to believe NoSQL will be much faster than a traditional database, assuming you are fetching data rather than running reports, and assuming your datastore design is intelligent enough to compensate for the lack of complex joins.
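To make that last point concrete, a tiny hedged example of the usual trick: embed what an RDBMS would join (all names here are illustrative):

    <?php
    // Hedged sketch: instead of a posts table joined to a comments table,
    // embed the comments inside the post document.
    $db->posts->insert(array(
        'title'    => 'Realtime-ish dashboards',
        'comments' => array(
            array('by' => 'someone', 'text' => 'First!'),
        ),
    ));
    // One round trip returns the post and its comments - no join needed.
    $post = $db->posts->findOne(array('title' => 'Realtime-ish dashboards'));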

methodin
Yeah, the design here is important. Luckily, the kind of thing I'm doing doesn't require any complex joins, which is why I think MongoDB could be perfect.
quiggle
It's definitely a sweet technology. The hardest thing about it is understanding how to structure your data - especially coming from a strong background in RDBMSs. Once that clicks, the rest is cake.
methodin
Yeah, that's what I'm trying to wrap my head around right now, hehe. I see there are a few different ORMs - well, "ODMs" - out there for MongoDB. Is any one better than the others?
quiggle
I tend not to use ORMs, especially not for Mongo, since it already works in JSON-style documents - the PHP driver hands results back as arrays, so you've got yourself a PHP array with no mapping step. Very nice. There do seem to be quite a few options, though. Doctrine would be an obvious choice, but I imagine there are some more lightweight projects out there as well. If you find a nice one, let us know!
methodin
You'll find that ORMs / ODMs are really a mixed bag. The problem with "ORM" is the very concept that you need to "map a relation" in a DB with no joins. That said, I think it's fair to wrap "objects" with a class for more consistent handling (and that's typically all you need). Personally, for PHP, I built my own: http://github.com/gatesvp/MonogoModel (still working on docs; msg me for a quick intro - the code is used in a big prod system).
Gates VP
Sweet, will definitely check it out. Thanks, Gates :)
quiggle
+2  A: 

Let's say we have 5,000 users at the same time, each firing an Ajax request every 5, 10 or 20 seconds to update various interfaces.

OK, so to get this right, you're talking about 250 to 1,000 requests per second (5,000 users / 20 s at the low end, 5,000 users / 5 s at the high end)? Yeah, MongoDB can handle that.

The real key to performance is going to be whether these are queries, updates, or inserts.

For queries, Mongo can probably handle this load. It's really going to be about the ratio of data size to memory size. If you have a server with 1GB of RAM and 150GB of data, then you're probably not going to get 250 queries/second (with any DB technology). But with reasonable hardware specs, Mongo can hit this speed on a single 64-bit server.
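As a hedged illustration of keeping that hot path fast: index the field you poll on so the working set stays in RAM (the "interfaces" collection and "user_id" field are assumptions):

    <?php
    // Hedged sketch: an index on the polled field keeps the frequent
    // queries memory-resident instead of scanning on disk.
    $db->interfaces->ensureIndex(array('user_id' => 1));
    $cursor = $db->interfaces->find(array('user_id' => $userId))->limit(20);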

If you have 5,000 active users and you're constantly updating existing records, then Mongo will be really fast (on par with updating memcached on a single machine). The reason is simply that Mongo will likely keep the record in memory, so a user sends an update every 5 seconds and the in-memory object is updated.
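A minimal sketch of that update pattern, assuming a per-user status document keyed by user id (the "status" collection and field names are made up):

    <?php
    // Hedged sketch: an in-place $set against a keyed document, roughly
    // the memcached-style "update the hot record" case described above.
    $db->status->update(
        array('user_id' => $userId),
        array('$set' => array(
            'state'      => $payload,
            'updated_at' => new MongoDate(),
        )),
        array('upsert' => true)  // create the document on first touch
    );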

If you are constantly inserting new records, then the limitation is really going to be one of throughput. When you're writing lots of new data, you're also forcing the index to expand. So if you're planning to pump in Gigs of new data, then you risk saturating the disk throughput and you'll need to shard.

So based on your question, it looks like you're mostly querying/updating. You'll be writing new records, but not 1,000 new records a second. If that's the case, then MongoDB is probably right for you, and it will definitely get around a lot of caching concerns.

Gates VP
Awesome. Yeah, you're right, I will mostly be updating/fetching existing data.
quiggle
Thanks for distinguishing between queries, updates and inserts
mikezter