
Hello,

I'm thinking about trying MongoDB for storing our stats, but I have some general questions about whether I'm understanding it correctly before I actually start learning it.

I understand the concept of using documents; what I'm not too clear about is how much data can be stored inside each document. The following diagram shows the layout I'm thinking of:

Website (document)
 - some keys/values about the particular document
 - statistics (tree)
   - millions of rows where each record is inserted from a pageview (key/value array containing data such as timestamp, ip, browser, etc)

What got me excited about MongoDB was its grouping functions, such as the ones described at http://www.mongodb.org/display/DOCS/Aggregation:

db.test.group(
  { cond: {"invoked_at.d": {$gte: "2009-11", $lt: "2009-12"}}
  , key: {http_action: true}
  , initial: {count: 0, total_time: 0}
  , reduce: function(doc, out){ out.count++; out.total_time += doc.response_time; }
  , finalize: function(out){ out.avg_time = out.total_time / out.count; }
  });

But my main concern is how hard a command like that would be on the server if there are, say, tens of millions of records across dozens of documents, on a 512 MB to 1 GB RAM server on Rackspace for example. Would it still keep the load low?

Is there any limit to the number of documents MongoDB can have (in separate databases)? Also, is there any limit to the number of records in a tree like the one I described above? And does the query I showed above run instantly, or is it some sort of map/reduce query? I'm not sure whether I could execute it on page load in our control panel to get those stats instantly.

Thanks!

+3  A: 

Every document has a size limit of 4MB (which in text is A LOT).

It's recommended to run MongoDB with replication, as you will otherwise have problems with single-server durability. Single-server durability is not given because MongoDB only fsyncs to the disk every 60 seconds by default, so if your server goes down between two fsyncs, the data that got inserted/updated in that time will be lost.

There is no limit on the number of documents in MongoDB other than your disk space.

You should try importing a dataset that matches your data (or generating some test data) into MongoDB and analyse how fast your query executes. Remember to set indexes on the fields that you use heavily in your queries. Your query above should work pretty well even with a lot of data.
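
For example, a minimal sketch for the query above in the mongo shell (using the db.test collection from your example):

db.test.ensureIndex({"invoked_at.d": 1}); // index the field used in the cond filter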

To analyze the speed of your query, use the database profiler that ships with MongoDB. In the mongo shell do:

db.setProfilingLevel(2); // to set the profiling level
[your query]
db.system.profile.find(); // to see the results

Remember to turn off profiling once you're finished (the profile log will get pretty huge otherwise):
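
db.setProfilingLevel(0); // disable the profiler again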

Regarding your database layout, I suggest changing the "schema" (yeah yeah, schemaless...) to:

website (collection)
 - some keys/values about the particular website

statistics (collection)
 - millions of rows, where each record is inserted from a pageview (key/value array containing data such as timestamp, ip, browser, etc.), plus a DBRef to the website document

See Database References
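
A minimal sketch of that layout in the mongo shell (the field names here are just assumptions for illustration):

var site = {domain: "example.com"};
db.website.insert(site); // the shell fills in site._id on insert

db.statistics.insert({
    website: new DBRef("website", site._id), // reference back to the website document
    ts: new Date(),
    ip: "127.0.0.1",
    browser: "Firefox"
});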

halfdan
this is great, thanks! If I use a collection for the statistics, is there still a 4MB limit? I'm sure it may be possible to use that group command on multiple collections, but for simplicity's sake I'd rather have all the raw records stored inside one "table".
Joe
The 4MB limit is per document; the collection itself can contain as many documents as your disk can hold. Your statistics will grow rapidly, and if they are stored inside a single document you'll probably reach the 4MB limit very soon.
halfdan
"the collection itself can contain as many documents as your disk can hold." With sharding, you can even go beyond that :-)
Thilo
thanks a lot! going to try this out.
Joe
"eventual consistency" doesn't mean that, mongo is not eventually consistent like cassandra or simpledb, it's strongly consistent like a rdbms. there is no transaction log in mongo, so it can loose data on a power failture if there is no replication, this is called loosing data as is. "eventual consistency" means, you may get the old value of a record after update from some nodes in some conditions for a short time period.
sirmak
@sirmak: +1. Good catch. What halfdan is talking about is called "single-server durability", which is a target for the next release.
Thilo
@sirmak: You're right! Thanks for the catch, gonna update the post.
halfdan
+2  A: 

Documents in MongoDB are limited to a size of 4MB. Let's say a single page view results in 32 bytes being stored. Then you'll be able to store about 130,000 page views in a single document.

Basically, the number of page views a page can generate is unbounded, and you indicated that you expect millions of them, so I suggest you store the log entries as separate documents. Each log entry should contain the _id of its parent document.
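
For instance, a minimal sketch of that parent-reference layout in the mongo shell (the collection and field names are illustrative assumptions):

var site = db.websites.findOne({domain: "example.com"});
db.pageviews.insert({
    website_id: site._id, // _id of the parent website document
    ts: new Date(),
    ip: "127.0.0.1",
    browser: "Firefox"
});
db.pageviews.find({website_id: site._id}); // all page views for that website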

The number of documents in a database is limited to 2GB of total space on 32-bit systems. 64-bit systems don't have this limitation.

The group() function is a map/reduce query under the hood. The documentation recommends using a map/reduce query instead of group(), because group() has some limitations with large datasets and in sharded environments.
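
For illustration, here's roughly what the group() query from the question could look like as an explicit map/reduce in the mongo shell (the collection and field names are taken from that example; without an output option, older servers write the results to a temporary collection):

var map = function() {
    // group by http_action, carrying the response time along
    emit(this.http_action, {count: 1, total_time: this.response_time});
};
var reduce = function(key, values) {
    var out = {count: 0, total_time: 0};
    values.forEach(function(v) {
        out.count += v.count;
        out.total_time += v.total_time;
    });
    return out;
};
var finalize = function(key, out) {
    out.avg_time = out.total_time / out.count;
    return out;
};

var res = db.test.mapReduce(map, reduce, {
    query: {"invoked_at.d": {$gte: "2009-11", $lt: "2009-12"}},
    finalize: finalize
});
db.getCollection(res.result).find(); // read the results back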

Niels van der Rest
+1, and map/reduce has some hard limitations
sirmak