Since we can structure a MongoDB document any way we want, we could do it this way:

{ products:
  [
    { date: "2010-09-08", data: { pageviews: 23, timeOnPage: 178 }},
    { date: "2010-09-09", data: { pageviews: 36, timeOnPage: 202 }}
  ],
  brands:
  [
    { date: "2010-09-08", data: { pageviews: 123, timeOnPage: 210 }},
    { date: "2010-09-09", data: { pageviews: 61, timeOnPage: 876 }}
  ]
}

As we add data day after day, the products array and the brands array will grow bigger and bigger. After 3 years, there will be about a thousand elements in products and in brands. Is that bad for MongoDB? Should we break it down further, into one document per type per day, like these 4 documents:

{ type: 'products', date: "2010-09-08", data: { pageviews: 23, timeOnPage: 178 }}
{ type: 'products', date: "2010-09-09", data: { pageviews: 36, timeOnPage: 202 }}
{ type: 'brands', date: "2010-09-08", data: { pageviews: 123, timeOnPage: 210 }}
{ type: 'brands', date: "2010-09-09", data: { pageviews: 61, timeOnPage: 876 }}

So that after 3 years, there will be just 2000 "documents"?
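
For reference, moving from the first layout to the second is mechanical. A minimal Ruby sketch, using plain hashes rather than a live MongoDB connection; the `flatten_stats` helper name is made up for illustration:

```ruby
# Nested layout from the first schema: one document, one array per type.
nested = {
  'products' => [
    { 'date' => '2010-09-08', 'data' => { 'pageviews' => 23, 'timeOnPage' => 178 } },
    { 'date' => '2010-09-09', 'data' => { 'pageviews' => 36, 'timeOnPage' => 202 } }
  ],
  'brands' => [
    { 'date' => '2010-09-08', 'data' => { 'pageviews' => 123, 'timeOnPage' => 210 } },
    { 'date' => '2010-09-09', 'data' => { 'pageviews' => 61, 'timeOnPage' => 876 } }
  ]
}

# Hypothetical helper: emit one flat document per (type, date) pair.
def flatten_stats(nested)
  nested.flat_map do |type, entries|
    entries.map { |entry| { 'type' => type }.merge(entry) }
  end
end

flat = flatten_stats(nested)
flat.each { |doc| puts doc.inspect }
```

Each resulting hash has the shape of the flat documents listed above, so the two layouts carry the same information; the choice is about how the data is updated and queried.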

+1  A: 

I'm not a MongoDB expert, but 1000 isn't "huge". Also I would seriously doubt any difference between 1 top-level document containing 4000 total subelements, and 4 top-level documents each containing 1000 subelements -- one of those six-of-one vs. half-dozen-of-another issues.

Now if you were talking about 1 document with 1,000,000 elements vs. 1000 documents each with 1000 elements, that's a different order of magnitude, and there might be advantages to one over the other in storage time, query time, or both.

Jason S
+1  A: 

Assuming you're using Mongoid (you tagged it), you wouldn't want to use your first schema idea. It would be very inefficient for Mongoid to pull out those huge documents each time you wanted to look up a single little value.

What would probably be a much better model for you is:

class Log
  include Mongoid::Document

  field :type
  field :date
  field :pageviews,    :type => Integer
  field :time_on_page, :type => Integer
end

This would give you documents that look like:

{_id: ..., date: '2010-09-08', type: 'products', pageviews: 23, time_on_page: 178}

Don't worry about the number of documents - Mongo can handle billions of these. And you can index on type and date to easily find whatever figures you want.

Furthermore, this way it's a lot easier to update the records through the driver, without even pulling the record from the database. For example, on each pageview you could do something like:

Log.collection.update({'type' => 'products', 'date' => '2010-09-08'}, {'$inc' => {'pageviews' => 1}})
PreciousBodilyFluids
A: 

You have talked about how you are going to update the data, but how do you plan to query it? That will probably affect how you should structure your docs.

The problem with embedding elements in arrays is that each time you add to the array, the document may no longer fit in the space currently allocated for it. When that happens, the document is reallocated and moved, and the move requires rewriting every index entry for the doc.

I would generally recommend the second form you proposed, but it depends on the answer to the question above.

Note: 4MB is an arbitrary limit and will be raised soon; you can recompile the server for any limit you want in fact.
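
To get a feel for how far the question's data is from that limit, here is a back-of-the-envelope estimate in Ruby. It assumes one entry per day for three years with the same field names as in the question, and it uses the JSON byte size as a rough stand-in for the BSON size, which will differ somewhat:

```ruby
require 'json'
require 'date'

# One embedded entry per day for roughly three years.
start = Date.new(2010, 9, 8)
entries = (0...(3 * 365)).map do |offset|
  { 'date' => (start + offset).to_s,
    'data' => { 'pageviews' => 23, 'timeOnPage' => 178 } }
end

# Same shape as the first schema: two arrays inside one document.
doc = { 'products' => entries, 'brands' => entries }

# JSON byte size is only a rough proxy for BSON size (assumption).
approx_bytes = JSON.generate(doc).bytesize
limit_bytes = 4 * 1024 * 1024

percent = (100.0 * approx_bytes / limit_bytes).round(1)
puts "approx. #{approx_bytes} bytes, #{percent}% of 4 MB"
```

With these assumptions the single document stays well under the limit, so the size cap is less of a concern here than the reallocation cost described above.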

Scott Hernandez
A: 

Hi,

It seems your design closely resembles the relational table schema.


So every document added will be a separate entry in a collection, with its own identifier. Though mongo document size is limited to 4 MB, it's mostly enough to accommodate plain text documents. And you don't have to worry about the growing number of documents in mongo; that's the essence of document-based databases.

The only thing you need to worry about is the total size of the data. It's limited to about 2GB on 32-bit systems, because MongoDB uses memory-mapped files and those are bound by the available address space. This is not a problem on 64-bit systems.

Hope this helps

Cheers

Ramesh Vel
A: 

Again, this depends on how you plan to query. If you really care about a single item, such as products per day:

{ type: 'products', date: "2010-09-08", data: { pageviews: 23, timeOnPage: 178 }}

then you could include multiple days in one document.

{ type: 'products', { date: "2010-09-08", data: { pageviews: 23, timeOnPage: 178 } } }

We use something like this:

{ type: 'products', "2010" : { "09" : { "08" : { data: { pageviews: 23, timeOnPage: 178 } } } } }

So we can increment by day: { "$inc" : { "2010.09.08.data.pageviews" : 1 } }

It may seem complicated, but the advantage is that you can store all the data about a 'type' in 1 record. So you can retrieve a single record and have all the information.
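
The dotted `$inc` key can be built from a date rather than typed by hand. A minimal Ruby sketch, assuming the year/month/day nesting shown above; `inc_path`, `inc_update` and `apply_inc` are hypothetical helper names, and `apply_inc` is only an in-memory stand-in that mimics `$inc` on a plain hash, not a driver call:

```ruby
require 'date'

# Build the dotted key for a given day, e.g. "2010.09.08.data.pageviews".
def inc_path(date, field)
  format('%04d.%02d.%02d.data.%s', date.year, date.month, date.day, field)
end

# Build the update document that would be passed to an update call.
def inc_update(date, field, amount = 1)
  { '$inc' => { inc_path(date, field) => amount } }
end

# In-memory stand-in for $inc: walk the dotted path, creating missing
# levels along the way, then add the amount to the leaf value.
def apply_inc(record, update)
  update['$inc'].each do |path, amount|
    *parents, leaf = path.split('.')
    node = parents.inject(record) { |hash, key| hash[key] ||= {} }
    node[leaf] = (node[leaf] || 0) + amount
  end
  record
end

record = { 'type' => 'products' }
day = Date.new(2010, 9, 8)
apply_inc(record, inc_update(day, 'pageviews'))
apply_inc(record, inc_update(day, 'pageviews'))
puts record.inspect
```

Zero-padding the month and day keeps the keys consistent, so the same day always maps to the same path.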

Amala