I'm building a simple accounting system where a user has many bills. Now I'm trying to decide whether bills should be their own collection or nested within the user. I'm leaning towards the former, but I've NEVER done any NoSQL work, so I'm just going by trial and error and what makes sense to me.

I understand that Mongo has a 4MB document size limit, which is what makes me think I should have a separate collection for bills: they will accumulate daily and could eventually take up a large amount of space.

I'm just looking for opinions on the matter. Basically, I'll be querying for a user's bills across different date periods (as you'd expect an accounting system to do).
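To make that query concrete, here is a minimal plain-Ruby sketch assuming the embedded layout. The field names `amount` and `issued_on` are hypothetical, and an in-memory hash stands in for the user document, so it shows only the filtering logic, not Mongoid or MongoDB calls. With bills embedded, a basic find on the users collection returns the whole user document, so picking out the matching bills typically happens in application code along these lines.

```ruby
require "date"

# Hypothetical embedded shape: one user document holding an array of bills.
# Field names (:amount, :issued_on) are illustrative assumptions.
user = {
  name: "brad",
  bills: [
    { amount: 120.00, issued_on: Date.new(2010, 9, 3) },
    { amount: 45.50,  issued_on: Date.new(2010, 9, 17) },
    { amount: 80.25,  issued_on: Date.new(2010, 10, 2) }
  ]
}

# Return the bills whose date falls inside [from, to], inclusive.
def bills_between(user, from, to)
  period = from..to
  user[:bills].select { |bill| period.cover?(bill[:issued_on]) }
end

september = bills_between(user, Date.new(2010, 9, 1), Date.new(2010, 9, 30))
puts september.map { |b| b[:amount] }.sum # prints 165.5
```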

Not that it really matters, but I'm using Mongoid in a Rails 3 project. I figured I'd do something like:

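class User
  include Mongoid::Document   # makes User a Mongoid-persisted model
  references_many :bills      # bills live in their own collection
end

class Bill
  include Mongoid::Document
  referenced_in :user         # each bill stores the owning user's id
end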

Any comments or design suggestions are greatly appreciated.

One question to consider: will there ever be a time when you'll need to reference the bills individually, apart from their membership in a user? If so, it'll be simpler if they have an independent existence.

Apart from that, the size limit issue you've already identified is a good reason to split them off.

There might be a transactional issue as well. If you're writing a large user document with many embedded bills, what happens when different connections write changes to the same user at nearly the same time? I don't know enough about Mongo to say how it would resolve this. My guess is that if the writes added different bills you'd get both, but if they made different changes to existing bills you'd get overwrites. Hopefully someone else will comment on this, but at the very least I'd test it. If you're writing the bills to a separate collection, this isn't a concern.

Steve B.
1) Regarding the 4MB document limit, this is what "MongoDB: The Definitive Guide" says:

Documents larger than 4MB (when converted to BSON) cannot be saved to the database. This is a somewhat arbitrary limit (and may be raised in the future); it is mostly to prevent bad schema design and ensure consistent performance. To see the BSON size (in bytes) of the document doc, run Object.bsonsize(doc) from the shell.

To give you an idea of how much 4MB is, the entire text of War and Peace is just 3.14MB.

In the end, it depends on how large you expect a user's bills to grow. I hope the excerpt above gives you an idea of the limits imposed by the document size.
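A rough back-of-the-envelope estimate can make "how large" concrete. The sketch below assumes a hypothetical average of 200 bytes of BSON per bill and about 1KB for the user's own fields; neither figure comes from the question, so measure real documents (for example with Object.bsonsize(doc) in the shell, as the excerpt suggests) before trusting the result.

```ruby
# Rough capacity estimate for embedding bills in a single user document.
LIMIT_BYTES    = 4 * 1024 * 1024 # the 4MB document limit quoted above
BYTES_PER_BILL = 200             # assumption, not a measured figure
OVERHEAD_BYTES = 1_024           # assumed space for the user's own fields

# Integer division: only whole bills count.
bills_that_fit = (LIMIT_BYTES - OVERHEAD_BYTES) / BYTES_PER_BILL
years_at_one_bill_per_day = bills_that_fit / 365.0

puts bills_that_fit                    # prints 20966
puts years_at_one_bill_per_day.round(1) # prints 57.4
```

Under those assumptions a user adding one bill a day would take decades to hit the limit; several bills a day, or much larger bills, shrink that quickly.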

2) A denormalized schema (bills embedded in the user document) is the way to go if you know you are never going to run global queries on bills (for example, retrieving the ten most recent bills entered into the system). With a denormalized schema, you will have to use map-reduce to answer such queries.

A normalized schema (user and bills in separate documents) is a better choice if you want flexibility in how the bills are queried. However, since MongoDB doesn't support joins, you will have to run multiple queries (one for the user, another for their bills) every time you want to retrieve the bills belonging to a user.
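The trade-off between the two schemas can be sketched with in-memory arrays standing in for collections. This is a plain-Ruby illustration under hypothetical field names (`user_id`, `amount`, `issued_on`), not Mongoid or MongoDB code: it shows that a global "most recent bills" query must visit every user document in the embedded layout, while the normalized layout answers it from one collection but needs two lookups to assemble a user with their bills.

```ruby
require "date"

# Denormalized: bills embedded in each user document.
users_embedded = [
  { _id: 1, name: "alice",
    bills: [{ amount: 10, issued_on: Date.new(2010, 9, 1) },
            { amount: 30, issued_on: Date.new(2010, 9, 20) }] },
  { _id: 2, name: "bob",
    bills: [{ amount: 20, issued_on: Date.new(2010, 9, 10) }] }
]

# Normalized: users and bills in separate collections, linked by user_id.
users = [{ _id: 1, name: "alice" }, { _id: 2, name: "bob" }]
bills = [
  { _id: 101, user_id: 1, amount: 10, issued_on: Date.new(2010, 9, 1) },
  { _id: 102, user_id: 1, amount: 30, issued_on: Date.new(2010, 9, 20) },
  { _id: 103, user_id: 2, amount: 20, issued_on: Date.new(2010, 9, 10) }
]

# Global query, embedded layout: flatten every user's bills, then pick the newest n.
def recent_embedded(users, n)
  users.flat_map { |u| u[:bills] }.max_by(n) { |b| b[:issued_on] }
end

# Global query, normalized layout: one pass over a single collection.
def recent_normalized(bills, n)
  bills.max_by(n) { |b| b[:issued_on] }
end

# Per-user read, normalized layout: look up the user, then their bills by user_id.
def user_with_bills(users, bills, name)
  user = users.find { |u| u[:name] == name }
  [user, bills.select { |b| b[:user_id] == user[:_id] }]
end

p recent_embedded(users_embedded, 2).map { |b| b[:amount] } # [30, 20]
p recent_normalized(bills, 2).map { |b| b[:_id] }           # [102, 103]
```

In the embedded layout the per-user read is a single document fetch, which is why that layout suits the per-user date queries described in the question.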

Given the use case you mentioned, I'd go with the denormalized schema.

3) Updates to a single document in MongoDB are atomic and serialized. That should answer Steve's concern.
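Per-document atomicity covers each individual update; whether two concurrent edits both survive still depends on how the update is expressed. The sketch below simulates this in plain Ruby: a hypothetical `FakeUserStore` class guards one document with a Mutex to mimic atomic single-document updates. It is not MongoDB or Mongoid behavior verbatim, only an illustration of why an in-place append (analogous to a `$push` update) keeps both bills while loading and re-saving the whole document can lose one.

```ruby
# In-memory stand-in for one user document; every operation runs under a lock,
# mimicking atomic updates to a single document.
class FakeUserStore
  def initialize
    @lock = Mutex.new
    @doc = { name: "brad", bills: [] }
  end

  # Hand the client a deep copy, as if it had loaded the document.
  def read
    @lock.synchronize { Marshal.load(Marshal.dump(@doc)) }
  end

  # Whole-document replacement, like saving a document loaded earlier.
  def replace(new_doc)
    @lock.synchronize { @doc = new_doc }
  end

  # Append applied on the "server", analogous to a $push update.
  def push_bill(bill)
    @lock.synchronize { @doc[:bills] << bill }
  end

  def bill_count
    @lock.synchronize { @doc[:bills].size }
  end
end

# Two clients load the document, each add a bill locally, then write it back.
# Each replace is atomic, yet the second silently overwrites the first.
store = FakeUserStore.new
copy_a = store.read
copy_b = store.read
copy_a[:bills] << { amount: 10 }
copy_b[:bills] << { amount: 20 }
store.replace(copy_a)
store.replace(copy_b)
lost_update_count = store.bill_count # 1: client A's bill is gone

# The same two additions expressed as atomic appends both survive.
store2 = FakeUserStore.new
threads = [10, 20].map { |amt| Thread.new { store2.push_bill({ amount: amt }) } }
threads.each(&:join)
push_count = store2.bill_count # 2

puts lost_update_count, push_count
```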

You may find these slides helpful: http://www.slideshare.net/kbanker/mongodb-meetup

You may also want to look at MongoDB's Production Deployments page; the SF.net slides there may be helpful.

srivani
Ah, it's only on writing... so does this affect atomic ops on the embedded docs? For instance, if I'm just doing a $push to the bills on my user doc, does it matter if my user and all of its bills amount to 4MB, or only if the bill itself happens to be 4MB on write? I have a feeling it's the latter, and thus I'm safe (there's no way a single bill could contain 4MB of data, or that I'd be writing enough bills in one go to reach that amount). Does that sound right? Assuming so, I think I'll take your suggestion and go denormalized.
brad
Hmm... actually, I think I was wrong. I'm pretty sure the 4MB limit would affect the user document if its bills exceeded that amount. However, the amount of data in a bill is fairly small, so I'm going to give embedded bills a shot and do some testing later to see what kind of bill capacity I can handle.
brad
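As that last comment concludes, the limit applies to the whole user document after each $push, not just to the bill being pushed. A minimal sketch of an application-side guard, assuming a hypothetical `can_embed?` helper: it checks whether the user document plus one more bill would stay under the limit. JSON byte length is only a rough stand-in for the real BSON size, so treat the check as approximate.

```ruby
require "json"

LIMIT = 4 * 1024 * 1024 # the 4MB limit discussed in this thread

# Hypothetical guard: would the user document, with one more embedded bill,
# still fit? JSON bytesize approximates BSON size; it is not exact.
def can_embed?(user_doc, bill, limit = LIMIT)
  candidate = user_doc.merge({ bills: user_doc[:bills] + [bill] }) # original left untouched
  candidate.to_json.bytesize <= limit
end

user = { name: "brad", bills: [] }
bill = { amount: 12.5, issued_on: "2010-09-03" }
puts can_embed?(user, bill)     # true
puts can_embed?(user, bill, 10) # false: tiny limit for demonstration
```

A guard like this could decide when to fall back to a separate bills collection, though testing with real data, as planned in the comment, is the more reliable check.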
