Hi everyone,

I have recently been exploring NoSQL databases. I need advice on how to store data in the most efficient way for a given problem. I'm targeting MongoDB for now, but it should be the same with CouchDB.

Let's say we have these 3 Models:

Story:
 id
 title

User:
 id
 name

Vote:
  id
  story_id
  user_id

I want to be able to ask the database these questions:

  • Who has voted for this Story?
  • What has this User voted for?

With a relational DB I would use simple joins. The question is how I should store the data for these objects to be most efficient.

For example, if I store the Vote objects as a subcollection of Stories, it won't be easy to answer "What has a user voted for?".

I'd be glad if you could give me some advice.

Thank you

+1  A: 

OK, you have given a normalized data model, as you would in an SQL setup.

In my understanding you don't do this in MongoDB. You could store references, but in the general case you don't, for performance reasons.

I'm by no means an expert in the NoSQL area, but why don't you simply follow your needs and store the ids of the users that have voted for a story in the stories collection, and the ids of the stories a user has voted for in the users collection?
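
To make the idea concrete, here is a rough sketch using plain in-memory arrays as stand-ins for the two collections (it does not talk to MongoDB). The field names voter_ids and voted_story_ids and the helper functions are my own assumptions for illustration. Note the trade-off it shows: each question becomes a single document read, but every vote has to be written in two places.

```javascript
// In-memory stand-ins for the two collections. The field names voter_ids and
// voted_story_ids are assumptions for illustration, not from the question.
const stories = [
  { _id: "s1", title: "First story", voter_ids: ["u1", "u2"] },
  { _id: "s2", title: "Second story", voter_ids: ["u2"] },
];
const users = [
  { _id: "u1", name: "Ann", voted_story_ids: ["s1"] },
  { _id: "u2", name: "Bob", voted_story_ids: ["s1", "s2"] },
];

// "Who has voted for this Story?" is answered from the story document alone.
function votersOf(storyId) {
  const story = stories.find((s) => s._id === storyId);
  return story ? story.voter_ids : [];
}

// "What has this User voted for?" is answered from the user document alone.
function storiesVotedBy(userId) {
  const user = users.find((u) => u._id === userId);
  return user ? user.voted_story_ids : [];
}

// Recording a vote must update both documents to keep the two lists in sync.
function recordVote(userId, storyId) {
  const user = users.find((u) => u._id === userId);
  const story = stories.find((s) => s._id === storyId);
  if (!user || !story) throw new Error("unknown user or story");
  if (!story.voter_ids.includes(userId)) story.voter_ids.push(userId);
  if (!user.voted_story_ids.includes(storyId)) user.voted_story_ids.push(storyId);
}

console.log(votersOf("s1"));
console.log(storiesVotedBy("u2"));
```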

mkluwe
+2  A: 

I would suggest storing votes as a list of story _ids in each user. That way you can find out what stories a user has voted for just by looking at the list. To get the users who have voted for a story you can do something like:

db.users.find({stories: story_id})

where story_id is the _id of the story in question. If you create an index on the stories field both of those queries will be fast.
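
As a rough illustration of why that query works, the sketch below mimics the matching rule in plain JavaScript: an equality filter on a field that holds an array matches when the array contains the value. It runs against an in-memory array, not a real server, and the sample data and helper names are hypothetical.

```javascript
// Mimics MongoDB's equality match: a filter like {stories: id} matches when
// the field equals id, or when the field is an array that contains id.
function matchesEquality(doc, field, value) {
  const actual = doc[field];
  return Array.isArray(actual) ? actual.includes(value) : actual === value;
}

// Hypothetical sample data: each user carries the _ids of the stories voted for.
const userDocs = [
  { _id: "u1", name: "Ann", stories: ["s1"] },
  { _id: "u2", name: "Bob", stories: ["s1", "s2"] },
  { _id: "u3", name: "Cid", stories: [] },
];

// In-memory equivalent of db.users.find({stories: storyId}).
function findUsersWhoVotedFor(storyId) {
  return userDocs.filter((u) => matchesEquality(u, "stories", storyId));
}

console.log(findUsersWhoVotedFor("s2").map((u) => u.name));
```

On a real server the filter above is a linear scan only if there is no index; an index on an array field is built per element (a multikey index), which is what makes the lookup by story fast.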

mdirolf
Well, in fact I want to store more info in the Vote model, for example: created_at, ip, user_agent. Should I store that data in the stories list of the users collection?
Stanislav
You could store the votes as an array of sub-documents, each like `{story_id: ..., created_at: ..., ip: ...}`, etc. Then the query becomes `find({'stories.story_id': ...})`. You can index on that, too.
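
A small sketch of that shape, again against an in-memory array rather than a real database. The sample values and the helper names are made up for illustration; only the sub-document fields story_id, created_at, and ip come from the discussion above.

```javascript
// Hypothetical user documents: each vote is a sub-document carrying metadata.
const userDocs = [
  {
    _id: "u1",
    name: "Ann",
    stories: [{ story_id: "s1", created_at: "2010-01-05T10:00:00Z", ip: "192.0.2.1" }],
  },
  {
    _id: "u2",
    name: "Bob",
    stories: [
      { story_id: "s1", created_at: "2010-01-06T11:30:00Z", ip: "192.0.2.2" },
      { story_id: "s2", created_at: "2010-01-07T09:15:00Z", ip: "192.0.2.2" },
    ],
  },
];

// In-memory equivalent of find({'stories.story_id': storyId}):
// a user matches if any of its vote sub-documents has that story_id.
function findUsersByVotedStory(storyId) {
  return userDocs.filter((u) => u.stories.some((v) => v.story_id === storyId));
}

// Collects the vote metadata for one story, tagged with the voting user's _id.
function votesFor(storyId) {
  return userDocs.flatMap((u) =>
    u.stories
      .filter((v) => v.story_id === storyId)
      .map((v) => ({ user_id: u._id, ...v }))
  );
}

console.log(findUsersByVotedStory("s2").map((u) => u.name));
console.log(votesFor("s1"));
```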
mdirolf
Well, I have a fairly big database with a few million records and will test the above scenario.
Stanislav
+1  A: 

In CouchDB this is very simple. One view emits:

function(doc) {
 if(doc.type == "vote") {
   emit(doc.story_id, doc.user_id);
 }
}

Another view emits:

function(doc) {
 if(doc.type == "vote") {
   emit(doc.user_id, doc.story_id);
 }
}

Both queries are extremely fast since there is no join. If you do need user data or story data, CouchDB supports multi-document fetch. That is also quite fast and is one way to do a "join".
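
To see what the two views produce, here is a toy in-memory simulation, not CouchDB itself. The buildView and queryByKey helpers and the sample documents are hypothetical; real CouchDB provides emit() as a global and keeps the rows in a sorted on-disk index, whereas this sketch passes emit as a second argument and sorts an array.

```javascript
// Runs a CouchDB-style map function over every document and returns the
// emitted rows sorted by key. emit is passed in as an argument here so the
// sketch is self-contained; in CouchDB it is a global.
function buildView(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Returns the values of all rows whose key equals the requested key.
function queryByKey(view, key) {
  return view.filter((row) => row.key === key).map((row) => row.value);
}

// Hypothetical documents: votes are stored as separate documents with a type.
const docs = [
  { _id: "v1", type: "vote", story_id: "s1", user_id: "u1" },
  { _id: "v2", type: "vote", story_id: "s1", user_id: "u2" },
  { _id: "v3", type: "vote", story_id: "s2", user_id: "u2" },
  { _id: "u1", type: "user", name: "Ann" },
];

// View 1: keyed by story, answers "who has voted for this story?"
const votersByStory = buildView(docs, (doc, emit) => {
  if (doc.type == "vote") emit(doc.story_id, doc.user_id);
});

// View 2: keyed by user, answers "what has this user voted for?"
const storiesByUser = buildView(docs, (doc, emit) => {
  if (doc.type == "vote") emit(doc.user_id, doc.story_id);
});

console.log(queryByKey(votersByStory, "s1"));
console.log(queryByKey(storiesByUser, "u2"));
```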

dnolen
I'll need two queries in this scenario, won't I? One to query an index for Vote documents and one to get the User/Story documents.
Stanislav
@Stanislav. That is correct. You'll first need to fetch the votes and then fetch users and/or stories for those votes.
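
The two steps look roughly like this. It is a sketch against in-memory data, with hypothetical names throughout: voteRows stands in for the rows a view keyed by story_id would return, and fetchMany stands in for a multi-document fetch (in CouchDB, for example, posting a list of keys to _all_docs).

```javascript
// Hypothetical document store, keyed by _id.
const docsById = new Map([
  ["u1", { _id: "u1", type: "user", name: "Ann" }],
  ["u2", { _id: "u2", type: "user", name: "Bob" }],
]);

// Stand-in for the rows of a view that emits (story_id, user_id) per vote.
const voteRows = [
  { key: "s1", value: "u1" },
  { key: "s1", value: "u2" },
  { key: "s2", value: "u2" },
];

// Stand-in for a multi-document fetch: one request for many ids.
// Ids with no matching document are skipped.
function fetchMany(ids) {
  return ids.map((id) => docsById.get(id)).filter(Boolean);
}

function votersWithDetails(storyId) {
  // Step 1: query the view for the votes on this story.
  const userIds = voteRows.filter((r) => r.key === storyId).map((r) => r.value);
  // Step 2: fetch all referenced user documents in one go.
  return fetchMany(userIds);
}

console.log(votersWithDetails("s1").map((u) => u.name));
```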
dnolen
+1  A: 
  • don't worry about whether your queries are efficient until it starts to matter
  • according to the quote below, you're doing it wrong

The way I have been going about the mind switch is to forget about the database altogether. In the relational db world you always have to worry about data normalization and your table structure. Ditch it all. Just lay out your web page. Lay them all out. Now look at them. You're already 2/3 there. If you forget the notion that database size matters and data shouldn't be duplicated, then you're 3/4 there and you didn't even have to write any code! Let your views dictate your Models. You don't have to take your objects and make them 2 dimensional anymore as in the relational world. You can store objects with shape now.

how-to-think-in-data-stores-instead-of-databases

Dustin Getz