I'm trying to figure out the proper way to think about this problem in a document-based data storage system. I have the simple case of a two-tier category system made up of Industries and Industry Groups (think Plumbing as an Industry within the Home Services group).

My first thought was that the document would be the Industry Group, with its Industries embedded inside it. The issue is that most of the related data will point to an Industry, and I'm not sure if it's 'kosher' to have data referencing sub-items within a document. For instance, an article might be assigned to an industry rather than to a group, so what does that reference look like (assuming the link comes from a non-nested document)?

Anyway, some general insight into the right way of thinking about this would be great.

A: 

The best way to design any non-relational database is to base it on the queries you need to run against the data, not on the data itself.

You can design a non-relational database any way you want, because there are no rules of normalization as there are with relational databases.


Re your comment:

You need to enumerate all the ways the data will be queried. Then decide what document structure would make the query most convenient in each case.

From this, some patterns may emerge. Then use your judgment to pick the document structure that satisfies the greatest cross-section of queries.
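As a minimal sketch of that exercise, here is one candidate structure checked against the queries listed in the comment on this answer. It assumes hypothetical collection and field names (`industry_groups`, `articles`, `industry_id`) and uses plain Python lists of dicts in place of real MongoDB collections. Industries are embedded in their group, but each one carries its own stable id so a non-nested document such as an article can reference it directly.

```python
from collections import Counter

# Hypothetical candidate structure: each Industry Group document embeds its
# Industries, and every embedded Industry has its own id so other documents
# can point at it directly.
industry_groups = [
    {"_id": "g1", "name": "Home Services",
     "industries": [{"id": "i1", "name": "Plumbing"},
                    {"id": "i2", "name": "Electrical"}]},
    {"_id": "g2", "name": "Food",
     "industries": [{"id": "i3", "name": "Bakeries"}]},
]

# Articles live in their own (non-nested) collection and reference an industry id.
articles = [
    {"_id": "a1", "title": "Fixing leaks", "industry_id": "i1"},
    {"_id": "a2", "title": "Pipe sizing", "industry_id": "i1"},
    {"_id": "a3", "title": "Sourdough", "industry_id": "i3"},
]

def industries_for_group(group_id):
    """Drop-down query: list the industries once a group is selected."""
    for group in industry_groups:
        if group["_id"] == group_id:
            return [ind["name"] for ind in group["industries"]]
    return []

def parent_group_of(industry_id):
    """Industry -> parent query. Here it is a scan over the embedded arrays;
    in MongoDB a dot-notation filter such as {"industries.id": industry_id}
    would do this matching on the server."""
    for group in industry_groups:
        if any(ind["id"] == industry_id for ind in group["industries"]):
            return group["name"]
    return None

def article_counts_by_industry():
    """Counts of related objects (articles) per industry id."""
    return Counter(a["industry_id"] for a in articles)
```

With this data, `industries_for_group("g1")` gives `["Plumbing", "Electrical"]`, `parent_group_of("i3")` gives `"Food"`, and `article_counts_by_industry()["i1"]` is `2`. Writing each query out like this makes it easy to see which ones the structure serves well (the drop-down) and which ones it makes awkward (the parent lookup needs to search inside arrays).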

Also keep in mind that in MongoDB, storing data redundantly is fine, and in fact recommended, because it's unlikely that one single document structure will be right for all your queries. You might find that one document structure works well for most of your queries but makes the last query impossible. That's when you should create a secondary, redundant collection to serve that last query, since the primary structure already takes care of all the others.
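A sketch of the redundant-collection idea, again with hypothetical names and plain dicts standing in for collections: alongside the groups with embedded industries, keep a flat `industries` collection in which each industry entry repeats its parent group's id and name. The industry-to-parent lookup then becomes a single key lookup. The price is that a change to a group (such as a rename) now has to be written in two places, which `rename_group` (a hypothetical helper) does explicitly.

```python
# Primary collection: groups with embedded industries (hypothetical data).
industry_groups = {
    "g1": {"name": "Home Services",
           "industries": [{"id": "i1", "name": "Plumbing"},
                          {"id": "i2", "name": "Electrical"}]},
    "g2": {"name": "Food",
           "industries": [{"id": "i3", "name": "Bakeries"}]},
}

def build_industries_collection(groups):
    """Derive a secondary, redundant collection keyed by industry id.
    Each entry duplicates the parent group's id and name."""
    flat = {}
    for group_id, group in groups.items():
        for ind in group["industries"]:
            flat[ind["id"]] = {
                "name": ind["name"],
                "group_id": group_id,
                "group_name": group["name"],  # redundant copy of group data
            }
    return flat

industries = build_industries_collection(industry_groups)

def rename_group(group_id, new_name):
    """Redundancy has a write cost: both copies must be kept in sync."""
    industry_groups[group_id]["name"] = new_name
    for ind in industries.values():
        if ind["group_id"] == group_id:
            ind["group_name"] = new_name

# Industry -> parent is now a direct lookup instead of a scan.
print(industries["i1"]["group_name"])  # Home Services
```

Reads that start from an Industry go to the flat collection, while the drop-down query can still read the embedded groups; which copy is treated as the source of truth is a design choice this sketch leaves open.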

There are no rules for structuring non-relational databases, and that makes them harder to design than relational databases. Sorry, but NoSQL is a case of TANSTAAFL (There Ain't No Such Thing As A Free Lunch)!

Bill Karwin
I'm not sure how to translate that into something useful for me here. This data is going to be queried in lots of different ways: for a pair of drop-downs when one is selected, from an Industry to its parent when it's already attached to something, across related objects for counts, etc. So that leaves me with the same question.
Jim