views:

302

answers:

3

Hi!

I'm willing to give MongoDB and CouchDB a serious try. So far I've worked a bit with Mongo, but I'm also intrigued by Couch's RESTful approach.

Having worked for years with relational DBs, I still don't get what is the best way to get some things done with non relational databases.

For example, if I have 1000 car shops and 1000 car types, I want to specify what kind of cars each shop sells. Each car has 100 features. Within a relational database i'd make a middle table to link each car shop with the car types it sells via IDs. What is the approach of No-sql? If every car shop sells 50 car types, it means replicating a huge amount of data, if I have to store within the car shop all the features of all the car types it sells!

Any help appreciated.

A: 

Coming from a HBase/BigTable point of view, typically you would completely denormalize your data, and use a "list" field, or multidimensional map column (see this link for a better description).

The word "column" is another loaded word like "table" and "base" which carries the emotional baggage of years of RDBMS experience.

Instead, I find it easier to think about this like a multidimensional map - a map of maps if you will.

For your example for a many-to-many relationship, you can still create two tables, and use your multidimenstional map column to hold the relationship between the tables.

See the FAQ question 20 in the Hadoop/HBase FAQ:

Q:[Michael Dagaev] How would you design an Hbase table for many-to-many association between two entities, for example Student and Course?

I would define two tables: Student: student id student data (name, address, ...) courses (use course ids as column qualifiers here) Course: course id course data (name, syllabus, ...) students (use student ids as column qualifiers here) Does it make sense?

A[Jonathan Gray] : Your design does make sense. As you said, you'd probably have two column-families in each of the Student and Course tables. One for the data, another with a column per student or course. For example, a student row might look like: Student : id/row/key = 1001 data:name = Student Name data:address = 123 ABC St courses:2001 = (If you need more information about this association, for example, if they are on the waiting list) courses:2002 = ... This schema gives you fast access to the queries, show all classes for a student (student table, courses family), or all students for a class (courses table, students family).

GalacticJello
I think we are talking here about a many-to-many-to-many relationship not a many-to-many relationship. Each car type has lots of features and each shop can sell many car types.
Theo
+1  A: 

In MongoDB an often used approach would be store a list of _ids of car types in each car shop. So no separate join table but still basically doing a client-side join.

Embedded documents become more relevant for cases that aren't many-to-many like this.

mdirolf
+3  A: 

I can only speak to CouchDB.

The best way to stick your data in the db is to not normalize it at all beyond converting it to JSON. If that data is "cars" then stick all the data about every car in the database.

You then use map/reduce to create a normalized index of the data. So, if you want an index of every car, sorted first by shop, then by car-type you would emit each car with an index of [shop, car-type].

Map reduce seems a little scary at first, but you don't need to understand all the complicated stuff or even btrees, all you need to understand is how the key sorting works.

http://wiki.apache.org/couchdb/View_collation

With that alone you can create amazing normalized indexes over differing documents with the map reduce system in CouchDB.

mikeal