ansaurus

Question

Answer 1

+2 A:

MongoDB will let you store this data efficiently in Type 1. Depending on your use it will look like one these (data is in JSON):

Array of Strings

{ "_id" : 1, "strings" : ["a", "b", "c", "d", "e"] }

Set of KV Strings

{ "_id" : 1, "s1" : "a", "s2" : "b", "s3" : "c", "s4" : "d", "s5" : "e" }

Based on your queries, I would probably use the Array of Strings method. Here's why:

I might need to know what all strings a given ID contains and then intersect the list with another list obtained for a different ID.

This is easy, you get one Key Value look-up for the ID. In code, it would look something like this:

db.my_collection.find({ "_id" : 1});

I might need to know what all IDs contain a given string

Similarly easy:

db.my_collection.find({ "strings" : "my_string" })

Yeah it's that easy. I know that "strings" is technically an array, but MongoDB will recognize the item as an array and will loop through to find the value. Docs for this are here.

As a bonus, you can index the "strings" field and you will get an index on the array. So the find above will actually perform relatively fast (with the obvious trade-off that the index will be very large).

In terms of scaling a 14-node cluster may almost be overkill. However, Mongo does support auto-sharding and replication sets. They even work together, here's a blog post from a 10gen member to get you started (10gen makes Mongo).

Gates VP 2010-10-02 18:15:46

@Gates: Thanks for the detailed explanation.

Legend 2010-10-03 04:47:38

ansaurus

tags:

views:

answers:

MongoDB or CouchDB or something else?

related questions