
Where I work, we use Ruby on Rails to create both backend and frontend applications. Usually, these applications interact with the same MySQL database. It works well for most of our data, but there is one case that I would like to move to a NoSQL environment.

We have clients, and each client has one or more of what we call "inventories". An inventory can contain many thousands of items. This is currently modeled with two relational database tables, inventories and inventory_items.

The problems start when two different inventories have different parameters:

# Inventory item from inventory 1, televisions
{
  inventory_id: 1,
  sku: 12345,
  name: "Samsung LCD 40 inches",
  model: "582903-4",
  brand: "Samsung",
  screen_size: 40,
  type: "LCD",
  price: 999.95
}

# Inventory item from inventory 2, accommodation
{
  inventory_id: 2,
  sku: "48cab23fa",
  name: "New York Hilton",
  accomodation_type: "hotel",
  star_rating: 5,
  price_per_night: 395
}

Since we obviously can't add brand, star_rating and every other inventory-specific attribute as columns in inventory_items, our solution so far has been to use generic column names such as text_a, text_b, float_a, int_a, etc., and to introduce a third table, inventory_schemas. The tables now look like this:

# Inventory schema for inventory 1, televisions
{
  inventory_id: 1,
  int_a: "sku",
  text_a: "name",
  text_b: "model",
  text_c: "brand",
  int_b: "screen_size",
  text_d: "type",
  float_a: "price"
}

# Inventory item from inventory 1, televisions
{
  inventory_id: 1,
  int_a: 12345,
  text_a: "Samsung LCD 40 inches",
  text_b: "582903-4",
  text_c: "Samsung",
  int_b: 40,
  text_d: "LCD",
  float_a: 999.95
}
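To make the indirection concrete, here is a minimal JavaScript sketch of the lookup every read has to perform under this scheme: translate the generic columns back into the inventory's own attribute names using its schema row. The plain objects stand in for the two MySQL rows above, and decodeItem is a hypothetical helper, not part of Rails or MySQL.

```javascript
// Plain objects standing in for the inventory_schemas and inventory_items rows above.
const televisionSchema = {
  inventory_id: 1,
  int_a: "sku",
  text_a: "name",
  text_b: "model",
  text_c: "brand",
  int_b: "screen_size",
  text_d: "type",
  float_a: "price",
};

const televisionRow = {
  inventory_id: 1,
  int_a: 12345,
  text_a: "Samsung LCD 40 inches",
  text_b: "582903-4",
  text_c: "Samsung",
  int_b: 40,
  text_d: "LCD",
  float_a: 999.95,
};

// Hypothetical helper: rename each generic column (int_a, text_b, ...) to the
// attribute name that the inventory's schema assigns to it.
function decodeItem(schema, row) {
  const item = { inventory_id: row.inventory_id };
  for (const [column, attribute] of Object.entries(schema)) {
    if (column !== "inventory_id") {
      item[attribute] = row[column];
    }
  }
  return item;
}

console.log(decodeItem(televisionSchema, televisionRow));
```

A document store would skip this step, because each stored item already uses names such as sku, brand and screen_size.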

This has worked well... up to a point. It's clunky, it's unintuitive, and it doesn't scale: we have to devote resources to setting up a schema for every inventory. Using a separate table for each inventory is not an option.

Enter NoSQL. With it, we could let each item have its own parameters and still store all items together. From the research I've done, it certainly seems like a great alternative for this situation.

Specifically, I've looked at CouchDB and MongoDB. Both look great. However, there are a few other bits and pieces we need to be able to do with our inventory:

  • We need to be able to select items from only one (or several) inventories.
  • We need to be able to filter items based on their parameters (e.g., get all items from inventory 2 where type is 'hotel').
  • We need to be able to group items based on parameters (e.g., get the lowest price from items in inventory 1 where brand is 'Samsung').
  • We need to (potentially) be able to retrieve thousands of items at a time.
  • We need to be able to access the data from multiple applications: both backend (to process data) and frontend (to display data).
  • Rapid bulk insertion is desired, though not required.

Given this structure and these requirements, is either CouchDB or MongoDB suitable for us? If so, which one would be the better fit?

Thanks for reading, and thanks in advance for answers.

EDIT: One of the reasons I like CouchDB is that our frontend application could request data via JavaScript directly from the CouchDB server after page load and display the results without any backend code whatsoever. This would give faster page loads and less server strain, as the fetching and processing of the data would be done client-side.
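A minimal sketch of how that could look in CouchDB, assuming a database named inventory and a hypothetical design document _design/items with a view called price_by_brand (none of these names come from the question). The map function keys each priced item by [inventory_id, brand], and the built-in _stats reduce would also cover the "lowest price per brand" requirement. The emit() stub exists only so the map function can be run outside CouchDB.

```javascript
// Rows collected by the local emit() stub below. Inside CouchDB, emit() is
// supplied by the view server instead.
const emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}

// Map function: one row per priced item, keyed by [inventory_id, brand].
const priceByBrandMap = function (doc) {
  if (doc.brand !== undefined && typeof doc.price === "number") {
    emit([doc.inventory_id, doc.brand], doc.price);
  }
};

// Hypothetical design document; "_stats" is a built-in CouchDB reduce that
// returns sum, count, min, max and sumsqr for each group.
const designDoc = {
  _id: "_design/items",
  views: {
    price_by_brand: { map: priceByBrandMap.toString(), reduce: "_stats" },
  },
};

// Build the view URL a page script could request after load.
// group_level=2 groups by [inventory_id, brand]; the key range limits the
// result to one inventory ({} sorts after every string in CouchDB collation).
function priceByBrandUrl(database, inventoryId) {
  const params = new URLSearchParams({
    group_level: "2",
    startkey: JSON.stringify([inventoryId]),
    endkey: JSON.stringify([inventoryId, {}]),
  });
  return `/${database}/_design/items/_view/price_by_brand?${params}`;
}

// Exercise the map function locally on the two example items.
priceByBrandMap({ inventory_id: 1, sku: 12345, brand: "Samsung", price: 999.95 });
priceByBrandMap({ inventory_id: 2, sku: "48cab23fa", accomodation_type: "hotel", price_per_night: 395 });
console.log(emitted);
console.log(priceByBrandUrl("inventory", 1));
```

Only the television item produces a row, since the hotel has no brand or price field. For the page to call this URL directly, browsers must be able to reach CouchDB over HTTP (same origin, or with CORS configured).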

A:

I work on MongoDB, so you should take this with a grain of salt, but this looks like a great fit for Mongo.

  • We need to be able to select items from only one (or several) inventories.

It's easy to run ad hoc queries on any field, including inventory_id.
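For example, the mongo shell queries db.items.find({ inventory_id: 1 }) and db.items.find({ inventory_id: { $in: [1, 2] } }) select items from one inventory or from several. The JavaScript below is only an in-memory illustration of what those query documents mean; matchesQuery is a hypothetical helper that supports just plain equality and $in, and it is not how MongoDB evaluates queries.

```javascript
// Two example items with different fields, as they could sit in one collection.
const items = [
  { inventory_id: 1, sku: 12345, name: "Samsung LCD 40 inches", brand: "Samsung", type: "LCD", price: 999.95 },
  { inventory_id: 2, sku: "48cab23fa", name: "New York Hilton", accomodation_type: "hotel", star_rating: 5, price_per_night: 395 },
];

// Hypothetical matcher: each query field is either a plain value (equality)
// or an object of the form { $in: [...] }. A missing field never matches.
function matchesQuery(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    if (condition !== null && typeof condition === "object" && Array.isArray(condition.$in)) {
      return condition.$in.includes(doc[field]);
    }
    return doc[field] === condition;
  });
}

const oneInventory = items.filter((doc) => matchesQuery(doc, { inventory_id: 1 }));
const severalInventories = items.filter((doc) => matchesQuery(doc, { inventory_id: { $in: [1, 2] } }));
console.log(oneInventory.length, severalInventories.length); // prints "1 2"
```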

  • We need to be able to filter items based on their parameters (e.g., get all items from inventory 2 where type is 'hotel').

The query for this would be: {"inventory_id" : 2, "type" : "hotel"}.

  • We need to be able to group items based on parameters (e.g., get the lowest price from items in inventory 1 where brand is 'Samsung').

Again, super easy: db.items.find({"inventory_id" : 1, "brand" : "Samsung"}).sort({"price" : 1}).limit(1) returns the cheapest matching item.
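What that query computes (filter on inventory and brand, sort by ascending price, keep the first result) can be mirrored in plain JavaScript. This is a sketch over hypothetical sample items (only the 999.95 Samsung row comes from the question), and cheapest is an illustrative helper, not a MongoDB API.

```javascript
// Hypothetical sample items; only the first one comes from the question.
const items = [
  { inventory_id: 1, brand: "Samsung", price: 999.95 },
  { inventory_id: 1, brand: "Samsung", price: 749.0 },
  { inventory_id: 1, brand: "Sony", price: 699.0 },
  { inventory_id: 2, name: "New York Hilton", price_per_night: 395 },
];

// Roughly what find({ inventory_id, brand }).sort({ price: 1 }).limit(1) returns:
// filter, sort ascending by price, keep the first document (or null if none match).
function cheapest(docs, inventoryId, brand) {
  const matches = docs
    .filter((doc) => doc.inventory_id === inventoryId && doc.brand === brand)
    .sort((a, b) => a.price - b.price);
  return matches.length > 0 ? matches[0] : null;
}

console.log(cheapest(items, 1, "Samsung"));
```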

  • We need to (potentially) be able to retrieve thousands of items at a time.

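No problem: find() returns a cursor that fetches results from the server in batches, so large result sets are fine.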

  • Rapid bulk insertion is desired, though not required.

MongoDB has much faster bulk inserts than CouchDB.

Also, there's a REST interface for MongoDB: http://github.com/kchodorow/sleepy.mongoose

You might want to read http://chemeo.com/doc/technology, which describes how Chemeo dealt with the arbitrary property search problem using MongoDB.

kristina
Thanks for your answer! One follow-up question: on the grouping, what if I want to find the lowest price for Samsung as well as Sony in one query? What if there are 100 or 1000 brands? In SQL I can use `SELECT brand, MIN(price) FROM items GROUP BY brand;`. Is something similar possible in MongoDB?
vonconrad
Yes, Mongo has a group function that's pretty much equivalent to GROUP BY; see http://www.mongodb.org/display/DOCS/Aggregation
kristina
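In current MongoDB versions the old group command is gone (it was deprecated in 3.4 and removed in 4.2), and the aggregation pipeline's $group stage is the usual replacement. Below is a sketch of a pipeline for the lowest price per brand in inventory 1, as it could be passed to db.items.aggregate(...), followed by an in-memory JavaScript equivalent for illustration only. minPriceByBrand and the sample items are hypothetical.

```javascript
// Aggregation pipeline: roughly SELECT brand, MIN(price) ... WHERE inventory_id = 1 GROUP BY brand.
// In the mongo shell this array would be passed to db.items.aggregate(pipeline).
const pipeline = [
  { $match: { inventory_id: 1 } },
  { $group: { _id: "$brand", minPrice: { $min: "$price" } } },
  { $sort: { _id: 1 } },
];

// In-memory equivalent over plain objects, for illustration only.
// Items without a numeric price are skipped.
function minPriceByBrand(docs, inventoryId) {
  const minima = new Map();
  for (const doc of docs) {
    if (doc.inventory_id !== inventoryId || typeof doc.price !== "number") continue;
    const current = minima.get(doc.brand);
    if (current === undefined || doc.price < current) {
      minima.set(doc.brand, doc.price);
    }
  }
  return [...minima.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([brand, minPrice]) => ({ _id: brand, minPrice }));
}

// Hypothetical sample items; only the first one comes from the question.
const items = [
  { inventory_id: 1, brand: "Samsung", price: 999.95 },
  { inventory_id: 1, brand: "Samsung", price: 749.0 },
  { inventory_id: 1, brand: "Sony", price: 699.0 },
  { inventory_id: 2, name: "New York Hilton", price_per_night: 395 },
];

console.log(JSON.stringify(pipeline));
console.log(minPriceByBrand(items, 1));
```

The pipeline handles any number of brands in one query, which is the 100 or 1000 brands case from the follow-up question.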