What kinds of projects benefit from using a NoSQL database instead of an RDBMS wrapped by an ORM?

Examples:

  • Sites similar to Stack Overflow?
  • Social communities?
  • Forums?
A: 

NoSQL, in the sense of a different design approach and not just a different query language, covers databases with quite different features. For example, column-oriented databases are used for large data warehouses and might be used for OLAP.

This is similar to my question; you'll find a lot of resources there.

bua
+8  A: 

Your question is very general. NoSQL describes a collection of database techniques that are very different from each other. Roughly, there are:

  • Key-value stores (Redis, Riak)
  • Triplestores (AllegroGraph)
  • Column-family stores (Bigtable, Cassandra)
  • Document-oriented stores (CouchDB, MongoDB)
  • Graph databases (Neo4j)

A project can benefit from the use of a document database during the development phase of the project, because you won't have to design complex entity-relation diagrams or write complex join queries. I've detailed other uses of document databases in this answer.

If your application needs to handle very large amounts of data, the development phase will likely be longer when you use a specialized NoSQL solution such as Cassandra. However, when your application goes into production, it will greatly benefit from the performance and scalability of Cassandra.

Very generally speaking, if an application has the following requirements:

  • scale horizontally
  • work with data model X
  • perform Y operations

the application will benefit from using a NoSQL solution that is geared towards storing data model X and performing Y operations on the data. If you need more specific answers regarding a certain type of NoSQL database, you'll need to update your question.

  1. Benefits during development (e.g. easier to use than SQL, no licensing costs)?
  2. Benefits in terms of performance (e.g. runs like hell with a million concurrent users)?
  3. What type of NoSQL database?

Update

Key-value stores can only be queried by key in most cases. They're useful to store simple data, such as user sessions, simple profile data or precomputed values and output. Although it is possible to store more complex data in key-value pairs, it burdens the application with the responsibility of maintaining 'manual' indexes in order to perform more advanced queries.
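To illustrate what a 'manual' index means here, this is a minimal sketch that uses a plain Python dict as a stand-in for a key-value store. The key naming scheme (`user:<id>`, `index:city:<name>`) and the helper functions are assumptions made up for this example; they are not the API of Redis, Riak or any other product.

```python
# Hypothetical in-memory stand-in for a key-value store.
# A real store would expose similar get/put operations over the network.
store = {}

def put_user(user_id, profile):
    """Store a profile under its key and keep a hand-made index by city."""
    old = store.get(f"user:{user_id}")
    if old is not None:
        # The application, not the database, must remove the stale index entry.
        store[f"index:city:{old['city']}"].discard(user_id)
    store[f"user:{user_id}"] = profile
    store.setdefault(f"index:city:{profile['city']}", set()).add(user_id)

def users_in_city(city):
    """Answer a query that is not by primary key by reading the manual index."""
    ids = store.get(f"index:city:{city}", set())
    return sorted(store[f"user:{i}"]["name"] for i in ids)

put_user(1, {"name": "Ann", "city": "Oslo"})
put_user(2, {"name": "Bob", "city": "Oslo"})
put_user(1, {"name": "Ann", "city": "Rome"})  # Ann moves: two index keys change

print(users_in_city("Oslo"))
print(users_in_city("Rome"))
```

Every write has to update the index keys as well as the record itself, which is the extra responsibility that a database with built-in secondary indexes would otherwise take on.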

Triplestores are for storing Resource Description Framework (RDF) data. I don't know anything about these stores beyond what Wikipedia tells me, so you'll have to do some research on that.

Column-family stores are built for storing and processing very large amounts of data. They are used by Google's search engine and Facebook's inbox search. The data is queried by MapReduce functions. Although MapReduce functions may be hard to grasp in the beginning, the concept is quite simple. Here's an analogy which (hopefully) explains the concept:

Imagine you have multiple shoe-boxes filled with receipts, and you want to calculate your total expenses. You invite some of your friends over and assign a person to each shoe-box. Each person writes down the total of each receipt in his shoe-box. This process of selecting the required data is the Map part.

When a person has written down the totals of (some of) his receipts, he can sum up these totals. This is the Reduce part and can be repeated multiple times until all receipts have been handled. In the end, all of your friends come together and sum up their total sums, giving you your total expenses. That's the final Reduce step.

The advantage of this approach is that you can have any number of shoe-boxes and you can assign any number of people to a shoe-box and still end up with the same result. Each shoe-box can be seen as a server in the database's network. Each friend can be seen as a thread on the server. With MapReduce you can have your data distributed across many servers and have each server handle part of the query, optimizing the performance of your database.
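The shoe-box analogy can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how any particular database implements MapReduce: the receipts, shop names and function names are invented, and everything runs sequentially in one process, whereas a real system would run the per-box work on different servers.

```python
from functools import reduce

# Made-up receipts; each inner list is one shoe-box. Amounts are in cents.
shoe_boxes = [
    [{"shop": "grocer", "total": 1250}, {"shop": "cafe", "total": 320},
     {"shop": "garage", "total": 4000}],
    [{"shop": "bakery", "total": 775}, {"shop": "books", "total": 1999}],
    [{"shop": "airline", "total": 10000}],
]

def map_receipts(box):
    """Map: select the required data (the total) from each receipt."""
    return [receipt["total"] for receipt in box]

def reduce_totals(totals):
    """Reduce: combine a list of totals into a single sum."""
    return reduce(lambda a, b: a + b, totals, 0)

# Each friend maps and reduces their own shoe-box independently.
per_box = [reduce_totals(map_receipts(box)) for box in shoe_boxes]

# Final Reduce step: the friends sum up their per-box sums.
total_expenses = reduce_totals(per_box)

print(per_box, total_expenses)
```

Because addition can be applied in any order and grouping, redistributing the receipts over more or fewer boxes gives the same `total_expenses`, which is what makes the work easy to spread across servers.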

Document-oriented stores are explained in this question, so I won't discuss them here.

Graph databases are for storing networks of highly connected objects, such as the users of a social network. These databases are optimized for graph operations, such as finding the shortest path between two nodes, or finding all nodes within three hops of the current node. Such operations are quite expensive on an RDBMS or on other NoSQL databases, but very cheap on graph databases.
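As a rough sketch of the two operations mentioned above, here is a breadth-first search over a tiny in-memory friendship graph. The graph, the user names and the function names are invented for illustration; a graph database would run this kind of traversal natively on its own storage, without loading the whole graph into application memory.

```python
from collections import deque

# Hypothetical friendship graph stored as an adjacency list.
friends = {
    "ann": ["bob", "cid"],
    "bob": ["ann", "dee"],
    "cid": ["ann"],
    "dee": ["bob", "eve"],
    "eve": ["dee", "fay"],
    "fay": ["eve"],
}

def hops_from(start):
    """Breadth-first search: shortest hop count from start to each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in friends[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def within(start, max_hops):
    """All nodes reachable from start in at most max_hops hops, excluding start."""
    return sorted(n for n, d in hops_from(start).items() if 0 < d <= max_hops)

print(within("ann", 3))          # everyone within three hops of ann
print(hops_from("ann")["fay"])   # length of the shortest path from ann to fay
```

In a relational schema, each extra hop typically means another self-join on a friendship table, which is why these queries tend to get expensive there.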

Niels van der Rest
+1 Good answer. I'll upvote you when I have more votes.
NullUserException
Can you give a brief explanation of what each type is intended for? Key/value is quite obvious, but the others?
jgauffin
@jgauffin: I've added a 'short' description of each type to my answer :)
Niels van der Rest
woaw. very nice. thanks.
jgauffin
It's very ironic, though, that most large social networking sites do not use graph databases but instead use key-value store databases like Cassandra or Voldemort.
jpartogi
@jpartogi: That's mainly because graph databases don't scale as well as other NoSQL solutions. This is due to the high connectivity between objects, which makes it practically impossible to store all related data on a single server for better performance. I believe Twitter still uses [FlockDB](http://github.com/twitter/flockdb#readme). It's a lightweight graph database that favors performance over complex graph operations.
Niels van der Rest
