The target application is a medium-sized website built to support several hundred to several thousand users an hour, with an option to scale above that. The data model is rather simple, and the caching potential is pretty high (~10:1 ratio of read to edit actions).

What should the considerations be when choosing between a relational, SQL-based datastore and a NoSQL option (such as HBase or Cassandra)?

+6  A: 

http://carsonified.com/blog/dev/should-you-go-beyond-relational-databases/

Provides a nice breakdown of the things to consider when looking at certain types of data storage tools.

Nick Campbell
A: 

Digg have some interesting articles on this question. Essentially, you're shifting the burden of processing to writes rather than reads, which may be desirable in highly scalable applications. Cassandra specifically is also highly available.
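As a rough illustration of what "shifting the burden to writes" can mean, here is a minimal sketch of write-time fan-out. It uses plain in-memory dictionaries rather than Cassandra itself, and the names (`followers`, `timelines`, `publish`, `read_timeline`) are made up for the example:

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two tables / column families.
followers = defaultdict(set)   # author -> readers who follow that author
timelines = defaultdict(list)  # reader -> precomputed list of (author, text)

def follow(reader, author):
    followers[author].add(reader)

def publish(author, text):
    # The write path does the expensive part: one copy per follower.
    for reader in followers[author]:
        timelines[reader].append((author, text))

def read_timeline(reader):
    # The read path is a single key lookup, with no join or aggregation.
    return list(timelines[reader])

follow("alice", "bob")
follow("carol", "bob")
publish("bob", "hello")
print(read_timeline("alice"))
```

Each post costs one write per follower, but every read is a direct lookup. With a read-heavy workload that trade can pay off; with few reads it mostly wastes storage.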

Simplistically, Cassandra is a distributed database with a BigTable data model running on a Dynamo-like infrastructure. It is column-oriented and allows for the storage of relatively structured data. It has a fully decentralized model: every node is identical and there is no single point of failure. It's also extremely fault tolerant, since data is replicated to multiple nodes and across data centers. Cassandra is also very elastic: read and write throughput increase roughly linearly as new machines are added.

Andy
A: 

When you say the data model is rather simple, that could speak for the NoSQL option.

If you need to select on many attributes, have a heavy transaction load, or have complicated table structures, that would speak for traditional SQL tables.

I would recommend finding out how difficult it would be to implement the data model with one or two NoSQL databases. If that turns out to be rather difficult, you could also draw up a classical table schema to compare against.

If you run into difficulties with NoSQL, that could speak for the SQL option. On the other hand, the heavy load might be handled better with NoSQL, or a good SQL database might scale sufficiently ...

Buffering (caching) can also be done with a simple proxy server ...

If difficulties remain, a mix of NoSQL and SQL could also be considered.

Juergen
+7  A: 

To me, you don't have any particular problem to solve. If you need ACIDity, use a relational database; if you don't, then it doesn't matter. In the end, just build your app. And let me quote "NoSQL: If Only It Was That Easy":

The real thing to point out is that if you are being held back from making something super awesome because you can’t choose a database, you are doing it wrong. If you know mysql, just used it. Optimize when you actually need to. Use it like a k/v store, use it like a rdbms, but for god sake, build your killer app! None of this will matter to most apps. Facebook still uses MySQL, a lot. Wikipedia uses MySQL, a lot. FriendFeed uses MySQL, a lot. NoSQL is a great tool, but it’s certainly not going to be your competitive edge, it’s not going to make your app hot, and most of all, your users won’t give a shit about any of this.
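To make "use it like a k/v store" concrete, here is a minimal sketch of a key/value layer on top of an ordinary SQL table. It uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs anywhere; the table name `kv` and the helpers `put` and `get` are assumptions for the example, and the upsert syntax would differ on MySQL:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a real MySQL connection
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)")

def put(key, value):
    # Store any JSON-serializable value under a string key.
    # INSERT OR REPLACE is SQLite syntax, not MySQL syntax.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)",
            (key, json.dumps(value)),
        )

def get(key, default=None):
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return json.loads(row[0]) if row else default

put("user:1", {"name": "alice", "karma": 42})
print(get("user:1"))
```

Nothing here stops you from adding real columns and indexes later, which is the point of starting with the database you already know.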

Pascal Thivent
+1  A: 

I liked Ian Eure's rule of thumb: “if you’re deploying memcache on top of your database, you’re inventing your own ad-hoc, difficult to maintain NoSQL system.”

http://www.rackspacecloud.com/blog/2010/02/25/should-you-switch-to-nosql-too/
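For context, the "memcache on top of your database" setup usually looks something like the sketch below. It is only an illustration: a plain dictionary stands in for memcached, `sqlite3` stands in for the real database, and the `pages` table and the `read_page` / `edit_page` helpers are invented for the example:

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the real database
db.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, body TEXT)")
db.execute("INSERT INTO pages VALUES (1, 'first draft')")

cache = {}  # stand-in for memcached

def read_page(page_id):
    # Read path: try the cache first, fall back to the database on a miss.
    if page_id in cache:
        return cache[page_id]
    row = db.execute("SELECT body FROM pages WHERE id = ?", (page_id,)).fetchone()
    body = row[0] if row else None
    if body is not None:
        cache[page_id] = body
    return body

def edit_page(page_id, body):
    # Write path: update the database, then invalidate the cached copy.
    with db:
        db.execute("UPDATE pages SET body = ? WHERE id = ?", (body, page_id))
    cache.pop(page_id, None)

print(read_page(1))
edit_page(1, "second draft")
print(read_page(1))
```

The invalidation line in `edit_page` is the part that tends to become hard to maintain: every write path has to remember it, or readers see stale data.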

jbellis