There has been a lot of talk related to Cassandra lately.

Twitter, Digg, Facebook, and others all use it.

When does it make sense to:

  • use Cassandra,
  • not use Cassandra, and
  • use an RDBMS instead of Cassandra?
A: 

less data! easy architecture!

Thomas
A: 

My understanding is that you would use NoSQL when your data is just key-value pairs, meaning your RDBMS table would have only two columns (key, value).

JustinT
Well, column families can have multiple columns in Cassandra, correct?
Luke
Yes, and it also allows SuperColumns.
Schildmeijer
Plus this is just Cassandra. If you're talking about NoSQL in general, well hell, look at MongoDB's data model: arrays and hashes nested as far as you can fit in a 4MB row. Indexable as well.
Michael
+7  A: 

The general idea of NoSQL is that you should use whichever data store is the best fit for your application. If you have a table of financial data, use SQL. If you have objects that would require complex/slow queries to map to a relational schema, use an object or key/value store.

Of course just about any real world problem you run into is somewhere in between those two extremes and neither solution will be perfect. You need to consider the capabilities of each store and the consequences of using one over the other, which will be very much specific to the problem you are trying to solve.

Tom Clarkson
What is the advantage of SQL when using financial data?
Paco
The schema is unlikely to change, it fits well in a table structure, and lost/inconsistent data could cause real problems.
Tom Clarkson
I don't understand why inconsistent data would cause real problems for banks. Scenario: you have one bank account with $100 left before you hit the limit, and two bank cards. When you try to withdraw money with the two cards at the same time at two different ATMs, you will get $100 twice, and a letter with an extra fee in your mailbox. The bank earns money (the extra fee for being below the limit) by using inconsistent data. It's too hard to connect all the ATMs in the world to each other through one large relational database. Can you give an example where inconsistent financial data can be a problem?
Paco
That stuff is all COBOL and batch processing, and not nearly as well designed or stable as you might think. ATMs do not connect to any sort of unified data store, so they are hardly a suitable example. It's like saying SQL isn't suitable for web apps because you can't give everyone on the internet direct access to your database. Besides, I never said anything about banks. Think of things like orders on an e-commerce site, where you don't have to deal with an organization so conservative that SQL is considered new and untrusted.
Tom Clarkson
So the only reason is conservatism, no technical reason?
Paco
You seem to be missing the point. Technically anything is possible, using any set of tools, but that doesn't make it a good idea. For tracking sales, the benefits of SQL outweigh the disadvantages. If you think you can set up a banking system using new technology, good luck to you.
Tom Clarkson
@Paco: The first ATM reads your balance ($100), and the second ATM does the same. Both ATMs deduct $100 from $100 and write the final balance of $0 back to your account. Result: the bank loses $100.
Seun Osewa
@Seun Osewa: That would be a stupid bank. A normal bank would ask you to pay back $100 and a ridiculous interest rate for being below the limit and earn some money instead of losing money.
Paco
@Tom Clarkson: When you cannot name a benefit, there is no benefit.
Paco
@Paco: The point is, without proper transaction isolation, the normal bank won't even know the account has been overdrawn. They won't even know.
Seun Osewa
@Seun Osewa: A bank does not use atomic transactions for withdrawing money from an ATM. It would cost too much hardware to connect all the ATMs in the world to the same database with atomic transactions.
Paco
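The two-ATM scenario argued over in the comments above is usually called a lost update. A minimal sketch of it in Python follows, assuming a plain in-memory dictionary as the "account store". The function names and the store are hypothetical and only illustrate the read-modify-write race; they are not how any bank or Cassandra actually works.

```python
# Hypothetical in-memory account store; not a real bank or Cassandra API.

def read_balance(store):
    return store["balance"]

def write_balance(store, value):
    store["balance"] = value

def unsafe_withdrawals(store, amounts):
    """Every ATM reads the balance first, then each writes back its own result."""
    snapshots = [read_balance(store) for _ in amounts]  # all ATMs see the same balance
    for seen, amount in zip(snapshots, amounts):
        write_balance(store, seen - amount)  # later writes overwrite earlier ones
    return read_balance(store)

def serialized_withdrawals(store, amounts):
    """Each ATM reads and writes as one uninterrupted step, one after another."""
    for amount in amounts:
        write_balance(store, read_balance(store) - amount)
    return read_balance(store)

lost = unsafe_withdrawals({"balance": 100}, [100, 100])
correct = serialized_withdrawals({"balance": 100}, [100, 100])
print(lost, correct)
```

With the interleaved version the stored balance ends at 0 even though $200 was paid out, so the overdraft is invisible in the data. The serialized version ends at -100, which is the figure the bank would need in order to charge a fee at all.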
+4  A: 

When evaluating distributed data systems, you have to consider the CAP theorem - you can pick two of the following: consistency, availability, and partition tolerance.

Cassandra is an available, partition-tolerant system that supports eventual consistency. For more information see my Visual Guide to NoSQL Systems.

Nathan Hurst
+2  A: 

Cassandra is the answer to a particular problem: what do you do when you have so much data that it does not fit on one server? How do you store all your data on many servers without breaking your bank account or driving your developers insane? Facebook gets 4 terabytes of new compressed data EVERY DAY. And this number will most likely more than double within a year.

If you do not have this much data, or if you have millions to pay for an enterprise Oracle/DB2 cluster installation and the specialists required to set it up and maintain it, then you are fine with a SQL database.

Vagif Verdi
+1  A: 

Another situation that makes the choice easier is when you want to use aggregate functions like sum, min, and max, or complex queries (as in the financial system mentioned above). In that case a relational database is probably more convenient than a NoSQL database, since neither is possible in a NoSQL database unless you use a great many inverted indexes. With NoSQL you would have to compute the aggregates in code, or store them separately in their own column family, but this makes everything quite complex and reduces the performance you gained by using NoSQL.

ronaldmathies
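The two workarounds mentioned in the answer above (aggregating in code, or storing the aggregates separately) can be sketched roughly as follows. This is only an illustration over a plain Python dictionary standing in for rows fetched by key. The row layout, `aggregate_in_code`, and `record_write` are all made-up names, not part of any Cassandra client.

```python
# Hypothetical order rows, as they might come back from key lookups.
rows = {
    "order:1": {"amount": 30},
    "order:2": {"amount": 12},
    "order:3": {"amount": 58},
}

def aggregate_in_code(rows, field):
    """Option 1: pull every row to the client and aggregate there."""
    values = [row[field] for row in rows.values()]
    return {"sum": sum(values), "min": min(values), "max": max(values)}

def record_write(summary, amount):
    """Option 2: keep a separately stored summary up to date on every write."""
    if not summary:
        return {"sum": amount, "min": amount, "max": amount}
    return {
        "sum": summary["sum"] + amount,
        "min": min(summary["min"], amount),
        "max": max(summary["max"], amount),
    }

scanned = aggregate_in_code(rows, "amount")

summary = {}
for row in rows.values():
    summary = record_write(summary, row["amount"])

print(scanned, summary)
```

Option 1 reads every row each time the aggregate is needed. Option 2 makes reads cheap but adds work, and a second thing to keep consistent, on every write. That is the extra complexity the answer is pointing at.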
A: 

From talking with someone in the midst of deploying Cassandra: it doesn't handle many-to-many relationships well. They are using a hack to get through their initial testing. I spoke with a Cassandra consultant about this, and he said he wouldn't recommend it if you had this kind of problem.

Warren
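To make the many-to-many concern above concrete: without joins, one common workaround is to store the relationship twice, once per lookup direction, and keep both copies in sync on every write. The sketch below assumes that approach and uses two in-memory dictionaries in place of two column families. The user/group example and all names are hypothetical, and this is not necessarily the hack the deployment in the answer used.

```python
from collections import defaultdict

# Two hypothetical lookup tables, one per direction of the relationship.
groups_by_user = defaultdict(set)
users_by_group = defaultdict(set)

def add_membership(user, group):
    """Every logical write becomes two physical writes."""
    groups_by_user[user].add(group)
    users_by_group[group].add(user)

def remove_membership(user, group):
    """Deletes must also touch both tables, or the two views drift apart."""
    groups_by_user[user].discard(group)
    users_by_group[group].discard(user)

add_membership("alice", "admins")
add_membership("alice", "editors")
add_membership("bob", "admins")
remove_membership("alice", "editors")

print(sorted(users_by_group["admins"]), sorted(groups_by_user["alice"]))
```

In a relational database a single join table would answer both questions. Here the application owns the duplication, which is a plausible reason the approach gets described as a hack.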