Well, NoSQL is a buzzword right now so I've been looking into it. I've yet to get my head around ColumnFamilies and SuperColumns, etc., but I have been looking at how the data is mapped.

After reading this article, and others, it seems the data is mapped in a JSON-like format.

Users = {
    1: {
        username: "dave",
        password: "blahblah",
        dateReged: "1/1/1"
    },
    2: {
        username: "etc",
        password: "blahblah",
        dateReged: "2/1/1",
        comment: "this guy has a comment and dave doesn't"
    },
}

The RDBMS format would be:

Table name: "Users"

id | username | password | dateReged | comment
---+----------+----------+-----------+--------
 1 |  dave    | blahblah |  1/1/1    |
---+----------+----------+-----------+--------
 2 |  etc     | blahblah |  2/1/1    | this guy has a comment and dave doesn't

Assuming I understand this correctly and my examples above are right, why would I choose the RDBMS design over the NoSQL design? Personally, I'd much rather work with the JSON structure... Does this mean I should choose NoSQL over, say, MySQL?

I guess what I'm asking is "when should I choose NoSQL over RDBMS?"

On a side note, as I've said, I still don't fully understand how to go about implementing a Cassandra database. I.e., how do I create the above Users table in a new database? Any tutorials, documentation, etc. you could point to would be great. My Googling hasn't turned up much in terms of 'starting from scratch'...

+1  A: 

I guess what I'm asking is "when should I choose NoSQL over RDBMS?"

[Caveat: I've never read about NoSQL before]

According to Wikipedia, NoSQL isn't good at joins: which implies (to me) no referential integrity and no normalization.

ChrisW
To be honest, my knowledge of SQL is pretty poor. I think I've used the JOIN keyword once. Only once. Losing that wouldn't really affect me.
dave
"Losing that wouldn't really affect me". Famous last words...
Thilo
@dave: If you don't understand SQL (or, more importantly, its foundations in relational algebra) then obviously the SQL and NoSQL solutions will seem very similar. The differences don't really begin to manifest until you have a *lot* of data (and/or a *lot* of transactions).
Daniel Pryden
@dave Joins are associated with [database normalization](http://en.wikipedia.org/wiki/Database_normalization): the pictures in the right-hand margin of that article are a quick introduction.
ChrisW
+2  A: 

The advantage of NoSQL is that it's simpler, and if you have your OO blinkers on it fulfills all your persistence needs.

The advantage of an SQL-based relational database is that you can easily re-use and extend your data in ways that were not envisaged in the original design. Also, "object" databases tend to perform very badly (if it's possible at all) when you want to do the equivalent of SQL's aggregate queries like COUNT, SUM, AVG.
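
To make the aggregate point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the `order_stats` helper are invented for illustration and are not taken from any product mentioned in this thread.

```python
import sqlite3

# Hypothetical data: the "orders" table and its columns are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders (user_id, amount) VALUES (?, ?)",
    [(1, 10.0), (1, 30.0), (2, 5.0)],
)


def order_stats(connection):
    """Return (user_id, count, total, average) per user, computed by the database."""
    return connection.execute(
        "SELECT user_id, COUNT(*), SUM(amount), AVG(amount) "
        "FROM orders GROUP BY user_id ORDER BY user_id"
    ).fetchall()


stats = order_stats(conn)
```

The grouping and arithmetic happen inside the database engine; with a store that only hands back whole objects, the application would typically have to load every record and do this work itself.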

Google's Bigtable, which is the biggest OO database anywhere (and probably the biggest database, period), also supports SQL and SQL features like indexing and strong typing.

James Anderson
+1  A: 

The simplest answer I can think of is: When your data doesn't fit a relational model.

T3hc13h
I have seen several things which do not fit into an OO model, never yet seen anything that could not be modeled in a relational DB.
James Anderson
@James Anderson [Hierarchies (trees)](http://stackoverflow.com/questions/1085287) can be modeled in a relational DB, but it's a bit difficult/special to do that.
ChrisW
You certainly CAN model just about anything in a relational db, but in many cases you really have to contort your data.
Nils Weinander
+5  A: 

The main advantage of NoSQL is horizontal scalability and distributed storage. That means you can have a large number of 'cluster nodes' and write to them in parallel. The cluster will ensure changes are propagated to the other cluster nodes eventually (eventual consistency).
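
As a rough illustration of eventual consistency, here is a toy in-memory sketch in Python. The `Node` class, the `sync` function, and the last-write-wins rule are all assumptions made for this example; real products use more elaborate replication and conflict-resolution mechanisms.

```python
class Node:
    """One cluster node holding its own copy of the data plus a log of unsent writes."""

    def __init__(self):
        self.data = {}      # key -> (version, value)
        self.pending = []   # writes not yet propagated to peers

    def write(self, key, value, version):
        self.data[key] = (version, value)
        self.pending.append((key, value, version))

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def merge(self, key, value, version):
        # Last-write-wins (an assumption of this sketch): keep the incoming
        # value only if its version is newer than what this node already has.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)


def sync(nodes):
    """Propagate every pending write to all other nodes."""
    for source in nodes:
        for key, value, version in source.pending:
            for target in nodes:
                if target is not source:
                    target.merge(key, value, version)
        source.pending.clear()


a, b = Node(), Node()
a.write("user:1", "dave", version=1)   # written to node a only
b.write("user:2", "etc", version=1)    # written to node b in parallel
stale = b.read("user:1")               # b has not seen a's write yet
sync([a, b])
fresh = b.read("user:1")               # visible on b after propagation
```

Between the write and the sync, a reader on node `b` sees stale data. That window is the price paid for being able to accept writes on any node without coordination.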

NoSQL is not so much about SQL (the term means "not only SQL"). In fact, some NoSQL products do support a subset of SQL. The reason the data format is different (JSON or list of property / value pairs versus tabular data) is: within relational databases, the number of columns (and column names) is defined in a central place, which doesn't work well with horizontal scalability (you would need to stop all cluster nodes for schema changes). Also, joins are not supported as much because that would break horizontal scalability (data from multiple cluster nodes may need to be read, if the data is distributed).

Thomas Mueller
And Oracle, DB2, SQL Server, Teradata, etc. don't support clustering? Well, not before 1992 anyway.
James Anderson
They do support clustering, but they don't support horizontal scalability as well, because they try to support all the ACID properties. The NoSQL products don't try to support all ACID features. Some say NoSQL really means NoACID: http://dbmsmusings.blogspot.com/2010/08/problems-with-acid-and-how-to-fix-them.html
Thomas Mueller
+1  A: 

RDBMSes are all about consistency. They do a great job on data that gets churned a lot with transactions. See also ACID (atomicity, consistency, isolation, durability). Sometimes you don't need all that, like when storing data from logs or working on data that's not going to change, just accumulate.

NoSQL databases let you relax the requirements for transactions and get better performance (as well as scale to large distributed storage silos more easily).

woolstar
+2  A: 

If you are Google, then you might be in a position where a NoSQL database would be easier on you than an RDBMS. Since you are not, the many advantages an RDBMS provides will probably be of some use to you. Significantly, on a single node, NoSQL offers absolutely no advantages over RDBMSes. RDBMSes offer lots of advantages over NoSQL, though. What are they?

RDBMSes use some pretty deep magic to understand the data they own, and the data you are asking for, in such a way that they can return that data in the most efficient manner possible. If you didn't ask about some column, the RDBMS doesn't waste any effort retrieving it. If you are interested in rows that have fields in common across two tables (this is a join, btw), the RDBMS doesn't have to check every single pair of rows for matches, whereas what a NoSQL DB usually does is just give you everything and make you do the checking. With an RDBMS, you can usually construct queries that are actually 'about' the data you are using, like "if the date is a Tuesday", and if your indexes support it (if you run that query a lot, you would add such an index) you can get those rows efficiently.
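
A small sketch of that difference, using Python's built-in sqlite3 module. The `users` and `posts` tables, the index name, and the sample rows are made up for illustration; the second half imitates the "give you everything and make you do the checking" approach in application code.

```python
import sqlite3

# Hypothetical schema and rows, invented for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    CREATE INDEX posts_user_id ON posts (user_id);
    INSERT INTO users VALUES (1, 'dave'), (2, 'etc');
    INSERT INTO posts VALUES (1, 1, 'hello'), (2, 1, 'again'), (3, 2, 'first');
""")

# Only the requested columns come back; the engine decides how to match rows
# and is free to use the index on posts.user_id.
joined = conn.execute(
    "SELECT users.username, posts.title "
    "FROM users JOIN posts ON posts.user_id = users.id "
    "WHERE users.username = ? ORDER BY posts.id",
    ("dave",),
).fetchall()

# The do-it-yourself equivalent: fetch everything, then check every pair of
# rows in application code.
all_users = conn.execute("SELECT id, username FROM users").fetchall()
all_posts = conn.execute("SELECT id, user_id, title FROM posts ORDER BY id").fetchall()
manual = [
    (username, title)
    for user_id, username in all_users
    for _post_id, post_user_id, title in all_posts
    if username == "dave" and post_user_id == user_id
]
```

Both produce the same rows, but the manual version transfers both tables in full and compares every pair, which is exactly the work the query planner avoids.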

There is another reason why RDBMSes are nice. Transactions are easy on RDBMSes, but are much harder to get right on NoSQL databases. Suppose you are implementing a blogging engine, and the post title (which appears in the URL) needs to be unique across all posts. In an RDBMS, you can easily be sure that you won't get this wrong accidentally. With a NoSQL database, if it does support some kind of transactional integrity, it's usually at the shard level: anything that could possibly require that kind of integrity must be on the same shard. Since any pair of users could be posting at the same moment, every user's posts must be on the same shard to get the same effect. Well, then you don't get any benefit at all from NoSQL.
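
For the blog-title example, a minimal sketch of how a relational database enforces uniqueness, again with Python's built-in sqlite3 module. The `posts` table and the `publish` helper are hypothetical names chosen for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The UNIQUE constraint makes the database itself reject duplicate titles.
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL UNIQUE)")


def publish(connection, title):
    """Insert a post; return False if the title is already taken."""
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute("INSERT INTO posts (title) VALUES (?)", (title,))
        return True
    except sqlite3.IntegrityError:
        return False


first = publish(conn, "why-nosql")
second = publish(conn, "why-nosql")   # same title again: rejected by the constraint
```

The application never has to check "does this title exist?" before inserting, so there is no window in which two simultaneous posts can both pass the check.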

TokenMacGuy
'Significantly, on a single node, NoSQL offers absolutely no advantages over RDBMSes. RDBMSes offer lots of advantages over NoSQL, though. what are they?' - erm, no. One example: write times to MongoDB are significantly faster than write times to MS SQL Server. It's a bit misleading to say there are NO advantages. It may not be the right fit for the purpose, but if you're after speed, there is an advantage there.
Michael Shimmins
MongoDB is schemaless; this is also a big difference on a single node.
TTT
Yes, schemaless is different. The question is really why this would be a good thing. I'm a bit suspicious of a schemaless setup. In theory, it makes changes easier. At the level of the database, it certainly does: you don't have to go to any length to add or remove properties at that level. On the other hand, it doesn't in any way make the semantic consequences of database migration any easier. What is the correct behavior when processing the fields that may be null? Schemaless doesn't alleviate that in the slightest.
TokenMacGuy
+1  A: 

A little history lesson might help.

In the early 90s, OODB was the buzzword -- much, much louder than NoSQL today. The feeling was that OODBs were taking over the database world. It goes without saying that there was little theory supporting OODBs, so the claim was mostly about superior performance.

In the early 00s, XMLDB was the man. Again, there was little foundation, other than incoherent ramblings about tree structures fitting poorly into the relational model.

Today, try asking an object or XML query question on this forum and see how many answers you get.

Tegiri Nenashi
A: 

NoSQL databases are fine for some websites where you don't need transactions or consistency and all you are doing is presenting some data (but until you get really, really large, they are not really needed).

But if you need to enforce financial rules (or other complex data integrity rules) or internal controls, or to aggregate data for reporting, you need an RDBMS. I'll bet even Google uses RDBMSes for their own HR and financial data, etc.

For some web applications, you might even want a combination of both: the NoSQL database for some types of information, and the transactional relational database for orders and other things where transactional consistency is a must.

If you develop web sites, I think you need to thoroughly understand both types of databases and the needs behind them before choosing how to handle any new functionality.

It seems to me that you have almost no knowledge of relational databases and would rather do what is easier for you personally than what is right for the project. Maybe I'm not reading that correctly, but anyone who never uses joins is suspect in terms of understanding relational databases.

You don't decide between these two based on which one seems easier to understand or which is the buzzword of the month; you decide based on the functionality you will need, not just for the user interface but for administrative tasks, reporting, financial or other types of data auditing, government regulation, data recovery in case of a hardware failure, etc.

HLGEM
+1  A: 

I gave a talk at OSCON about when NoSQL can be the right choice, and some of the different sub-categories to be aware of: http://assets.en.oreilly.com/1/event/45/The%20NoSQL%20Ecosystem%20Presentation.pdf

jbellis
@jbellis: "Relational databases don't scale", "Relational databases are slow". These are claims that could apply to certain DBMS products. They have nothing to do with the relational model. It would be quite reasonable to make a "NOSQL RDBMS" (i.e. relational, not SQL) that didn't have the same perceived disadvantages. As I have often observed, NOSQL enthusiasts sometimes seem overly keen to throw out the relational baby with the SQL bathwater :)
dportas