The original name of this technology before people started calling it "NoSQL" was a distributed key/value store. This is a far more descriptive name, and I originally remember looking at it and going "hey, cool, I'll bet that will end up being very useful to a lot of people." The term has since expanded to essentially include "anything that isn't a relational database", but usually, when most people talk about NoSQL, they are talking about key/value stores.
Ever since the term NoSQL was coined, it's been getting touted as a silver bullet. I'm interested in products like Cassandra and follow their progress, but they are still immature technologies, and to claim that they are "replacing" SQL or RDBMSes in general (or that they will in the near future) is specious reasoning at best, if not an out-and-out lie.
Products and technologies fitting under the NoSQL umbrella are geared toward the following problem domain:
- You plan to deploy a large-scale, high-concurrency database (hundreds of GB, thousands of users);
- Which doesn't need ACID guarantees;
- Or relationships or constraints;
- Stores a fairly narrow set of data (the equivalent of 5-10 tables in SQL);
- Will be running on commodity hardware (i.e. Amazon EC2);
- Needs to be implemented on a very low budget and "scaled out."
This actually describes a lot of web sites today. Google and Twitter fit very neatly into these requirements. Does it really matter if a few tweets are lost or delayed? On the other hand, these specs apply to nearly 0% of business systems, which is what a very high number of us work on developing. Most businesses have very different requirements:
- Medium-to-large-scale databases (10-100 GB) with fairly low concurrency (hundreds of users at most);
- ACID (especially the A and C - Atomicity and Consistency) is a hard requirement;
- Data is highly correlated (hierarchies, master-detail, histories);
- Has to store a wide assortment of data - hundreds or thousands of tables are not uncommon in a normalized schema (more for denormalization tables, data warehouses, etc.);
- Run on high-end hardware;
- Lots of capital available (if your business has millions of customers then you can probably find $25k or so lying behind the couch).
High-end SQL databases (SQL Server, Oracle, Teradata, Vertica, etc.) are designed for vertical scaling, they like being on machines with lots and lots of memory, fast I/O through SANs and SSDs, and the occasional horizontal scaling through clustering (HA) and partitioning (HC).
"NoSQL" is often compared favourably to "SQL" in performance terms. But fully maxed-out, a high-end SQL database server or cluster will scale almost infinitely. That is how they were intended to be deployed. Beware of dubious benchmarks comparing poorly-normalized, poorly-indexed SQL databases running mysql on entry-level servers (or worse, cloud servers like Amazon EC2) to similarly-deployed NoSQL databases. Apples and oranges. If you work with SQL, don't be scared by that hype.
SQL isn't going anywhere. DBAs are no more likely to vanish as a result of NoSQL than PHP programmers were as a result of Java and XML.
NoSQL isn't going anywhere either, because the development community has correctly recognized that RDBMSes aren't always the optimal solution to every problem.
So, as a developer you owe it to yourself to at least learn what NoSQL is, what products it refers to (Cassandra, BigTable, Voldemort, db4o, etc.), and how to build and code against a simple database created with one of these. But don't start throwing away all your SQL databases yet or thinking that your career is going to be made obsolete - that's hype, not reality.