+15  Q: 

Moving to NoSQL

Hi, I recently read this article, nosql-vs-rdbms. I don't know much about NoSQL and haven't used it in my projects, so I have some questions:

What is the main advantage that NoSQL has over RDBMSs?

If you think it is better than RDBMSs, where and how can I learn about it (books, tutorials)?

I want to be a DBA. What will this career look like after the move to NoSQL?


+11  A: 

Good question. I have heard and read a lot about NoSQL vs SQL, mostly from the NoSQL side.

It looks a lot like the key-value databases that have been around for a long time. It also looks like the object-oriented databases of the 1990s, which were supposed to come along and replace SQL.

NoSQL has an advantage with respect to the CAP theorem, which states that a distributed database would ideally offer consistency, availability, and partition tolerance, but can only guarantee two of the three. Relational databases give you consistency and availability, while many popular NoSQL databases give you availability and partition tolerance. That is, NoSQL makes it easy to distribute data across many computers. So if you have an application (like many of Google's applications) that needs to scale to a gajillion users, NoSQL is a better choice.
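A minimal sketch of what "distribute data across many computers" can look like: hash partitioning (sharding), one common way key/value stores decide which machine owns a key. This is illustrative Python, not any particular product's API; the node names are hypothetical and each "machine" is just an in-memory dict. Real systems also replicate each key to several nodes, which is where the CAP trade-off actually bites.

```python
import hashlib

# Hypothetical node names standing in for separate machines.
NODES = ["node-a", "node-b", "node-c"]

def node_for_key(key: str, nodes=NODES) -> str:
    """Pick the owning node by hashing the key modulo the node count."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

# Each simulated machine holds only its own slice of the data.
cluster = {name: {} for name in NODES}

def put(key: str, value) -> None:
    cluster[node_for_key(key)][key] = value

def get(key: str):
    # Only the node that owns the key is consulted; no cross-node query.
    return cluster[node_for_key(key)].get(key)

for user_id in range(1000):
    put(f"user:{user_id}", {"id": user_id})

# How many keys landed on each node.
print({name: len(store) for name, store in cluster.items()})
```

Because a lookup needs only the key's hash, adding machines adds capacity without any cross-node query. The weakness of plain modulo placement is that changing the node count remaps most keys, which is why Dynamo-style stores use consistent hashing instead.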

I think some of the NoSQL advocacy comes from people who have gotten applications working quickly with NoSQL and then ask, "...relational databases? We don't need no stinkin' relational databases!" NoSQL seems to be the way to go at Google. A friend of mine worked there for a while, and his comment was that Google folks advocate NoSQL, but it takes a lot of code to dipsy-doodle around data with complex relationships. Problem domains with lots of joins and indices are harder to handle with NoSQL plus application code than with plain SQL. This works for Google because they aggressively recruit prolific coders. The other thing is that many Google applications boil down to huge lists of stuff that can be scattered across multiple machines. They achieve good query speed with their search indices, the Google File System, and MapReduce. Joins are not as much of a problem in those applications.
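To illustrate the "lots of code" point, here is a rough sketch of a join done by hand in application code, as you would with a store that has no join operator. The customers/orders data and the function name are hypothetical; the docstring shows the single SQL statement it replaces.

```python
# Two "tables" stored as key/value maps, as a NoSQL store might hold them.
customers = {
    "c1": {"name": "Alice"},
    "c2": {"name": "Bob"},
}
orders = {
    "o1": {"customer_id": "c1", "total": 30},
    "o2": {"customer_id": "c2", "total": 12},
    "o3": {"customer_id": "c1", "total": 5},
}

def orders_with_customer_names():
    """Hand-written equivalent of:
    SELECT o.id, c.name, o.total
    FROM orders o JOIN customers c ON c.id = o.customer_id
    """
    rows = []
    for order_id, order in sorted(orders.items()):
        customer = customers.get(order["customer_id"])
        if customer is None:
            continue  # inner-join semantics: skip orders with no matching customer
        rows.append((order_id, customer["name"], order["total"]))
    return rows

print(orders_with_customer_names())
# [('o1', 'Alice', 30), ('o2', 'Bob', 12), ('o3', 'Alice', 5)]
```

Every additional relationship (filters, outer joins, an index to avoid scanning every order) means more hand-written code like this, which is the cost described above.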

This video on YouTube talks about NoSQL vs SQL. It's kind of funny if you are a SQL advocate, but it describes how you solve problems in NoSQL that were already solved in SQL relational databases.

Jay Godse
You know, that "aggressively recruit prolific coders" thing is frightening at some level.
+1  A: 

One of the things that's easy to overlook is that technologies are often associated with a specific development process. For example, say you are hired at a company that has a settled team of database administrators. They proudly guard a deliberate process of deploying and testing SQL.

Now you've been given two months to implement a new procurement website, and there's just no way you could get there with the DBA team. So you start looking for a way around the DBA team. NoSQL can provide that.

Being an alternative to SQL, while doing 60% of the things SQL does, is a powerful feature :)

+3  A: 

It's rather like comparing bananas to mushrooms.

There are very few kinds of bananas, very closely related, everyone knows what they taste like and they all peel the same way. If you try to describe a kind of banana to someone without using the word banana, they'll probably understand and relate it to the banana they know pretty quickly.

There is a wide variety of mushrooms; they all taste different, and you prepare them differently. If you try to describe a kind of mushroom to someone without using the word mushroom, they may have no idea what you are talking about and try to make a broccoli dish with it.

The relational model that almost all RDBMSs conform to is fairly consistent, and the notions of referential integrity, ACID, constraints, relations (i.e. tables) and normal forms are well understood. Modelling the data is typically important. What a database is, and where its responsibility ends, are both well-defined. Data is king.
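As a small illustration of the database owning the rules, this sketch uses Python's built-in sqlite3 module (chosen only because it ships with Python; the tables are hypothetical). The foreign key and CHECK constraint are declared once in the schema, and the database rejects violating rows no matter which program writes them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on for the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id),"
    " total INTEGER CHECK (total >= 0))"
)
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Alice')")
conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (10, 1, 30)")

def try_insert(sql):
    """Run one INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

# An order for a customer that does not exist: the foreign key refuses it.
print(try_insert("INSERT INTO orders (id, customer_id, total) VALUES (11, 99, 5)"))
# A negative total violates the CHECK constraint.
print(try_insert("INSERT INTO orders (id, customer_id, total) VALUES (12, 1, -1)"))
```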

As for NoSQL, about the only things consistent across their various models are distributed processing, scalability, lots more code and the lack of a unified query engine. They are databases only in the vaguest sense of an organized collection of data, the way a folder of Excel spreadsheets is a database. Rules can be in code or not, or whatever. Code is king.
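For contrast, a sketch of "rules in code": a plain dict stands in for a schemaless store (an assumption for illustration, not a real product's API), and the integrity rules live in a hypothetical save_order function.

```python
# A plain key/value store accepts whatever it is given.
store = {}
store["customer:1"] = {"name": "Alice"}

# Nothing stops a dangling reference from being written directly...
store["order:11"] = {"customer_id": "customer:99", "total": 5}

def save_order(key, order):
    """Hypothetical application-level rules that a schema would otherwise enforce."""
    if order["customer_id"] not in store:
        raise ValueError(f"unknown customer {order['customer_id']!r}")
    if order["total"] < 0:
        raise ValueError("total must be non-negative")
    store[key] = order

# ...so correctness depends on every write path calling the check.
save_order("order:10", {"customer_id": "customer:1", "total": 30})
```

The store itself still holds the dangling order:11, so every write path has to go through the check. Under concurrent writers this check-then-write is also not atomic, something a relational database handles for you.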

The problems that NoSQL systems are designed to solve are not the same problems that the relational model solves; it's horses for courses.

Cade Roux
+9  A: 

The original name of this technology, before people started calling it "NoSQL", was a distributed key/value store. This is a far more descriptive name, and I remember first looking at it and thinking, "hey, cool, I'll bet that will end up being very useful to a lot of people." The term has since expanded to include essentially "anything that isn't a relational database", but usually, when most people talk about NoSQL, they are talking about key/value stores.
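For a feel of how a distributed key/value store decides where a key lives, here is a minimal consistent-hashing ring, the placement technique used by Dynamo-style stores such as Cassandra and Voldemort. It is a sketch, not either product's implementation: the MD5 hash, the node names, and the choice of 50 virtual nodes per server are arbitrary assumptions.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

class HashRing:
    """Minimal consistent-hash ring: a key belongs to the first node
    position found clockwise from the key's own hash."""

    def __init__(self, nodes, vnodes=50):
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            # Several virtual positions per node smooth out the distribution.
            for i in range(vnodes):
                self._ring.append((_hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._positions = [pos for pos, _ in self._ring]

    def node_for(self, key: str) -> str:
        # Wrap around to the start of the ring past the last position.
        idx = bisect.bisect(self._positions, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

keys = [f"key{i}" for i in range(1000)]
before = HashRing(["a", "b", "c"])
after = HashRing(["a", "b", "c", "d"])
moved = sum(before.node_for(k) != after.node_for(k) for k in keys)
print(f"{moved} of {len(keys)} keys moved after adding node 'd'")
```

Adding node "d" only takes over keys that now fall just before one of its positions (roughly a quarter of them here); every other key stays put, so the cluster can grow without reshuffling everything.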

Ever since the term NoSQL was coined, it's been getting touted as a silver bullet. I'm interested in products like Cassandra and follow their progress, but they are still immature technologies, and to claim that they are "replacing" SQL or RDBMSes in general (or that they will in the near future) is specious reasoning at best, if not an out-and-out lie.

Products and technologies fitting under the NoSQL umbrella are geared toward the following problem domain:

  • You plan to deploy a large-scale, high-concurrency database (hundreds of GB, thousands of users);
  • Which doesn't need ACID guarantees;
  • Or relationships or constraints;
  • Stores a fairly narrow set of data (the equivalent of 5-10 tables in SQL);
  • Will be running on commodity hardware (e.g. Amazon EC2);
  • Needs to be implemented on a very low budget and "scaled out."

This actually describes a lot of web sites today. Google and Twitter fit very neatly into these requirements. Does it really matter if a few tweets are lost or delayed? On the other hand, these specs apply to nearly 0% of business systems, which is what a great many of us work on. Most businesses have very different requirements:

  • Medium-to-large-scale databases (10-100 GB) with fairly low concurrency (hundreds of users at most);
  • ACID (especially the A and C - Atomicity and Consistency) is a hard requirement;
  • Data is highly correlated (hierarchies, master-detail, histories);
  • Has to store a wide assortment of data - hundreds or thousands of tables are not uncommon in a normalized schema (more for denormalization tables, data warehouses, etc.);
  • Run on high-end hardware;
  • Lots of capital available (if your business has millions of customers then you can probably find $25k or so lying behind the couch).
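To make "Atomicity is a hard requirement" concrete, here is a minimal sketch using Python's built-in sqlite3 module and a hypothetical account table. The transfer is two UPDATE statements; when the second one violates the CHECK constraint, the first is rolled back as well, so no money appears or disappears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(src, dst, amount):
    """Move money between accounts: both updates happen, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))

transfer("alice", "bob", 30)   # succeeds
transfer("alice", "bob", 500)  # second UPDATE fails the CHECK; bob's credit is undone
print(balances())  # {'alice': 70, 'bob': 30}
```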

High-end SQL databases (SQL Server, Oracle, Teradata, Vertica, etc.) are designed for vertical scaling: they like being on machines with lots and lots of memory, fast I/O through SANs and SSDs, and the occasional horizontal scaling through clustering (HA) and partitioning (HC).

"NoSQL" is often compared favourably to "SQL" in performance terms. But fully maxed out, a high-end SQL database server or cluster will scale almost infinitely. That is how they were intended to be deployed. Beware of dubious benchmarks comparing poorly normalized, poorly indexed SQL databases running MySQL on entry-level servers (or worse, cloud servers like Amazon EC2) to similarly deployed NoSQL databases. Apples and oranges. If you work with SQL, don't be scared by that hype.

SQL isn't going anywhere. DBAs are no more likely to vanish as a result of NoSQL than PHP programmers were as a result of Java and XML.

NoSQL isn't going anywhere either, because the development community has correctly recognized that RDBMSes aren't always the optimal solution to every problem.

So, as a developer you owe it to yourself to at least learn what NoSQL is, what products it refers to (Cassandra, BigTable, Voldemort, db4o, etc.), and how to build and code against a simple database created with one of these. But don't start throwing away all your SQL databases yet or thinking that your career is going to be made obsolete - that's hype, not reality.
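A first step toward coding against a simple database can be very small. This sketch uses Python's standard dbm.dumb module, a local key/value file store; it is not one of the products named above, just the same get/put-by-key model without any distribution. Serializing values as JSON is an arbitrary choice, and the file lives in a throwaway temporary directory.

```python
import dbm.dumb
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "users")

# "n": always create a new, empty database file.
with dbm.dumb.open(path, "n") as db:
    # The store only knows keys and opaque values; the app serializes them.
    db["user:1"] = json.dumps({"name": "Alice", "langs": ["sql", "python"]})
    db["user:2"] = json.dumps({"name": "Bob"})

# Reopen read-only; keys and values come back as bytes.
with dbm.dumb.open(path, "r") as db:
    alice = json.loads(db["user:1"])
    keys = sorted(k.decode() for k in db.keys())

print(alice["name"], keys)  # Alice ['user:1', 'user:2']
```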

+1  A: 

Try MongoDB. You can find drivers and a tutorial for Java here.

+2  A: 

NoSQL does not refer to any single type of database system, but rather to any type of database system that is not relational. Asking for the "most simple nosql engine" is like asking for the "most simple instrument which is not a guitar". No single definitive answer exists.

First, you will need to ask yourself why a relational database is not optimal for the problem you are trying to solve. Then use that information to decide among the many different kinds of alternative (NoSQL) database systems available:

  • Document store
  • Graph databases
  • Key/value stores
  • Eventually consistent key/value stores
  • Object database
  • etc.
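To show how differently these categories treat the same data, here is one hypothetical blog post modeled three ways using plain Python structures (no real product API is implied): an opaque value under a key, a queryable document, and nodes joined by labeled edges.

```python
import json

# Key/value store: an opaque value under one key; the store can't look inside.
kv = {"post:7": json.dumps({"title": "Moving to NoSQL", "author": "user:1"})}

# Document store: the value is a structured document that can be queried into.
documents = [
    {"_id": 7, "title": "Moving to NoSQL",
     "author": {"id": 1, "name": "Alice"}, "tags": ["nosql", "rdbms"]},
]
tagged = [d["_id"] for d in documents if "nosql" in d["tags"]]

# Graph database: entities are nodes and relationships are first-class edges.
edges = [("user:1", "WROTE", "post:7"), ("user:2", "COMMENTED_ON", "post:7")]
touched_post_7 = sorted(src for src, _, dst in edges if dst == "post:7")

print(json.loads(kv["post:7"])["title"], tagged, touched_post_7)
```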

The NoSQL article on Wikipedia and both seem to have comprehensive lists of popular NoSQL database implementations. If you are merely looking to investigate some of the different systems, I would suggest having a look at NHbase, Cassandra and neo4j.

Jørn Schou-Rode

Take a look at Db4o, an object database. I used it briefly for a .NET project, and it is easy to get started with. It is also available for Java.

This blog post does a nice job of explaining the motivation behind using an object-oriented DBMS rather than an RDBMS.
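The core idea of an object database is storing objects as they are, with no mapping to tables. Python's standard shelve module gives a rough feel for that: it pickles objects into a key/value file. It is not an object DBMS like db4o (no object queries, no transactions, no multi-user access), and the Customer/Order classes are hypothetical.

```python
import os
import shelve
import tempfile
from dataclasses import dataclass, field

@dataclass
class Order:
    item: str
    qty: int

@dataclass
class Customer:
    name: str
    orders: list = field(default_factory=list)

path = os.path.join(tempfile.mkdtemp(), "objects")

# Store the whole object graph as-is: no tables, no mapping layer.
with shelve.open(path) as db:
    db["alice"] = Customer("Alice", [Order("book", 2), Order("pen", 10)])

# Read it back as real Customer/Order instances.
with shelve.open(path) as db:
    alice = db["alice"]

print(alice.orders[1].qty, type(alice).__name__)  # 10 Customer
```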

+1  A: 

I made a Visual Guide to NoSQL Systems to quickly see the major trade-offs involved in choosing one. The biggest choice is picking two of the following: consistency, availability, and partition tolerance.

Nathan Hurst

Berkeley DB is pretty nice too.