I'm learning traditional Relational Databases (with PostgreSQL) and doing some research I've come across some new types of databases. CouchDB, Drizzle, and Scalaris to name a few, what is going to be the next database technologies to deal with?
Not to be pedantic, but I would like to point out that at least CouchDB isn't SQL-based. And I would hope that the next-gen SQL would make SQL a lot less... fugly and non-intuitive.
SQL has been around since the early 1970s so I don't think that it's going to go away any time soon.
Maybe the 'new(-ish) sql' will oql (see http://en.wikipedia.org/wiki/ODBMS)
I would say next-gen database, not next-gen SQL.
SQL is a language for querying and manipulating relational databases. SQL is dictated by an international standard. While the standard is revised, it seems to always work within the relational database paradigm.
Here are a few new data storage technologies that are getting attention currently:
- CouchDB is a non-relational database. They call it a document-oriented database.
- Amazon SimpleDB is also a non-relational database accessed in a distributed manner through a web service. Amazon also has a distributed key-value store called Dynamo, which powers some of its S3 services.
- Dynomite and Kai are open source solutions inspired by Amazon Dynamo.
- BigTable is a proprietary data storage solution used by Google, and implemented using their Google File System technology. Google's MapReduce framework uses BigTable.
- Hadoop is an open-source technology inspired by Google's MapReduce, and serving a similar need, to distribute the work of very large scale data stores.
- Scalaris is a distributed transactional key/value store. Also not relational, and does not use SQL. It's a research project from the Zuse Institute in Berlin, Germany.
- RDF is a standard for storing semantic data, in which data and metadata are interchangeable. It has its own query language SPARQL, which resembles SQL superficially, but is actually totally different.
- Vertica is a highly scalable column-oriented analytic database designed for distributed (grid) architecture. It does claim to be relational and SQL-compliant. It can be used through Amazon's Elastic Compute Cloud.
- Greenplum is a high-scale data warehousing DBMS, which implements both MapReduce and SQL.
- XML isn't a DBMS at all, it's an interchange format. But some DBMS products work with data in XML format.
- ODBMS, or Object Databases, are for managing complex data. There don't seem to be any dominant ODBMS products in the mainstream, perhaps because of lack of standardization. Standard SQL is gradually gaining some OO features (e.g. extensible data types and tables).
- Drizzle is a relational database, drawing a lot of its code from MySQL. It includes various architectural changes designed to manage data in a scalable "cloud computing" system architecture. Presumably it will continue to use standard SQL with some MySQL enhancements.
- Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store, developed at Facebook by one of the authors of Amazon Dynamo, and contributed to the Apache project.
- Project Voldemort is a non-relational, distributed, key-value storage system. It is used at LinkedIn.com
- Berkeley DB deserves some mention too. It's not "next-gen" because it dates back to the early 1990's. It's a popular key-value store that is easy to embed in a variety of applications. The technology is currently owned by Oracle Corp.
Also see this nice article by Richard Jones: "Anti-RDBMS: A list of distributed key-value stores." He goes into more detail describing some of these technologies.
Relational databases have weaknesses, to be sure. People have been arguing that they don't handle all data modeling requirements since the day it was first introduced.
Year after year, researchers come up with new ways of managing data to satisfy special requirements: either requirements to handle data relationships that don't fit into the relational model, or else requirements of high-scale volume or speed that demand data processing be done on distributed collections of servers, instead of central database servers.
Even though these advanced technologies do great things to solve the specialized problem they were designed for, relational databases are still a good general-purpose solution for most business needs. SQL isn't going away.
update 2010-09-26: I've written an article in php|Architect magazine about the innovation of non-relational databases, and data modeling in relational vs. non-relational databases. http://www.phparch.com/magazine/2010/september/
There are special databases for XML like MarkLogic and Berkeley XMLDB. They can index xml-docs and one can query them with XQuery. I expect JSON databases, maybe they already exist. Did some googling but couldn't find one.
I'm missing graph databases in the answers so far. A graph or network of objects is common in programming and can be useful in databases as well. It can handle semi-structured and interconnected information in an efficient way. Among the areas where graph databases have gained a lot of interest are semantic web and bioinformatics. RDF was mentioned, and it is in fact a language that represents a graph. Here's some pointers to what's happening in the graph database area:
- Graphs - a better database abstraction
- Graphd, the backend of Freebase
- Neo4j open source graph database engine
- AllegroGraph RDFstore
- Graphdb abstraction layer for bioinformatics
- Graphdb behind Directed Edge recommendation engine
I'm part of the Neo4j project, which is written in Java but has bindings to Python, Ruby and Scala as well. Some people use it with Clojure or Groovy/Grails. There is also a GUI tool evolving.
I heard also about NimbusDB by Jim Starkey
Jim Starkey is the man who "create" Interbase
who work on Vulcan (a Firebird fork)
and who was at the begining of Falcon for MySQL
For a look into what academic research is being done in the area of next gen databases take a look at this: http://www.thethirdmanifesto.com/
In regard to the SQL language as a proper implementation of the relational model, I quote from wikipedia, "SQL, initially pushed as the standard language for relational databases, deviates from the relational model in several places. The current ISO SQL standard doesn't mention the relational model or use relational terms or concepts. However, it is possible to create a database conforming to the relational model using SQL if one does not use certain SQL features."
http://en.wikipedia.org/wiki/Relational_model (Referenced in the section "SQL and the relational model" on March 28, 2010