I've always been fascinated by how these huge RDBMS servers can work for months, shuffling data all around without losing it, performing all kinds of queries, and generally being very complex beasts.

I understand how much effort it takes to create even something as "simple" as SQLite, but I cannot resist the desire to try and develop my own RDBMS engine. Therefore, my question is: what books, online resources, etc. are there in this regard? I'd prefer practice-oriented material over purely theoretical material, but theoretical material will do as well.

I'd also prefer not to read the SQLite/MySQL/PostgreSQL sources, as these are all well beyond "simple" and are cluttered (being written in a fairly low-level language) with various technical and implementation details.

+8  A: 

This question has been asked before in some permutation. I've seen this book recommended, Database Systems: The Complete Book. I've not read it, but I did order it. It covers basic theory and has a section on implementation.

I'm in the same boat you are. I would love to know how to implement a database, but the landscape for even regular database material on the web is a bit thin, let alone implementing one.

Other answers include reading the documentation and source code to SQLite, as well as the source code to other open source DBs (PostgreSQL, MySQL).

To add: maybe just good old-fashioned feature copying. From my short stint in databases, we hear so much about how they work at a superficial level. So maybe you can take those concepts pretty literally and come up with implementations yourself: how indexes are just tables, how transactions work, how changes are logged first and then acted upon, etc. My guess is that finding tutorial-centric material for database implementation will be really, really tough. But I'm right there with you! Stack project!!!
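As an illustration of the "logged first, then acted upon" idea, here is a minimal sketch in Python. Everything in it is hypothetical (the `TinyLoggedStore` class, the JSON-lines log format, the `demo` helper); it is a toy key-value store, not how any real engine lays out its log, and it ignores checkpoints, torn writes, and transactions.

```python
import json
import os
import tempfile


class TinyLoggedStore:
    """Hypothetical key-value store that logs each change before applying it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self):
        # Rebuild in-memory state by re-applying every logged record in order.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                self._apply(json.loads(line))

    def _apply(self, record):
        if record["op"] == "set":
            self.data[record["key"]] = record["value"]
        elif record["op"] == "delete":
            self.data.pop(record["key"], None)

    def _log_then_apply(self, record):
        # Step 1: append the record to the log and force it to disk.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # Step 2: only now change the in-memory state.
        self._apply(record)

    def set(self, key, value):
        self._log_then_apply({"op": "set", "key": key, "value": value})

    def delete(self, key):
        self._log_then_apply({"op": "delete", "key": key})


def demo():
    path = os.path.join(tempfile.mkdtemp(), "tiny.log")
    store = TinyLoggedStore(path)
    store.set("a", 1)
    store.set("b", 2)
    store.delete("a")
    # Simulate a restart: a fresh instance recovers state from the log alone.
    recovered = TinyLoggedStore(path)
    return recovered.data
```

Because the record reaches disk before the in-memory state changes, a second instance opened on the same log file ends up with the same data as the first, which is the property the sketch is meant to show.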

Mark Canlas
I've read the book (for a class at university). It's good and covers all you want to know, but it is very theoretical and not the easiest book to read.
Christian Vik
Database Systems: The Complete Book is a really good book.
dmeister
+1  A: 

Ok, got one more-or-less practical. Database Management Systems by Raghu Ramakrishnan & Johannes Gehrke.

Anton Gogolev
+1  A: 

I second Database Management Systems and also recommend Readings in Database Systems.

I have read both. Database Management Systems is quite excellent. Readings in Database Systems is also filled with valuable information but is not as accessible.

qstarin
+2  A: 

I highly recommend this book:

Database in Depth: The Relational Model for Practitioners by C.J. Date

Alexey Kalmykov
+1  A: 

I have an odd recommendation for you.

There are plenty of database engines that take in SQL.

Maybe take a different path and build an RDBMS whose input is the execution plan itself rather than SQL. It would have interesting applications.
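To make that concrete, here is a small sketch in Python of an engine that takes the plan directly. The plan encoding (nested tuples with `"scan"`, `"filter"`, and `"project"` nodes), the `execute` function, and the sample `TABLES` data are all made up for illustration; a real engine would have many more operators and a typed plan format.

```python
def execute(plan, tables):
    """Run a plan given as nested tuples; `tables` maps names to lists of dict rows."""
    op = plan[0]
    if op == "scan":
        _, table_name = plan
        return list(tables[table_name])
    if op == "filter":
        # Equality filter only: keep rows where row[column] == value.
        _, column, value, child = plan
        return [row for row in execute(child, tables) if row[column] == value]
    if op == "project":
        _, columns, child = plan
        return [{c: row[c] for c in columns} for row in execute(child, tables)]
    raise ValueError(f"unknown plan operator: {op!r}")


TABLES = {
    "users": [
        {"id": 1, "name": "ann", "city": "Oslo"},
        {"id": 2, "name": "bob", "city": "Rome"},
        {"id": 3, "name": "cy", "city": "Oslo"},
    ]
}

# Roughly what SELECT name FROM users WHERE city = 'Oslo' might compile to.
PLAN = ("project", ["name"], ("filter", "city", "Oslo", ("scan", "users")))

result = execute(PLAN, TABLES)
# result == [{"name": "ann"}, {"name": "cy"}]
```

The caller, not a query optimizer, decides the order of operations here, so no SQL parser or planner is needed at all.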

Joshua
I just want to point out that there are dbs that let you do this. MonetDB for example, see http://monetdb.cwi.nl/MonetDB/Documentation/MAL-Reference.html#MAL-Reference for the relational algebra. SQL and XQuery are added on top of this underlying, low-level relational algebra.
Roland Bouman
+11  A: 

For the theoretical part, I think the work of Edgar Frank "Ted" Codd (the father of relational databases) is a must-read (see Dr Codd at the C2 wiki too). The paper that started it all is "A Relational Model of Data for Large Shared Data Banks". Also check the work of Christopher J. Date and Hugh Darwen, who, among others, have subsequently maintained and developed Codd's relational model. More precisely, look at The Third Manifesto and "An Introduction to Database Systems".

For the practical part, I'd warmly recommend checking out Rel. Quoting the Rel page on Wikipedia:

Rel is an open source true relational database management system that implements a significant portion of Chris Date and Hugh Darwen's Tutorial D query language.

Primarily intended for teaching purposes, Rel is written in the Java programming language.

In my opinion, Rel might be much better for educational purposes than SQLite/MySQL/PostgreSQL.

Pascal Thivent
+1 This is very good advice! Christopher J. Date also has a good book, which I recommend in my answer :)
Alexey Kalmykov
+4  A: 

Developing an RDBMS can be a huge task, but if you're doing this for learning purposes you don't have to go all the way. Simplify things. To begin with, I wouldn't even make it ACID compliant.

If I were to create a RDBMS I would:

1) Create a simple storage engine. Support simple tables and the most common datatypes. Skip implementing any index other than the primary key index, skip transactions (to begin with), and skip optimizing database reads and writes.

2) Query operators. Once you have the storage libraries, I would implement the simple operators (projection, selection, cross join, etc.).

3) Query parser. Once the simple operators are implemented, I would create a query parser for SQL. Not a fancy optimized one, but one that can parse simple SQL statements into query trees.

I wouldn't implement undo/redo logging, transaction support, ODBC, or any other "real" RDBMS functionality in the first prototype/proof-of-concept solution.
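Steps 1 and 2 can be sketched in a few dozen lines of Python. This is only one possible shape, under loud assumptions: the `Table` class, its dict-based primary key index, and the `selection`, `projection`, and `cross_join` functions are all invented for illustration, everything lives in memory, and there are no datatypes, transactions, or on-disk format. The parser from step 3 is left out.

```python
from itertools import product


class Table:
    """Hypothetical in-memory table with a primary key index and no transactions."""

    def __init__(self, name, columns, primary_key):
        self.name = name
        self.columns = columns
        self.primary_key = primary_key
        self._rows_by_key = {}  # primary key index: key value -> row dict

    def insert(self, **row):
        if set(row) != set(self.columns):
            raise ValueError(f"expected columns {self.columns}, got {sorted(row)}")
        key = row[self.primary_key]
        if key in self._rows_by_key:
            raise KeyError(f"duplicate primary key: {key!r}")
        self._rows_by_key[key] = dict(row)

    def get(self, key):
        # Point lookup through the index, no scan needed.
        return self._rows_by_key.get(key)

    def scan(self):
        # Full scan; rows come back in insertion order.
        yield from self._rows_by_key.values()


def selection(rows, predicate):
    """Keep only rows for which predicate(row) is true."""
    return (row for row in rows if predicate(row))


def projection(rows, columns):
    """Keep only the named columns of each row."""
    return ({c: row[c] for c in columns} for row in rows)


def cross_join(left_rows, right_rows):
    """Pair every left row with every right row (assumes distinct column names)."""
    return ({**left, **right} for left, right in product(left_rows, right_rows))


users = Table("users", ["user_id", "name"], primary_key="user_id")
users.insert(user_id=1, name="ann")
users.insert(user_id=2, name="bob")

orders = Table("orders", ["order_id", "buyer_id", "item"], primary_key="order_id")
orders.insert(order_id=10, buyer_id=1, item="book")
orders.insert(order_id=11, buyer_id=2, item="lamp")
orders.insert(order_id=12, buyer_id=1, item="pen")

# Equivalent in spirit to:
#   SELECT name, item FROM users, orders WHERE user_id = buyer_id
joined = cross_join(users.scan(), orders.scan())
matched = selection(joined, lambda row: row["user_id"] == row["buyer_id"])
result = list(projection(matched, ["name", "item"]))
```

Each operator takes rows in and gives rows out, so a query tree produced by a later parser could be evaluated by nesting these same calls.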

Christian Vik
I might try to do transactions sooner rather than later, right after the non-transactional storage engine. I think adding transactions to a complex non-transactional system will be much harder than adding queries to a transactional system. This would also be the time to decide between locking and MVCC (or some other approach to transactions!). It all depends on whether you're more interested in learning about transactions or queries, though.
Tom Anderson
+2  A: 

The Developer FAQ on the PostgreSQL website lists a few interesting book and link suggestions (further down the page): http://wiki.postgresql.org/wiki/Developer%5FFAQ#What%5Fbooks%5Fare%5Fgood%5Ffor%5Fdevelopers.3F

Nip
+1  A: 

Fundamentals of Database Systems by Elmasri & Navathe is a good one.

NA
+2  A: 

Apart from the books, if you are a C# developer, I would strongly suggest taking a look at the open source C# port of SQLite.

Ramesh Vel
+2  A: 

If you are looking for source code, then Relational Database Management by Papazoglou and Valder contains a complete RDBMS, from B*-trees to language parser, implemented in (old-style) C. There's also a lot of explanatory text.

anon
+1  A: 

The H2 database is an open source full-featured RDBMS written in Java. The source code might provide valuable insights regarding the inner workings of a database.

Todd Stout