Before E F Codd published his paper "A Relational Model of Data for Large Shared Data Banks" in 1970, hierarchal and network were the two prominent models of the database.
What exactly was wrong with them that they did not prevail?
Before E F Codd published his paper "A Relational Model of Data for Large Shared Data Banks" in 1970, hierarchal and network were the two prominent models of the database.
What exactly was wrong with them that they did not prevail?
The basic problem was the inability to support ad hoc queries. These databases were very fast, but only if you queried them in the ways their original designers expected you to. If you came up with another type of query, they could either be very slow or at worst require that the database schema be changed to support the query.
I actually worked on both kinds of these in the 80s (Codasyl and Nomad/2) and was very glad when SQL became more widely available.
What was good in these models was performance, and that's what I think made it take so long for RDBMS to become dominant (they performed really bad in the beginning).
If you want to dig deeper into history this interview with Charles Bachman is highly recommended reading! He's an interesting person as well, actually he coded the first automated data modeling tool for RDBMS!
BTW, hierarchical/network databases are still in use at least in mainframe settings.
The answers so far cover a lot of the practical reasons why the network and hierarchical models were eventually displaced by the relational model (including SQL database systems). Codd's 1970 paper explains why a new model is needed, in detail. It's a great read. Indeed, before Codd, the term "data model" was practically unheard of. So he coined the terms "hierachical model" and "network model" in order to describe database systems that had been constructed with no precise model in mind.
The hierarchical and network models can be collected into a general term, called the "graph model". The essential feature of the graph model of data is that data items are referenced by stating their location. If you understand pointers, you understand everything fundamental about the graph model.
There are two very powerful advantages to the graph model of data. The first is that it's very easy for programmers to grasp. Novice programmers go through a certain learning curve coming to grips with pointers, but once they've done that, they are ready to understand graph data easily.
The second advantage is that pointers are extremely fast, provided that the navigation path to be followed was anticipated at the time the data was written.
There are several disadvantages to using pointers to identify data. One is that the data becomes "pinned". That is, when the data is to be shuffled all of the pointers that reference the data have to be located and updated. Or a "forwarding address" has to be left at the old location. If you've ever been in the web and clicked on a button that has always worked, only to be greeted with the infamous "page not found" error, you've probably come across the pitfall of shuffling pinned data without updating references to it.
A second one is that navigating data along unplanned access paths can be downright disastrous, both in terms of performance, and in terms of logical correctness. This is one of the reasons why ad hoc reporting is so difficult with graph databases.
A third drawback of graph data is that there may be logical relationships in the graph data that are not inherent in the data as given. The fundamental advantage of the relational model is that all the relationships are inherent in the data itself. The reason why this is an advantage is complex. I refer you again to the 1970 paper.
In all the "relational DBMSes" that you and I are likely to use, there is a bridge between using data to identify data and using pointers to locate data. It's called an index. The index relates two items: an index key (one or more columns from a table), and a pointer (that locates a row containing the index key). I'm glossing over all the details about indexes.
Anyway an index allows the SQL engine to translate a query that states what data is being sought into where to look for that data. Data that is pointed to by indexes can still be shuffled, but the index has to be rebuilt as part of the process.
This is an overview.
Navigation. The hierarchical and network models depend on navigational structures (aka pointers / links / graphs) in the database. Their functionality is therefore constrained by the design of those structures. In contrast, the relational model "provides a means of describing data with its natural structure only--that is, without superimposing any additional structure for machine representation purposes."[1]
Ironically, the current "NOSQL" trend in databases also embraces navigational structures, often viewing them (quite mistakenly in my view) as a good solution to the perceived limitations of SQL databases.
[1] "A Relational Model of Data Large Shared Data Banks" E. F. CODD, 1970