views:

1633

answers:

7

Apart from the google/bigtable scenario, when shouldn't you use a relational database? Why not, and what should you use? (did you learn 'the hard way'?)

+10  A: 

I suggest you visit the High Scalability blog, which discusses this topic almost on a daily basis and has many articles about projects that chose distributed hashes, etc. over RDMBS.

The quick (but very incomplete answer) is that not all data translates well to tables in efficient ways. For example, if your data is essentially one big dictionary, there are probably much faster alternatives that plain old RDBMS. Having said that, it mostly a matter of performance, and if performance isn't a huge concern in a project, and stability, consistency and reliability, for example, are, then I don't see much point in delving into these technologies when RDBMS is a much more mature and well developed scheme, with support in all languages and platforms and a huge set of solutions to choose from.

Assaf Lavie
+7  A: 

Fifteen years ago I was working on a credit risk system (basically a big tree walking system). We were using Sybase on HPUX & solaris and performnce was killing us. We hired in consultants direct from Sybase who said it couldn't be done. Then we switched to an OO database (Object store in this case) and got a about a 100x performance increase (and the code was about 100x easier to write too)

But such situations are quite rare - a relational database is a good first choice.

anon
+8  A: 

The relational database paradigm makes some assumptions about usage of data.

  • A relation consists of an unordered set of rows.
  • All rows in a relation have the same set of columns.
  • Each column has a fixed name and data type and semantic meaning on all rows.
  • Rows in a relation are identified by unique values in primary key column(s).
  • etc.

These assumptions support simplicity and structure, at the cost of some flexibility. Not all data management tasks fit into this kind of structure. Entities with complex attributes or variable attributes do not, for instance. If you need flexibility in areas where a relational database solution doesn't support it, you need to use a different kind of solution.

There are other solutions for managing data with different requirements. Semantic Web technology, for example, allows each entity to define its own attributes and to be self-describing, by treating metadata as attributes just like data. This is more flexible than the structure imposed by a relational database, but that flexibility comes with a cost of its own.

Overall, you should use the right tool for each job.

See also my other answer to "The Next-gen databases."

Bill Karwin
+1 for the specifics of relational database paradigm assumptions. I think most beginner-to-intermediate devs (like me) forget that it was designed with assumptions and just don't remember it may not be the best way. In what types of systems would you come across the need for more flexibility?
J M
@JM: It *is* the best way if you need the database to enforce a consistent set of attributes on all entities in a given relation. You'd need more flexibility if you have a collection of entities with variable attributes, e.g. a product catalog with many different types of products.
Bill Karwin
I really like this answer. I'm so tired of hearing "a RDBMS can model anything" in discussions, but that's not what really matters. It's about the assumptions of the relational database paradigm and if these make a good fit for the problem at hand or not.
nawroth
@nawroth: Yep! You don't use a screwdriver to drive in a nail, and you don't use a hammer to drive in a screw. Maybe it's *possible* to do either of those things, given enough determination and patience. But it'd be easier, more efficient, and more successful if you use the right tools.
Bill Karwin
@Bill, hm... iirc these 'assumptions' were deliberate; each of them was a safeguard against polution of the data model and driving in the direction of an actual relational database (which we don't have RDBMS are not really relational, just relational-like). Today, you are right in a sense that RDBMS did not deliver a clean data management solution and that are other valid approaches (espicially it terms of speed, flexibility and getting the job done); however I would not really care to use any other data model for enterprise wide data management (for example modeling an ERP for big company).
Unreason
And that's (theoretically) the(!) strongest point of relational systems (in my book) - that they scale and integrate. As you computerize parts of a business process, you tend to aim to get to as homogoneus platform as possible. Getting the right tool is a good idea, but what is the criteria: system speed? maintenance/system costs? simplicity?
Unreason
@Unreason: Exactly. If you use a NoSQL solution, you usually have to write a lot of your own code to duplicate data integrity enforcement that an RDBMS would do for you.
Bill Karwin
+18  A: 

In my experience, you shouldn't use a relational database when any one of these criteria are true:

  • your data is structured as a hierarchy or a graph (network),
  • the typical access pattern emphasizes reading over writing, or
  • there’s no requirement for ad-hoc queries.

Hierarchies and graphs do not translate well to relational tables. Even with the assistance of proprietary extensions like Oracle's CONNECT BY, chasing down trees is a mighty pain using SQL.

Relational databases add a lot of overhead for simple read access. Transactional and referential integrity are powerful, but overkill for some applications. So for read-mostly applications, a file metaphor is good enough.

Finally, you simply don’t need a relational database with its full-blown query language if there are no unexpected queries anticipated. If there are no suits asking questions like "how many 5%-discounted blue widgets did we sell in on the east coast grouped by salesperson?", and there never will be, then you, sir, can live free of DB.

yukondude
+1 for specifics. What about the scenario where part of your data is hierarchical e.g. a sales/stock database where there are customers, products, orders, etc, tens of thousands of products with tens of thousands of product categories which are hierarchical?
J M
If the hierarchy is more wide than deep, then a relational DB could still be a reasonable choice. If the maximum depth is fixed, then you can always denormalize and flatten the hierarchy (not very pretty, though).
yukondude
Wouldn't nested sets work pretty well even in relational databases? http://en.wikipedia.org/wiki/Nested_set_model
Henrik Paul
There's nothing conflicting about a hierarchy. That's exactly what JOINs with 1:m relationships are. And why shouldn't you use an RDBMS just because you emphasize reading over writing? That's 99% of web sites. Ditto for "no ad hoc queries". This answer is just plain wrong. All three points are wrong. And it didn't even provide any suggested alternatives as requested. And it gets 10 votes plus accepted? Looks like a setup question to me.
le dorfier
Great answer dude !
majkinetor
le dorfier: 1. Hierarchies are 1:m *reflexive* relationships, which are easy enough to JOIN with to find the next level, but not for joins to arbitrary depths. 2. True, most read-only websites use RDBMSs, but again, referential integrity and transactional consistency are not nearly as useful for read-only use. 3. Ad-hoc queries are the reason relational theory exists--review your E.F. Codd. 4. Sorry, not a setup. In fact, I am a great believer in the power of RDBMSs, and teach courses in using them, but one has to grasp the limitations in any technology.
yukondude
@le dorfier - Just because "all the other web sites are doing it" doesn't mean it's optimal. I bet 99% of the 99% you were mentioning use an RDBMS because they don't know anything else.
Travis Heseman
@J M - Regarding products/categories I wrote a blog post on that yesterday: http://blog.neo4j.org/2010/03/modeling-categories-in-graph-database.html
nawroth
+1  A: 
Keyframe
+5  A: 

When you schema varies a lot you will have a hard time with relational databases. This is where XML databases or key-value pair databases work best. or you could use IBM DB2 and have both relational data and XML data managed by a single database engine. Get it free - check http://FreeDB2.com.

Leon Katsnelson
Do you have any real world examples of when you may be in this situation to help less experienced developers (meaning me) get a feel for when this kind of issue may crop up?
J M
+5  A: 

There are three main data models (C.J.Date, E.F.Codd) and I am adding a flat file to this:

  • flat file(s) (structure varies - from 'stupid' flat text to files conforming to grammars which coupled with clever tools do very clever things, think compilers and what they can do, narrow application in modelling new things)
  • hierarchical (trees, nested sets - examples: xml and other markup languages, registry, organizational charts, etc; anything can be modelled, but integrity rules are not easy to express and retrieval is hard to optimize automatically, some retrieval is fast and some is very slow )
  • network (networks, graphs - examples: navigational databases, hyperlinks, semantic web, again almost anything can be modelled but automatic optimizing of retrieval is a problem)
  • relational (first order predicate logic - example: relational databases, automatic optimization of retrieval)

Both hierarchical and network can be represented in relational and relational can be expressed in the other two.

The reason that relational is considered 'better' is the declarative nature and standardization on not only the data retrieval language but also on the data definition language, including the strong declarative data integrity, backed up with stable, scalable, multi-user management system.

Benefits come at a cost, which most projects find to be a good ratio for systems (multi application) that store long term data in a from that will be usable in foreseeable future.

If you are not building a system, but a single application, perhaps for a single user, and you are fairly certain that you will not want multiple applications using your data, nor multiple users, any time soon then you'll probably find faster approaches.

Also if you don't know what kind of data you want to store and how to model it then relational model strengths are wasted on it.

Or if you simply don't care about integrity of your data that much (which can be fine).

All data structures are optimized for a certain kind of use, only relational if properly modelled tries to represent the 'reality' in semantically unbiased way. People who had bad experience with relational databases usually don't realize that their experience would have been much worse with other types of data models. Horrible implementations are possible, and especially with relational databases, where it is relatively easy to build complex models, you could end up with quite a monster on your hands. Still I always feel better when I try to imagine the same monster in xml.

One example of how good relational model is, IMO, is ratio of complexity vs shortness of the questions that you will find that involve SQL.

Unreason