Object-oriented databases seem like a really cool idea to me: no need to worry about mapping your domain model to your database model, no messing around with SQL or ORM tools. The way I understand it, relational DBs offer some advantages when there are massive amounts of data and searching and indexing need to be done. To my mind 99% of websites are not massive, and enterprise issues never need to be thought about, so why aren't OO DBs more widely used?
I don't know much about OO databases (beyond hearing them mentioned in the SO podcast and what I just read here), but if they have issues with massive amounts of data, I would be leery of using one for any site that I thought was going to grow to any sort of large size (like this one), unless it was a straightforward process to migrate to a relational database once the OO DB started slowing to a snail's pace because of data overload...
Plus, small DBs are so easy to develop and maintain that it just doesn't seem worth it to learn OO DB stuff just for a little DB. So, if they are easier to build/design for large DBs, but too slow to use with a large DB, then they are self-defeating, because the very kind of DB you might be motivated to use them for is the very kind they should be avoided for.
Then again, I know next to nothing about OO DB, so I could be wrong about everything I just said. I would love to hear from someone who has actually used them who could set me straight on my (probable) misconceptions.
I wish there was some way to mark this as a comment rather than an answer...
I too don't have any experience with OODBs, but I'd be very wary about throwing one out, even for a small project.
Two reasons: You may (and if your project grows, will) need to access your database from things other than your main app. This includes things like reporting services, replication, and so on.
Also, for most projects, the database isn't the bottleneck, or point of difference. It's just a place to shove your data, so there is really no need to worry about it. Just take the path of least resistance (SQL) and focus your efforts on the actual core features of your app or site.
This might sound like a stupid answer but... They're not widely used, because they're not widely used - I think that the issue with people not using OODBs is that more people feel comfortable with SQL etc. (because they already know it well, and because they know that everyone uses it)
My short answer: because people are more familiar with relational databases, and if they work, is there a need to learn something new? I'm not saying that's a good thing, but at university I was not taught OODBs, and I've seen at least 5 times as much on the web about relational databases as I have about OODBs.
To my mind 99% of websites are not massive, and enterprise issues never need to be thought about, so why aren't OO DBs more widely used?
I think you give the answer yourself. If you have a very small app and no massive data, so no need to think about performance, then you might as well use an ORM tool. It's easy to set up a relational database with a standard configuration and get started using Hibernate or something similar.
In the vein of portability, I'd use relational. Moving a website from one host to another would be a nightmare with some niche DBMS. Almost all hosting providers offer MSSQL and MySQL these days.
Why we abandoned Object Databases
A while back, I was part of a Solaris/C++ project that used the Objectivity object database. We eventually switched the project over to Sybase. This was about 10 years ago, so I'm sure a lot has changed since then, but a few of the observations still apply.
The application was a carrier-class telecom system.
- The basic functionality was nice. You reference an object and it magically pops into memory. If your programming problem boils down to having a large graph of objects (as opposed to tabular data) it is a definite win.
- There were some stability-related "early adopter" problems. Seeing as the company is still in business it's safe to assume they are fixed.
- We wanted to allow the client to define their own schema. This was a huge problem, since you give the object database a specification and it spits out a header file which defines your object. This is definitely not conducive to post-compiling schema definition, which is a definite strength of SQL databases.
- The customer perceived it as being a new, experimental technology. If you're familiar with the telecom world, you know that's not a recommendation.
- SQL databases have tons of tools for working with schemas, database administration, UI generation, backup, etc. We were having to write every little piece of that ourselves.
- I had a bad experience working with some of the early-adopter objectologists in that group. If you're familiar with the DailyWTF, these were the True,False,FileNotFound guys.
While attempting to address the client customization part, our lead objectologist came up with some data classes called Rows and Columns. That was when it became apparent that for our particular application we were just a whole lot better off going with a relational database.
Anyway, that was my experience with object DBs. I would love to hear some other (hopefully happier!) experiences with them.
OODBs are not widely used because they are really slow when compared to relational DBs.
Recently I discovered and studied Object-Relational Databases (ORDBs), which use good points from both worlds. In your database you can create objects with methods and attributes, just like you do in your favorite programming language, and then you create tables which are made up only of those objects, with SQL commands like...
CREATE TABLE Artists OF TArtist;  -- the table name "Artists" is a placeholder
... or you can use those objects as a column of an old-school relational table, with SQL commands like...
CREATE TABLE Records ( id number(5), name varchar2(40), singer TArtist );
The best part of it is to create select queries without dozens of joins. Here's an example of a query on the table above:
SELECT r.name, r.singer.name from Records r;
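For comparison, here is roughly what the same lookup looks like in a plain relational schema, where the artist lives in its own table and has to be joined in. This is only a sketch using Python's built-in sqlite3 module rather than Oracle; the Artists table, the singer_id column and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical plain-relational version of the Records/TArtist example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Artists (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Records (
        id INTEGER PRIMARY KEY,
        name TEXT,
        singer_id INTEGER REFERENCES Artists(id)
    );
    -- Invented sample data.
    INSERT INTO Artists VALUES (1, 'Some Artist');
    INSERT INTO Records VALUES (10, 'Some Record', 1);
""")

# The object-relational query navigates r.singer.name directly;
# here the same result needs an explicit join on the foreign key.
rows = conn.execute("""
    SELECT r.name, a.name
    FROM Records r
    JOIN Artists a ON a.id = r.singer_id
""").fetchall()
print(rows)
```

With one referenced object the difference is small; it grows with every extra level of nesting you would otherwise navigate with dots.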
It can be a bit tricky to create 1-N or N-N references between tables, but the gain in simplicity when writing selects may be worth it. Anyway, you still have to write code to map objects to and from the database, just as you do now. I plan to begin using it on personal projects, because I think it will be an important topic in the medium term.
Two well-known databases that have ORDB capabilities are Oracle and PostgreSQL.
Relational databases are proven technology that works, scales, and does everything required of it. Objects are great for defining and using within the software, but that doesn't mean they work well as a container for data. Relational databases are also getting easier to use; Microsoft is making a valid attempt with LINQ.
It's an "if it ain't broke, don't fix it" situation. :-)
I worked on a project using an OO database. It had its perks, but the biggest downside, besides performance and a poorly documented API, was that it was impossible to see what data was actually stored in the database without writing code. All the developers would've killed for:
SHOW TABLES;
SHOW COLUMNS FROM table;
SELECT * FROM table;
In the end, we scrapped that platform for another.
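To make the point concrete, this is the kind of ad-hoc inspection a SQL database gives you almost for free. The sketch below uses Python's built-in sqlite3 module; SQLite has no SHOW TABLES, so the catalog table sqlite_master and PRAGMA table_info stand in for it. The posts table and its single row are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, just so there is something to look at.
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO posts (title) VALUES ('hello')")

# Rough equivalent of SHOW TABLES.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# Rough equivalent of SHOW COLUMNS FROM posts (column name is field 1).
columns = [row[1] for row in conn.execute("PRAGMA table_info(posts)")]

# And the data itself, with no application classes involved.
data = conn.execute("SELECT * FROM posts").fetchall()

print(tables, columns, data)
```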
Things that are initially easy usually grow into big problems. (Great Dane puppies grow into very big dogs.) A good program design that cannot scale is usually not a good idea.
E.F. (Ted) Codd, the genius who invented the idea of a relational database, published a paper in 1970 called "A Relational Model of Data for Large Shared Data Banks". He introduced the concept of a table (he called it a relation originally) for storing data. He also pioneered the idea of having a query language to be able to quickly pull the data out.
What is relevant to the object database model is that the relational structure was invented to get away from hierarchical databases. The old IMS model (IBM's hierarchical DB) suffered from many of the same problems that object databases do. It is based around the programming environment and ease for the programmer. That sounds good at first, but data is usually reused by reports as well as other systems. The relational model focuses on the structure of the data and access to the data, not the program using the data.
To make it the most painfully obvious on reuse, try reporting off of hierarchical or object database data. It is a simple test of the accessibility of the data and shows its flaws.
Don’t look for shortcuts to good design. Learn OO, learn SQL and relational database design, and learn structured programming. You will gain more than experience; you will develop good judgment.
The reason object relational databases are not very widely used is the object-relational impedance mismatch. The two paradigms just don't quite fit. Read more about this on Wikipedia.
I worked with OODBs a few years ago and...
- They apply very well to niche situations. Those are generally caching type applications and complex objects for smaller sets of data (think 911 dispatcher type application)
- They do not handle schema changes well at all. You pretty much have to write code to mutate each object. Imagine writing objects that serialize to disk as binaries - if you change your class you lose the ability to load old files. Object databases have the same problem
- You can't 'report' off object databases in a traditional way. Every report has to be a program that loads the objects and does something with them so there is no concept of 'list all where...'
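A small sketch of what "every report has to be a program" means in practice. This is not any particular OODB's API: a plain Python list stands in for the object store, and the Incident class, its fields and the sample data are invented (loosely following the dispatcher example above).

```python
class Incident:
    """Hypothetical domain object, as it might live in an object store."""

    def __init__(self, street, priority):
        self.street = street
        self.priority = priority


# Stand-in for the object database: just a list of live objects.
object_store = [
    Incident("Elm St", 1),
    Incident("Oak Ave", 3),
    Incident("Pine Rd", 3),
]


def report_streets_with_priority(store, priority):
    """'List all where priority = ...', written by hand as a loop."""
    matches = []
    for incident in store:  # every object has to be loaded and inspected
        if incident.priority == priority:
            matches.append(incident.street)
    return sorted(matches)


high_priority = report_streets_with_priority(object_store, 3)
print(high_priority)
```

In a relational database the same report is a one-line `SELECT street FROM incidents WHERE priority = 3 ORDER BY street`, which a reporting tool can run without knowing anything about the Incident class.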
Object-oriented databases are good at storing objects, and obviously allow objects to be related to each other in simple ways. But in a real-world application the interesting stuff is about the relationships between the data, where the results of a query relate tables in interesting ways that go beyond the attributes of a single object. I think many ORMs / object-oriented databases don't handle this kind of data mashup very efficiently.
I find the following arguments flawed:
- I've used such and such OODB and it was inferior - thus every OODB is inferior
- Use RDB as everyone uses RDB - millions of Chinese can't be wrong
- Object databases are proven inferior (slow, buggy, can't handle massive data) - by anonymous authority
- I don't know much about OODB - they must be inferior
There was a time when RDBs faced the same criticism, and there are still COBOL DBs around. Nothing in IT is a silver bullet and you should choose your tools wisely. Technologies change and every once in a while something better is produced. Try things out.
A database containing simply data is easier to understand and manipulate than a database containing code and data.
Your code is going to be buggy. If you are using a relational database and you fix a bug, and the patch does not touch the schema, then you can just deploy the patch without touching the database, and you're done. Lots of fixes can be done this way, and they are fast and painless and usually fairly low risk. This is nice and productive, so you end up designing your systems to maximise the chance that bugs can be fixed without touching the schema, typically by thinking carefully about the schema. A bunch of CREATE TABLE and CREATE VIEW statements is a concise way to describe a data structure, and there are lots of programmers out there who can efficiently grok a relational schema.
If you have to fix a bug that changes the schema, then you have to take a deep breath and migrate the production database. People learn how to do migrations with relational databases and it is manageable, since you are simply updating data tables and the migration process does not have to worry about code versioning. Anything that makes the migration more complex is going to complicate an already risky situation.
If, on the other hand, you are using an object database, then a high proportion of your bug fixes are going to alter the contents of the database, and so you're doing the risky stressful database migrations much more frequently.
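For readers who have not done one, the relational schema-changing case described above can be sketched like this. It uses Python's built-in sqlite3 module; the users table, the display_name column and the sample row are hypothetical, and a real production migration would add backups and a rollback plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical "production" table before the fix.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT)")
conn.execute("INSERT INTO users (full_name) VALUES ('Example User')")
conn.commit()


def migrate(connection):
    """Add a column and backfill it: pure data work, no code versions involved."""
    with connection:  # commits on success, rolls back on an exception
        connection.execute("ALTER TABLE users ADD COLUMN display_name TEXT")
        connection.execute("UPDATE users SET display_name = full_name")


migrate(conn)
rows = conn.execute("SELECT full_name, display_name FROM users").fetchall()
print(rows)
```

The migration only talks about tables and columns, which is the property the answer is pointing at: nothing in it depends on which version of the application classes wrote the rows.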
Object models vary in important details for different platforms. Therefore, using an object database will tend to tie the database access code to one specific platform. The popular relational databases can be accessed by just about anything.
"The Large Hadron Collider at CERN in Switzerland uses an Objectivity DB. The database is currently being tested in the hundreds of terabytes at data rates up to 35 MB/second."
I.e., the single most expensive experiment in human history uses an OODBMS.
Maturity and simplicity are the two main factors in the lack of OODBMS use. In general terms the OODBMS is a teenager compared to the weathered and seasoned RDBMSs. People trust the proven technology behind RDBMSs and are wary of the upstart youngin' with its new approach. SQL is a simple means of communicating with the "Data Manager" (the MS of RDBMS), whereas OO equivalents are immature and people find it hard to use them to communicate their intentions to the "Data Manager".
The reason OO is adopted for the LHC is that these old proven methods reach their innate capacity to perform, a capacity which under normal situations is never really required. The LHC requires huge throughput and storage, which RDBMSs can achieve, but they are sub-par to OODBMSs past a certain point.
Under normal situations, every fiscal transaction of every human being, plus an itemised inventory of every interpersonal communication during an instant of time, is insignificant compared to the systemic demands of observing the unpredictable events of the quantum realm.
Oh and MySQL is cheap ... like free beer. Everyone likes free beer.
I feel it often boils down to people thinking of the relational model being the de facto standard because that's what everyone's used for the past X years.
Disclaimer: I'm in my 20s; the following comes from research notes and not experience, so it may be inaccurate.
It seems that in the early days of databases, around the 1950s/60s, network, object, and relational databases all had an equal footing. Relational won out simply due to marketing (thanks to IBM and a few direct competitors) and since then has had all the focus, and funds, pushed in that direction, leading to the big advances seen over the other models.
Then during the 70s we saw the emergence of object databases in research, before they were launched into the mainstream in the mid 80s, hitting their stride in the mid 90s at around the same time IBM brought out Lotus Notes server (later Lotus Domino), which brought into the mainstream a new type of database: the document DB. This, in conjunction with Lotus Notes, did very well. As did the object databases. For a while, anyway.
Then the marketing guys started up again. Lots of noise about how it was easier to move legacy relational systems to new RDBMSs than to other models (even though the stuff talking to the DBs was OO - this didn't matter).
And that's where we are today. People use RDBMSs because... people use RDBMSs.
If people were to start using the other data persistence models and push for the features they need, rather than complain and move back to RDBMSs, we might see a similar growth of mature, stable platforms. We're seeing some steps towards this now with some of the newer open-source projects, but it's not enough.
What people often forget is that the data they're storing often doesn't sit well in an RDBMS, but they'll do anything they can to fudge it into one. We often see cases where we need to work with a lot of semi-structured, loosely-related data. We need to store it, index it, and search it quickly. Currently there are few, if any (none come to mind), solutions to this. So we grunt, switch our mind to relational mode, and start to draft our schemas as best we can, knowing that we're imposing a structure that a few months down the line won't bear any resemblance to the data we started with.
CouchDB has started down a path with a solution for this. It stores semi-structured data in documents and allows you to create views to retrieve this data with its relationships.
The best thing to do is to look at your data, work out how you're going to use that data, then look at what would be the best way to store it.
So, when working with:
- lots of numbers (e.g. stock/share/pricing information) where you need to run aggregated sums/reports on this data, stick with relational databases.
- a huge pile of tuples where you're doing simple look-ups, a key/value DB is unbeatable.
- a "real-world" model for which you need persistence, look no further than an OODB.
- "small" bits of related/meta data around a key element, such as contact/address-book information or a product catalog, a document database is probably the way to go.
- a lot of content/copy/words (e.g. web pages, blog posts, technical documents which reference other tech docs), specifically in the "web" arena, you can't really beat a good XML DB (specifically, for inter-relating documents, the XInclude spec for linking document fragments does a wonderful job of allowing updates to referenced data).
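To illustrate the first two cases in the list, here is a sketch contrasting an aggregate report with a simple look-up. It uses Python's built-in sqlite3 module for the relational side and a plain dict as a stand-in for a key/value store; the prices table, the symbols and the session tokens are all invented.

```python
import sqlite3

# Case 1: lots of numbers plus aggregated reports -> relational.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (symbol TEXT, price REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?)",
    [("AAA", 10.0), ("AAA", 14.0), ("BBB", 5.0)],  # invented sample data
)
# One declarative statement produces the whole report.
averages = dict(conn.execute(
    "SELECT symbol, AVG(price) FROM prices GROUP BY symbol"))

# Case 2: a pile of tuples and simple look-ups -> key/value.
# A dict stands in for the key/value DB here.
sessions = {"token-1": "alice", "token-2": "bob"}
user = sessions["token-2"]  # direct look-up by key, no query language needed

print(averages, user)
```

Neither tool is wrong; the access pattern (aggregate over many rows versus fetch one value by key) is what decides between them.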
What's also worth bearing in mind is how you're going to work with that data. As with most things in development, picking X or Y because they're cool doesn't mean they work well. XML gets a bad reputation because people don't work with it in the right way. It works best when it's managed alongside its related technologies: XQuery & XPath for querying, XSL for translation/templating, XInclude for linking and merging data fragments, and XUpdate for manipulation. Wrap all this in a native XML database and you've got a very nice environment for working with XML.
I used db4o for an experimental website forum application. It wasn't a commercial project but was designed from the start to deal with huge forums - 1 million plus posts. I realised people wouldn't want to migrate their database to another RDBMS after 1 million or more posts.
Sadly db4o couldn't manage this amount of string data floating around, and took around 10 seconds to query. It may have improved since then, but my general feeling with db4o, SQLite and others is that they are designed for embedded scenarios or small sets of data.
For example phone applications, mocking, and apps with < 500,00 rows that don't use large blobs or strings.
Having said this, db4o outperforms Hibernate with MySQL and others.
Here's the forum thread about my issue.
Simon Munro:
- They apply very well to niche situations. Those are generally caching type applications and complex objects for smaller sets of data (think 911 dispatcher type application)
- They do not handle schema changes well at all. You pretty much have to write code to mutate each object. Imagine writing objects that serialize to disk as binaries - if you change your class you lose the ability to load old files. Object databases have the same problem
- You can't 'report' off object databases in a traditional way. Every report has to be a program that loads the objects and does something with them so there is no concept of 'list all where...'
Point 1 is valid, but the other two are not (at least not in general).
Ad 2: First, SQL handles schema changes badly too (typed columns; you have to update both the client program and the server). Second, the claim that you cannot load old instances once you modify their class is simply not true. For example, in a Common Lisp CLOS OODBMS you can add slots at will and easily implement deletion and change protocols to load instances of old classes.
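The idea of a change protocol is not specific to Lisp. Below is a minimal sketch of the same approach in Python, not CLOS and not any real OODBMS API: stored state carries a version number and is upgraded on load instead of being rejected. The Person class, the email slot, the version numbers and the JSON encoding are all assumptions made for the example.

```python
import json


class Person:
    SCHEMA_VERSION = 2  # version 2 added the 'email' slot

    def __init__(self, name, email=None):
        self.name = name
        self.email = email

    def to_state(self):
        """Serializable state, tagged with the class version that wrote it."""
        return {"version": self.SCHEMA_VERSION,
                "name": self.name,
                "email": self.email}

    @classmethod
    def from_state(cls, state):
        """Load any known version, upgrading old instances on the way in."""
        if state.get("version", 1) < 2:
            # Change protocol: give the new slot a default value.
            state = {**state, "version": 2, "email": None}
        return cls(state["name"], state["email"])


# An instance written before 'email' existed still loads.
old_blob = json.dumps({"version": 1, "name": "Ada"})
person = Person.from_state(json.loads(old_blob))
print(person.name, person.email)
```

The cost is that upgrade code like the branch in from_state has to be written and kept around, which is the part the quoted point 2 is complaining about.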
Ad 3: Having a standardized query language is pretty much orthogonal to whether the underlying system is relational or not.
I think nobody has mentioned the object-relational databases (i.e. relational databases with object-oriented extensions). In fact, most of the DBMS vendors (e.g. Oracle) nowadays are object-relational. In Oracle, you can combine in the same database, relational tables and object tables (tables where each row is an object of a given type). Object types (i.e. classes) can have inheritance relationships, methods,...
In my opinion, this is the best option: take the best of both worlds.
With this database benchmark software (GNU GPL) you can test many well-known databases. I think it's suitable for getting some information about the performance of different databases.
Access to data in an OODBMS can be faster because joins are often not needed (as they are in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following pointers. (It could, however, be argued that "joining" is a higher-level abstraction of pointer following.) OODBMSs are faster than relational DBMSs because data isn’t stored in relational rows and columns but as objects. Objects have many-to-many relationships and are accessed through pointers, which link objects together to establish those relationships.
Another benefit of an OODBMS is that it can be programmed with small procedural differences without affecting the entire system. This is most helpful for organizations whose data relationships aren’t entirely clear, or that need to change these relationships to satisfy new business requirements. This ability to change relationships leads to another benefit, which is that relational DBMSs can’t handle complex data models while OODBMSs can ....
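The "following pointers versus joining" distinction can be shown in a few lines. This is only an in-memory sketch in Python, not a benchmark and not how any particular DBMS is implemented; the Author and Post classes and the sample rows are invented.

```python
class Author:
    def __init__(self, name):
        self.name = name


class Post:
    def __init__(self, title, author):
        self.title = title
        self.author = author  # direct reference: the "pointer"


# Object style: the related object is reached by following the reference.
ada = Author("ada")
post = Post("hello", ada)
name_via_pointer = post.author.name

# Relational style: rows only carry keys, so the relationship has to be
# re-established by matching key values (a naive nested-loop join).
authors = [(1, "ada"), (2, "bob")]        # (id, name)
posts = [("hello", 1), ("goodbye", 2)]    # (title, author_id)
name_via_join = [
    a_name
    for (title, author_id) in posts
    for (a_id, a_name) in authors
    if title == "hello" and a_id == author_id
][0]

print(name_via_pointer, name_via_join)
```

Real relational engines use indexes rather than nested loops, so the gap is much smaller than this toy version suggests; the sketch only shows where the extra matching step comes from.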