Relational databases - there has to be more right?

views:

1215

answers:

+12 Q:

Relational databases - there has to be more right?

I really enjoy database design and the whole concept of managing data semantically and all the logic that comes with it.

My knowledge level when it comes to databases is however (I would guess) quite basic - I can model data relationships correctly with ER diagrams, connection tables, handling many-to-many, one-to-many etc etc. I'm experienced when it comes to programming in general, and I figure my database knowledge is like knowing the basics of object oriented programming, i.e how to model a car-class, inheriting from the vehicle class, containing wheel objects and so on.

Now I would like to further my knowledge about relational databases so that I may confidently say to an employer that I can handle the subject on a professional level.

All I can handle right now is probably my movie database in the back end of my personal website, which probably would collapse if I was Amazon and had to store millions of movies. So there's the subject of scalability right? I'm sure there's a pretty "standard" array of subjects/concepts within database design that you simply must understand and be able to apply in real life if you're going to work with databases on a professional level.

So, I would be very grateful if any database gurus in the field could namedrop some areas, concepts, case studies or anything that would be beneficial to study to get really good at databases. I'm sure there's a vast science lurking here, and I want it.

Thanks in advance!

+8 A:

The standard text in the field is "An Introduction to Database Systems", by C. J. Date.

I have twenty years C experience; I read it, thought it excellent and I wrote a relational database because of it (a proper one, not this SQL malarky!).

Blank Xavier 2009-04-15 11:26:22

Out of interest: What's a "proper" relational database?

Tomalak 2009-04-15 11:28:40

One that is fully and truely relational. SQL isn't.

Blank Xavier 2009-04-15 11:31:21

Oh, I see. Is your project public, can I download it somewhere?

Tomalak 2009-04-15 11:39:23

Thanks. I'm intrigued. Hope the book is as archetypical as you make it sound.

sharkin 2009-04-15 11:41:27

Having read Date's "Database in Depth", I too am implementing a proper relational database.

Pete Kirkham 2009-04-15 11:48:01

In what way is SQL not "truly relational", just out of interest? [I just love assertions without arguments to back them up :-) ].

paxdiablo 2009-04-15 11:58:00

@Tomalak - sorry, no. I wrote it to write it; it works, but there's a lot of work going from "it works" to "you can make it public".

Blank Xavier 2009-04-15 12:04:47

@Pax - that's an excellent and absolutely vital question! fortunately, a guy called C.J.Date wrote a book *all* about this - you should read it! :-)

Blank Xavier 2009-04-15 12:05:21

@Blank Xavier: I thought so. This leads me to my other question - since SQL doesn't cut if for you, am I safe to assume that you devised a whole new query language as well?

Tomalak 2009-04-15 12:11:05

I've read it, @Blank, but I agree with Codd if you're talking about the NULL issue in particular. There's are situations when you don't know everything about a row and no easier way I know of to do it other than with NULLs. If it's something else, let us know. Not arguing, just genuinely interested.

paxdiablo 2009-04-15 12:15:47

My Date is in a box in the shed, I shall have to dig it out and have another read.

paxdiablo 2009-04-15 12:16:49

@Pax, perhaps you want to start a new question to get answers on this.

bignose 2009-04-29 13:15:32

@Pax just to give one example: SQL treats the order of columns as significant. Relational theory says you shouldn't.

JoeCool 2009-06-16 13:30:59

+1 A:

Well it's always good designing examples... See if there's anyone you know who needs a database for something. But studying VLDB (Very Large DataBase) techniques might be useful depending on the industry you're interested in.

Ian 2009-04-15 11:26:24

+1 A:

I believe optimization on existing databases could be interesting to dwelve into. I.e. why you should denormalize tables.

Some basic relation algebra is useful knowledge and closely related to set theory.

Spoike 2009-04-24 07:33:47

+4 A:

A whole other area is dimensional modelling and data warehousing.

I had been working with relational modelling for years, and then I read The Data Warehouse Toolkit and received an entirely new view of how it could be used.

Andrew Shepherd 2009-04-24 07:53:43

+1 A:

A very common scenario is having to map ugly databases to an entity model which is not necessary reflected directly in the structure of the DB. Working out which way is best to model the data in your domain can be tricky.

Full text search and XML are subjects that seem to be coming up more and more.

I have no experience with it but I know DB2 (of which there is a trial version) has some crazy new features)

Have fun :-)

Karl 2009-04-24 08:04:25

+2 A:

Get more dirt from C. J. Date's Database In Depth: Relational Theory for Practitioners if his An Introduction to Database Systems isn't grubby enough for you.

Seriously, these two books will give you a great deal more knowledge about RDBMSes, in a great deal less space, than many other professional database workers possess. Database in Depth, in particular, looks at how to think about databases relationally even when the language doesn't support it, and how to trick SQL into being a close-to-relational language.

bignose 2009-04-27 08:11:45

I highly recommend to start with www.dbdebunk.com. It has a lot of practical stuff in oppose to theory. The site is a little outdated, but still useful. Even commercial content isn't too expensive, if you really like to become database professional.

Thevs 2009-04-27 18:31:51

i would advise narrowing your scope a little. pick an sql server and become expert at it ... for example get mysql, learn the difference between the storage types, the replication types, etc. implement replication in a couple different ways. get a large dataset and try to optimize queries. do some pivots and optimize your indices for that. investigate backup strategies. see how to increase performance in replication and backups when you have a 10gb database that consistently adds 100,000 transactions every day. write software to insert the records and scripts to do replication and backups.

its hard to become an effective dba with no real experience when you try to cover all sql servers. just focus on one ... i'd suggest mysql or mssql, but whatever floats your boat.

-don

Don Dickinson 2009-04-27 20:21:27

+3 A:

I'll volunteer a list of areas that you might want to consider as aspects of programming with databases. I would not claim that you need to be expert at all of them, or even most of them, in order to be able to program using a DBMS, nor even to program a DBMS. However, they are all topics that are of some relevance at some times - in no particular order:

Query language design
Query optimization
Query rewriting
Data types
Storage organization
Transaction management
Communications protocols
Encryption
Authentication and identification
Schema design
Replication
Backup and restore
Two-phase commits
Optimistic concurrency control
Locking and pessimistic concurrency control
Authorization
Label-based access control
Set theory
Relational theory
Distributed query
Boolean logic
User-defined types and functions
Catalog management
Buffer management
Sorting
Internationalization (I18N), Localization (L10N), Globalization (G11N)
Quantifiers
Auditing
Triggers
Stored procedures

I make no claims of completeness or minimality, either.

Jonathan Leffler 2009-04-28 03:25:26

I see no points on optimization of the database itself, are there different kinds? which ones are the typical forms of storage/design optimization?

sharkin 2009-04-30 14:52:01

It depends on what you mean by 'optimization of the database'. I did mention 'storage organization', which your second sentence suggest may be what you are thinking of. There are a number of different storage organizations. One of the basic, but very effective, ones is using B-Tree or B*-Tree organizations; you will also find hash widely used. There are more specialized organizations too, such as index-only tables, or bit-map indexes, or R-Tree indexes (for spatial data). As a general purpose structure, the B-Tree variants take quite a lot of beating.

Jonathan Leffler 2009-04-30 15:01:20

Ok. Many thanks!

sharkin 2009-05-01 15:12:59

+1 A:

http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html

http://blogs.msdn.com/pathelland/archive/2007/07/23/normalization-is-for-sissies.aspx

http://www.25hoursaday.com/weblog/2007/08/03/WhenNotToNormalizeYourSQLDatabase.aspx

http://itc.conversationsnetwork.org/shows/detail571.html

http://www.scribd.com/doc/2592098/DVPmysqlucFederation-at-Flickr-Doing-Billions-of-Queries-Per-Day

http://highscalability.com/flickr-architecture

http://highscalability.com/ebay-architecture

http://www.addsimplicity.com/downloads/eBaySDForum2006-11-29.pdf

http://blog.maxindelicato.com/2008/12/scalability-strategies-primer-database-sharding.html

http://blog.maxindelicato.com/2008/12/the-ihsdf-theorem-a-proposed-theorem-for-the-tradeoffs-in-horizontally-scalable-systems.html

http://www.iamcal.com/talks/

http://natishalom.typepad.com/nati_shaloms_blog/2009/04/writing-your-own-scalable-twitter.html

http://www.mysqlperformanceblog.com/

http://www.cecs.uci.edu/~papers/ipdps07/pdfs/SMTPS-201-paper-1.pdf

http://video.google.com/videoplay?docid=7278544055668715642

http://www.niallkennedy.com/blog/uploads/flickr_php.pdf

zalew 2009-04-28 04:22:46

+2 A:

being just a student of Databases I can only speak from my limited scope but I can suggest two sites that may help...

http://database-programmer.blogspot.com/2008/09/comprehensive-table-of-contents.html

This is Kenneth Downs site, he goes from the very basics if SQL and delves into more complex subjects. The man has written a framework around DB's after all.

Another one is High Scalability...

http://highscalability.com/

They get into every realm of DB's.

Hope this helps.

Randin 2009-04-28 05:12:28

+1 A:

It depends on what you want to do with your databases, how does your data look, what are your work flows, how many servers, clients and databases you have to work with ...

So let's pretend that you, like me, have to deal with several databases, not to big (< 100 GB each) and you have many clients with many different needs that make you develop many custom solutions, like producing custom reports or exports. That makes you more of a programmer than a DBA. And what you need is productivity, before performance.

The best solution I came up with, in that situation, is getting rid of SQL, as much as possible. You can achieve that by using some kind of ORM, either home made or existing ones, and thus trading SQL scripting for object programming. Doing that I do in minutes what would takes hours with SQL.

dugres 2009-04-28 16:09:16

+1 A:

I think set + relational algebra is a thing that most database users know little about but would do well to learn. When you appreciate the logic involved behind mapping one relation to another you start to see more clearly why things like normalization are good, why NULLs are best avoided if possible, etc. You also start to see the flaws in SQL compared to purer relational query languages, where features impose limitations on the paradigm due to performance reasons, etc.

Kylotan 2009-04-28 23:50:05

+1 A:

Disclaimer: not an expert in database design.

Some of the performance issues can be handled either by:

denormalizing your database, so to reduce the number of tables to join
adding indexes
filtering should be done so that you first remove the largest of the non matching data, then you cherry pick the next condition on the reduced set. It's better to go from 100 values -> 10 matching first condition -> 1 matching first and second condition than 100 values -> 80 matching second condition -> 1 matching first and second condition. Seems trivial, but it's important to keep in mind.
divide et impera is the motto for scalability. If you have something that scales in a non-linear way, say O(N^2) it makes sense to keep N as low as you can, and you should partition your data set into smaller sets, assuming they are independent and you can work out the partitioning. An example of this is sharding, typically used to keep databases of users in large social websites. (NB: an example, I would not implement it this way) Instead of having a huge database with all the users, they have 26 servers (one for each letter of the alphabet), then they put all the nicknames with the same first letter in the same server. This has the following advantages:

a. you balance the load on different machines
b. if one machine crashes, you make the site unaccessible only to a subset of your users, not to all of them
c. you preselect the search with a highly discriminating criterium (the first letter), then perform the second search (the username)
d. you reduce the number of entries each database has.

Stefano Borini 2009-04-29 00:13:35

+1 A:

There is only one rigorous technique for conceptually modeling a relational database schema that I know of (and I've spent a lot of time looking). It's confusingly named "Object-Role Modeling". Here are a couple references.

http://www.agilemodeling.com/artifacts/ormDiagram.htm

http://www.tdan.com/view-articles/5033

http://en.wikipedia.org/wiki/Object_role_modeling

http://en.wikipedia.org/wiki/NORMA

and here's a plugin for Visual Studio

le dorfier 2009-04-29 00:24:45

+1 A:

Don't forget representing hierarchy and/or graphs in databases. It can be a pain & there's no right answer.

The standard techniques (for hierarchy at least) have been mentioned in these SO posts:

edit: There's also spatial database applications for GIS use, where you have data structures and/or indices based on point locations using R-trees and the like. Using these is a little bit different than the regular non-spatial database features.

Jason S 2009-04-29 00:36:07

+1 A:

To my mind there are three "tracks" with database skills: Developer, DBA and Architect. From a development perspective you want to focus on development, understand Architect and pick up as much DBA stuff as you need along the way.

As a developer the key thing (to my mind) would be to get your SQL to a really good standard. As an interviewer if I'm looking for a developer I don't care if you can design databases as much as how you can write queries. Assuming you know about your basic CRUD commands, do you know about:

Stored Procedures (not just how to use them but when and why)
Views (ditto, including materialised views)
Triggers (insert, update, delete, how and why)
Cursors (especially impact on performance)
Referential integrity
Transactions
Indexes
Adding defaults, constraints and identities to tables
Complex use of group by and having
Functions especially:
- Date and time manipulation
- String manipulation
- Handling nulls

You should be able to pull any data you need from your database using SQL alone, you should never need to manipulate or parse it in any way using your procedural code (you may choose to but it will be a choice rather than you didn't know how to do it with SQL).

As a developer the one booking I'd look at is Joe Celko's SQL for Smarties. Lots of SQL to do things you may never have really thought about being able to do in SQL.

One of the best ways to learn this stuff is, tedious as it seems, writing reports (management information). I've seen so many people moan about writing reports being tedious and then do it really, really badly (and not just because they didn't try). Reports tend to be close to pure SQL so you have to really get to know the tools at hand and a complex report really exposes those who know SQL from those who don't. People also tend not to want to wait too long for them so performance is key too.

Look at your current database and come up with a bunch of really really awkward things someone might actually want to know. Think marketing, trends, most and least popular. Then try and combine a bunch of them into one query.

In terms of performance I'd also be trying to get inside how the query optimizer works, how it makes decisions about when to use an index and when to table scan, when indexes will help and when they'll hinder.

A good developer doesn't just write good queries, they write quick, maintainable queries. To really get to grips with this you'll need to play around with a database with a dozen (or more tables) containing, ideally, millions of rows. That's when you start seeing queries you thought were fine dragging their heels.

The architect/designer stuff others have covered pretty well. All I'd say on the subject is that for every database that has to be designed there are hundreds of queries that need to be written for it. You might want to consider that proportional break down of work when you're upskilling and make sure your querying is really up to scratch.

In terms of links it depends on the platform - all this stuff tends to be platform specific. But then that's what google is for.

Not I suspect entirely what you want but worth knowing as a lot of people who think they know SQL really really don't...

Jon Hopkins 2009-04-29 20:38:09

Well, to be frankly a database is just a way of storing and accessing data. Pretty much what a file system does too.

A parallel from LDAP is that it is a protocol and by this not a definition of what you can do with it and how it should be implemented, the same could be said about SQL.

So if you want to know more about databases you are actually saying you want to know more about the SQL protocol and/or how to store and fetch data.

You might be interested in searching around for what a 'B-Tree' is and how it is used. Another thing worth looking up is EAV (Entity-Attribute-Value) and why schema's are so important for it.

With that knowledge you could actually role out your own DB and while doing that appreciating what RDBM's already have done for you.

If you want a more practical approach look at the documentation that the open-source PostgreSQL provides, perhaps starting with this.

Martin P. Hellwig 2009-04-30 12:41:09

You could start by reading one of the (almost recent) review papers that focuses on the foundations and trends in Databases: The anatomy of databases

amit-agrawal 2009-05-05 05:46:50

ansaurus

tags:

views:

answers:

Relational databases - there has to be more right?

related questions