In code, it's generally pretty easy to add new classes to provide additional functionality. I have a fairly good understanding of refactoring code and what's involved, so YAGNI generally makes sense to me.

What I'm not as familiar with is working with and updating a relational database once it's deployed. I'm developing a little pet project that I plan to practice Release Early, Release Often on, and I'm wondering whether I should consider data that won't be used in the initial release but is on the planned features list. Is it as easy to add tables and tweak schemas as it is to add new classes? Or should I try to have tables set up for things I could conceivably use, but am not planning to use in the immediate future?

+2  A: 

My opinion is that YAGNI applies to everything: coding, database design, work around the house (my wife disagrees with me vehemently on this one), and so on.

Every DBMS-based application I've ever worked on has had regular schema updates, so this should be planned for in your processes. The DBAs won't like the "release often" part of your proposal, since that's more work for them (or for you, if it's a non-DBA database).

But that's what they're there for.

paxdiablo
+3  A: 

This is an excellent question. I've personally found modifying a database schema and converting all the data to the new representation to be far more difficult than refactoring code. So much so, in fact, that whenever I start a project that will use a new database, I always take the time, before I sit down and write any code, to get my spec as thorough as possible. What this often entails is anticipating features and incorporating support for them into the database, even if I don't plan on implementing them immediately.

You might be able to be a bit more nimble with database refactoring if you're using a framework or similar layer that provides a mechanism for altering the database, but if you're writing straight SQL then I would recommend investing a modest amount of time designing and implementing a schema that is less likely to require changing in the future.
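
As a sketch of what that up-front investment can look like (the tables here are hypothetical, not from the question): if, say, tagging is on the planned features list, modeling it as proper tables from the start costs a few minutes now and avoids converting a delimited text column into rows after release.

    -- Hypothetical schema: tagging is planned but not yet implemented.
    -- A join table up front beats migrating a comma-separated column later.
    CREATE TABLE items (
        item_id  INTEGER PRIMARY KEY,
        title    TEXT NOT NULL
    );

    CREATE TABLE tags (
        tag_id   INTEGER PRIMARY KEY,
        name     TEXT NOT NULL UNIQUE
    );

    CREATE TABLE item_tags (
        item_id  INTEGER NOT NULL REFERENCES items(item_id),
        tag_id   INTEGER NOT NULL REFERENCES tags(tag_id),
        PRIMARY KEY (item_id, tag_id)
    );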

Kyle Cronin
It's true that refactoring databases is more difficult. On the other hand, I've also found that working with inflexible databases designed up front becomes a huge timesink. The database doesn't properly reflect the requirements, or is far more complex than necessary, so accessing it becomes a pain.
jalf
True. I was basically assuming that the database designer and the programmer were the same person, and that person is capable of designing a good db spec. I'll update my answer to make that clearer.
Kyle Cronin
+2  A: 

If you have good testing that hits the database, I would extend YAGNI to your database design.

In general, it is easy to add columns and tables, and less easy to remove or modify them significantly. Take that into consideration when you design tables (e.g. if a customer can have multiple users, don't add a userid column to your customers table; do it right the first time).
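
A minimal sketch of the "right the first time" version (names are illustrative): the foreign key lives on the many side, so adding a customer's second user is just another row.

    -- A userid column on customers would cap each customer at one user.
    -- Putting the foreign key on users leaves the relationship open-ended.
    CREATE TABLE customers (
        customer_id  INTEGER PRIMARY KEY,
        name         TEXT NOT NULL
    );

    CREATE TABLE users (
        user_id      INTEGER PRIMARY KEY,
        customer_id  INTEGER NOT NULL REFERENCES customers(customer_id),
        login        TEXT NOT NULL UNIQUE
    );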

Kai
What if you flip that around: say the client's specifications are currently just for one artist per tourdate, but in the future two of their bands could go on tour together, thus play at the same shows. Do you still add artist_id to the tourdates table, limiting the system to 1 artist per tourdate? Or do you think ahead, making it a HaBtM relationship?
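
(For illustration, the has-and-belongs-to-many version would be a join table, roughly like this; the names are hypothetical and it assumes existing artists and tourdates tables:)

    -- A join table permits any number of artists per tourdate, and vice versa.
    CREATE TABLE artists_tourdates (
        artist_id    INTEGER NOT NULL REFERENCES artists(artist_id),
        tourdate_id  INTEGER NOT NULL REFERENCES tourdates(tourdate_id),
        PRIMARY KEY (artist_id, tourdate_id)
    );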
Calvin
A: 

The principle still applies. You are not going to have to worry about adding fields to tables, etc., until you have a lot of data. Quite a lot of data. So worrying too much too early about the details of indexing and query plans, without seeing actual data, is often going to be wasted time and can even lead to architectural problems later.
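
As a small illustration (the table and column are hypothetical), indexes are exactly the sort of detail that can wait until real query patterns emerge, because most DBMSs let you add and drop them without touching application code:

    -- Added only after production data showed this lookup was slow.
    CREATE INDEX idx_orders_customer ON orders (customer_id);

    -- And it can be dropped again just as easily (exact syntax varies by DBMS).
    DROP INDEX idx_orders_customer;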

Unfortunately, mucking about with the database design after a production release can be pretty scary if a good test/release process is not followed. So, even more so than with code, you want to get it right the first time with a database.

With databases, you do want to plan for getting the data out as well as putting it in and storing it. If your "planned" features include reporting, that will affect the design of the database a lot, so maybe plan for that.

It is not as easy to tweak a schema as it is to add a class, but it is doable.

So overall, YAGNI still applies.

Nat
A: 

Don't set up the tables you don't need yet. Part of the reason behind YAGNI is that you're not going to correctly predict up front what you will need. You can easily add new tables, alter existing tables, and so on when you need to change them.
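
To give a feel for how cheap the additive changes are (hypothetical names, assuming an existing users table), both of these are one-liners or close to it in most DBMSs, and neither disturbs existing queries:

    -- Additive changes rarely break anything that already works.
    ALTER TABLE users ADD COLUMN last_login TIMESTAMP;

    CREATE TABLE user_preferences (
        user_id   INTEGER NOT NULL REFERENCES users(user_id),
        pref_key  TEXT NOT NULL,
        pref_val  TEXT,
        PRIMARY KEY (user_id, pref_key)
    );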

A good framework should have some tools for performing migrations, which allow you to upgrade and downgrade your database easily and automatically. If you have this in place, and are reasonably careful about your design, you should be able to refactor your way to where you need to be, rather than trying to come up with everything you'll ever need up front.
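
If your framework doesn't provide migrations, the core of the idea is small enough to hand-roll in plain SQL (a sketch with hypothetical names): number your scripts and record which ones have been applied.

    -- One row per migration script that has been applied.
    CREATE TABLE schema_migrations (
        version     INTEGER PRIMARY KEY,
        applied_at  TIMESTAMP NOT NULL
    );

    -- Migration 2: run only if version 2 is not yet recorded.
    ALTER TABLE orders ADD COLUMN shipped_at TIMESTAMP;
    INSERT INTO schema_migrations (version, applied_at)
        VALUES (2, CURRENT_TIMESTAMP);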

Brian Campbell
+2  A: 

There is a whole host of agile thinking about database design and implementation. You might be interested in looking through the best practices at www.agiledata.org for some of Scott Ambler's thoughts on the subject. For myself, I generally let the design of the database grow as the application develops. I rarely create tables ahead of time. Exceptions to this would be things like auditing and permissions, things that cut across the entire design. I will think about how to implement these things even if I don't actually create any tables for them upfront. Cross-cutting aspects do affect the design of tables even if those features aren't always the first ones out of the gate.
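
As one concrete (hypothetical) example of why cross-cutting concerns are worth settling early: audit columns tend to recur on every table, so fixing the convention once keeps later tables consistent even before any auditing feature ships.

    -- The audit convention applied to an ordinary table; every new table
    -- repeats these four columns, so their shape is worth deciding up front.
    CREATE TABLE invoices (
        invoice_id  INTEGER PRIMARY KEY,
        amount      DECIMAL(10, 2) NOT NULL,
        created_by  TEXT NOT NULL,
        created_at  TIMESTAMP NOT NULL,
        updated_by  TEXT,
        updated_at  TIMESTAMP
    );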

tvanfosson
A: 

Database design is like any other kind of design. Your understanding grows incrementally, and with better understanding comes an evolving schema.

I wish it were easier to explain how there is, in fact, a conceptual basis for relational database schema design. The only tool that really models it is Object Role Modeling, which has been a long time emerging. The most familiar tool was Visiomodeler. To get a flavor of it, see the writing of Scott Ambler and Scot Becker. But the conclusion one would draw is that object-modeling-type assertions lead directly to a specific relational logical model; hence the schema will need to change as your conceptual model changes.

In practice, your RDBMS can handle quite a bit of flexing if you're really comfortable with transformation expressions, and it's worth the effort to get good at them.
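
For instance (hypothetical table; string functions shown as in SQLite, and they vary by DBMS), splitting one column into two during a redesign is just a couple of transformation expressions:

    -- Split a combined "name" column into first/last as the schema evolves.
    -- Assumes values are in "First Last" form.
    ALTER TABLE people ADD COLUMN first_name TEXT;
    ALTER TABLE people ADD COLUMN last_name  TEXT;

    UPDATE people
    SET first_name = SUBSTR(name, 1, INSTR(name, ' ') - 1),
        last_name  = SUBSTR(name, INSTR(name, ' ') + 1);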

Note: I think that SQL-avoidance techniques like LINQ and object-relational mappers will just be an impediment to an evolving design. I may be wrong. There's some reason to hope that Microsoft's Entity Framework will encompass Object Role Modeling, but I've only seen oblique references to the possibility.

le dorfier
+1  A: 

There is a conceptual basis for database design.

In classical database design, there are three models: conceptual, logical, and physical.

The conceptual model emerges from requirements analysis, and evolves as the underlying subject matter evolves or as understanding of the subject matter evolves. The conceptual model pins down elementary data as to form and semantics, but doesn't deal with such issues as table composition.

The logical model uses the relational model of data. It can be derived from the conceptual model, but it also deals with the composition of relations. Normalization and other composition issues come into play here. The logical model anticipates the table design, and also the queries and updates the application will make.

The physical model implements relations as tables, and also adds all the other features like indexes, tablespaces, etc. needed to actually build the database. It's derived from the logical model, but data volume, load, performance, and disk space all come into play.

This sounds lengthy and tedious, but it's actually fast, if you know how to do it. The whole thing can be done in a matter of weeks, while the rest of the team is still debating functional specs. For a really small project (6 tables, 50 columns) it can be done in days, with just pencil and paper. For larger projects, there are tools that make design more automatic, less error prone, and easily diagrammed.

But what happens when you discover that the conceptual model was inaccurate or incomplete, and the other two models and the database itself need to be altered? That's where Data Independence comes to the rescue. Data independence does for database design what encapsulation does for object design. Namely, it prevents a minor adjustment in one place from propagating all over the application objects. Objects are only dependent on the data that they use.

If a new table has to be added to a schema, the chance that any application will be broken is vanishingly small. Even if an existing table has to be altered, queries that only use the old columns won't see any difference. And even when objects are dependent on the change, you can still give the altered table a new name, and then create a view with the old name that makes it look like the old table is still there.
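
A small sketch of that renaming trick (hypothetical names; rename syntax varies by DBMS):

    -- 1. Restructure the table under a new name.
    ALTER TABLE customers RENAME TO customers_v2;

    -- 2. A view with the old name keeps existing queries working.
    CREATE VIEW customers AS
        SELECT customer_id, name, city   -- expose only the original columns
        FROM customers_v2;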

And physical data independence is almost complete in a good DBMS. You can reorganize data, add and drop indexes, etc., and you shouldn't have to change any of the application code.

In short, change management can be done brilliantly using the really good DBMS products out there. Unfortunately, a lot of DBAs and programmers don't know how to make adequate use of these DBMS features, even though they have been around for years.

Walter Mitty