One of the most important opportunities TDD gives us, from my point of view, is the ability to develop projects incrementally, adding features one by one, which ideally means we have a working system at every point in time.
What I am asking is: when a project involves working with a database, can we use this incremental approach for creating the database structure, or should we work the structure out before we start writing code? I know it's hard to predict what the structure of the database will look like a year from now, but generally, what's the best practice here?
The answer here is fairly obvious really, as far as I'm concerned.
You design the database structure. TDD, to a degree, isn't about testing the logic (the logic is in your head); it's about testing the implementation and making sure it stays consistent.
Designing a DB, as with designing anything, is about getting it right logically and conceptually. That is, making sure you have the right fields, that the tables will be useful, that the design ensures and implies the right sorts of relationships, and that it allows all the sorts of actions you need.
So, before you write any code you need to have this "thing", to know what your code will do. Thus, it follows trivially that you make the DB first, and then write code to test it.
Perhaps it will be shown, via testing, that you forgot something. Okay, that's good and appropriate; go back and add it, and then continue testing.
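To make this concrete, here is a minimal sketch of the workflow this answer describes, using `sqlite3` and `unittest` from the Python standard library and a hypothetical customers/orders schema: the DDL is written first, and the tests then verify that the schema actually enforces the relationships and constraints you designed into it.

```python
# A sketch of testing a schema that was designed first (hypothetical schema).
# If a test reveals a missing column or constraint, you add it to the DDL
# and re-run the tests.
import sqlite3
import unittest

SCHEMA = """
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total       REAL NOT NULL CHECK (total >= 0)
);
"""

class SchemaTest(unittest.TestCase):
    def setUp(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("PRAGMA foreign_keys = ON")
        self.db.executescript(SCHEMA)

    def test_order_requires_existing_customer(self):
        # The schema should reject an order for a customer that does not exist.
        with self.assertRaises(sqlite3.IntegrityError):
            self.db.execute(
                "INSERT INTO orders (customer_id, total) VALUES (999, 10.0)"
            )

    def test_order_total_cannot_be_negative(self):
        self.db.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
        with self.assertRaises(sqlite3.IntegrityError):
            self.db.execute(
                "INSERT INTO orders (customer_id, total) VALUES (1, -5.0)"
            )

if __name__ == "__main__":
    unittest.main(exit=False)
```

The tests here exercise the database's own rules (foreign keys, CHECK constraints) rather than application logic, which is exactly the "make sure it stays consistent" role the answer assigns to TDD.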
The benefit of TDD and YAGNI is that they explicitly address the issue that we, as developers, can't predict future requirements. That is just as true for relational database design as it is for object-oriented code.
The database is an implementation detail. Its only purpose is to support the application by providing persistence services. If you don't know what your code is going to do three months from now, it would be illusory to think that you know what your database is going to look like.
For me, this is a question with a "theoretical" answer and a "real world" answer.
In theory, you add a column as and when you need it, and you refactor your database as you go, because that's agile.
In the real world, your DBAs will kill you if they have to rebuild your test data every five minutes because you've changed the schema again. And in a smaller project, you'll get personally sick of having to spend half your time maintaining an unstable database.
As skaffman alluded to in a comment: database maintenance is generally more expensive than code maintenance. This is doubly true for rollout: you can roll out an entire new application without a hitch, but try planning a live database upgrade without breaking your data.
It's a difficult discussion, because agile purists will insist that everything should be done "just in time." But, as in most things agile, the reality is that someone needs to be looking beyond the next release. Priorities do change, but if there's not at least a vague idea of what the product will look like in 6 months, then you've got bigger problems than development methodology...
The role of an architect (or tech lead, or chief DBA, or whatever flavour you have) is to be looking ahead those few months and planning for what you are 90% sure is coming, and part of that will be defining the data you're going to need and where it's likely to live.
So, perhaps instead of adding a column at a time, add a table at a time. Find the balance that suits your project and your development process, without doubling your workload.
If your tables are in Boyce-Codd Normal Form or better, then they should be quite easily used by any application without modification, assuming they actually store the data needed. The whole point of relational databases and relational modeling is to develop a data model independent of any application's search paths or commonly used queries.
And it is quite easy to design a properly normalized database up front, at least if you know up front what data needs to be managed.
The only reason you would need to "refactor" an RDBMS schema is if the original design was prima facie unacceptable to any competent eye. Now, some tablespaces or indexing might need to be tweaked, but that has nothing to do with the design.
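As an illustration of the normalization point above, here is a sketch of a classic BCNF decomposition, using `sqlite3` from the Python standard library and a hypothetical teaching schema. In the flawed design, each course has exactly one instructor, so `course -> instructor` holds even though `{student, course}` is the key; the instructor is duplicated for every enrolled student. Decomposing removes the redundancy.

```python
# Hypothetical example: decomposing a table into BCNF with sqlite3.
import sqlite3

db = sqlite3.connect(":memory:")

# Not in BCNF: course -> instructor, but course alone is not a key here.
# CREATE TABLE enrollment (student TEXT, course TEXT, instructor TEXT,
#                          PRIMARY KEY (student, course));

# BCNF decomposition: every non-trivial dependency now has a key on its left.
db.executescript("""
CREATE TABLE course (
    course     TEXT PRIMARY KEY,
    instructor TEXT NOT NULL
);
CREATE TABLE enrollment (
    student TEXT NOT NULL,
    course  TEXT NOT NULL REFERENCES course(course),
    PRIMARY KEY (student, course)
);
""")

# The instructor for a course is stored once, so changing it is a
# single-row update instead of one update per enrolled student.
db.execute("INSERT INTO course VALUES ('Databases', 'Dr. Codd')")
db.execute("INSERT INTO enrollment VALUES ('Alice', 'Databases')")
db.execute("INSERT INTO enrollment VALUES ('Bob', 'Databases')")
db.execute("UPDATE course SET instructor = 'Dr. Boyce' WHERE course = 'Databases'")
```

A schema in this form serves any application that needs the data, regardless of its query patterns, which is the answer's point about application independence.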
can we use this incremental approach for creating the database structure, or should we work the structure out before we start writing code?
Yes, you can (have a look at Fowler's Evolutionary Database Design). And no, you shouldn't work the structure out up front (that's BDUF, Big Design Up Front). Scott Ambler has also written a lot on this and on the techniques that allow you to apply it in real life. Check out Agile Database Techniques, Refactoring Databases: Evolutionary Database Design, and The Process of Database Refactoring: Strategies for Improving Database Quality, for example.
And as I said in a comment, if your DBA doesn't like it (if he guards the model and the data like Gollum with the precious), get another DBA, one who understands the work of Fowler and Ambler. Period.
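The core mechanism behind the evolutionary approach Fowler and Ambler describe is a sequence of small, ordered schema migrations, with the database recording which version it has reached. Here is a minimal sketch using `sqlite3` from the Python standard library and SQLite's `user_version` pragma as the version marker; the table names and migrations are illustrative.

```python
# A toy migration runner: the schema grows one small change at a time,
# and re-running the migrations is safe because the database remembers
# its current version.
import sqlite3

MIGRATIONS = [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE users ADD COLUMN email TEXT",            # added for release 2
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
    "user_id INTEGER NOT NULL REFERENCES users(id))",     # added for release 3
]

def migrate(db: sqlite3.Connection) -> None:
    # Apply only the migrations this database has not seen yet, in order.
    current = db.execute("PRAGMA user_version").fetchone()[0]
    for version, ddl in enumerate(MIGRATIONS[current:], start=current + 1):
        db.execute(ddl)
        db.execute(f"PRAGMA user_version = {version}")
    db.commit()

db = sqlite3.connect(":memory:")
migrate(db)   # brings a fresh database to the latest version
migrate(db)   # running again is a no-op: already up to date
```

Production tools (Flyway, Liquibase, Alembic, and the like) do the same thing with more safety rails, but the principle is exactly this: the schema evolves with the code instead of being frozen up front.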
Several approaches may be taken to reduce the difficulty of refactoring the database to match the code that TDD generates. One option is to generate your database from the classes you create as part of the TDD process.
Another possibility is to generate your database, test data, and possibly even the basic repository code, from a conceptual database model using a tool like NORMA. The "ORM" here is Object-Role Modeling (the "other" ORM), and NORMA is a Visual Studio add-in that can generate DDL and code from a conceptual model.
The nice thing is that even if the conceptual model changes significantly (a relation becoming many-to-many, for instance), both the code and the DDL will change to reflect it.
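The "generate the schema from the model" idea can be sketched in a few lines. This is a deliberately toy mapper from Python dataclass annotations to DDL, nothing like the richness of NORMA or of ORMs such as SQLAlchemy (whose `metadata.create_all` does this for real); it only illustrates how changing the model automatically changes the generated schema.

```python
# Toy illustration: derive CREATE TABLE statements from class definitions,
# so the schema follows the model as the model evolves.
from dataclasses import dataclass, fields
import sqlite3

TYPE_MAP = {int: "INTEGER", str: "TEXT", float: "REAL"}

def ddl_for(cls) -> str:
    cols = ", ".join(f"{f.name} {TYPE_MAP[f.type]}" for f in fields(cls))
    return f"CREATE TABLE {cls.__name__.lower()} ({cols})"

@dataclass
class Customer:
    id: int
    name: str
    balance: float

# Adding or removing a field on Customer changes the generated table with it.
db = sqlite3.connect(":memory:")
db.execute(ddl_for(Customer))
```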