views:

196

answers:

6

One of the most important opportunities TDD gives us, from my point of view, is the ability to develop projects incrementally, adding features one by one, so that ideally we have a working system at every point in time.
What I am asking is: when a project involves working with a database, can we use this incremental approach for creating the database structure, or should we work the structure out before we start writing code? I know it's hard to predict what the structure of the database will look like a year from now, but generally, what's the best practice here?

+1  A: 

The answer here is fairly obvious really, as far as I'm concerned.

You design the database structure. TDD, to a degree, isn't about testing logic (the logic is in your head); it's about testing the implementation, and making sure it stays consistent.

Designing a DB, as with designing anything, is about getting it correct logically and conceptually. I.e. making sure you have the right fields, that the table will be useful, that it ensures and implies the right sort of relationships, and that it allows all the sorts of actions that you wish.

So, before you write any code you need to have this "thing", to know what your code will do. Thus, it follows trivially that you make the DB first, and then write code to test it.

Perhaps it will be shown, via testing, that you forgot something. Okay, this is good and appropriate; so go back and add it, and then continue testing.
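As a concrete sketch of that design-then-test loop, here is what a first test against an up-front schema might look like (Python with sqlite3; the table and column names are invented for illustration):

```python
import sqlite3

# Hypothetical schema designed up front; names are illustrative only.
DDL = """
CREATE TABLE customer (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    total_cents INTEGER NOT NULL CHECK (total_cents >= 0)
);
"""

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this opt-in
    conn.executescript(DDL)
    return conn

def test_rejects_orphan_order():
    conn = make_db()
    try:
        conn.execute(
            "INSERT INTO orders (customer_id, total_cents) VALUES (999, 100)")
        assert False, "expected a foreign key violation"
    except sqlite3.IntegrityError:
        pass  # the schema enforced the relationship, as designed

test_rejects_orphan_order()
print("schema constraints hold")
```

If a test like this fails, that is the signal to go back and amend the design, then resume testing.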

Noon Silk
The difference here is that refactoring a database is generally considerably harder than refactoring code
skaffman
@skaffman: Sometimes it may be harder. Sometimes not. I mean, clearly it's nonsense to design part of a table, then test it, then design the rest. It clearly makes sense to design and implement components of your system, in the DB, then make sure they all work, then add more. I'm not suggesting you make your DB perfect from Day 1 (it's of course unwise and not even necessary to strive for this, in general). I'm saying you complete a given model, then implement it. Repeat until the project is complete.
Noon Silk
+7  A: 

The benefit of TDD and YAGNI is that it explicitly addresses the issue that we, as developers, can't predict future requirements. That is just as true for relational database design as it is for object-oriented code.

The database is an implementation detail. Its only purpose is to support the application by providing persistence services. If you don't know what your code is going to do three months from now, it would be illusory to think that you know what your database is going to look like.

Mark Seemann
No - the purpose of a database is to store data, *that's* its primary purpose. Application interaction is secondary. Refactoring data storage is hard.
skaffman
It's fairly easy to predict future requirements. I don't see why you would want to perpetuate such a myth.
Noon Silk
Yes, refactoring data storage is hard, and if I could solve the problem with BDUF and get it right in one go, I'd prefer that. However, I've never been able to successfully do that, and haven't met anyone else who could.
Mark Seemann
Perhaps my assumptions are wrong, but my answer assumes a pull-based, Lean/Agile approach to software development overall. In such an environment, it's next to impossible to predict future requirements, because stakeholder priorities will change. That's not a myth; it's reality.
Mark Seemann
@Mark You're assuming you can't predict what the stakeholder priorities may be. Anyway, I think it's best if I don't talk with you on the matter, as I can *predict* that no fruitful results can come from such a discussion.
Noon Silk
@silky It's easy to predict future requirements!? Is this a joke? Our entire industry fails at predicting pseudo-"known" requirements. Don't even talk about "future" requirements, please. And even if you think you can, will you design your model up front to include low-priority things that will maybe never happen? I won't. So indeed, I can predict that we won't agree, and I wish you a lot of fun with your waterfall style of thinking.
Pascal Thivent
@Pascal Sometimes you do design a system so that it can handle future requirements. Typical example: In an online shop, you may handle "Fees" via a generic approach, even if the requirement (currently) is for one fee. It is easy to see that more fees may be added, and hence implement the solution generally, so it's easy to handle another one being added. This is something most of us would do without thought. It's cute, currently, to throw around terms like "YAGNI" (and others) but really, it's not accurate. We do predict the future. Mostly trivially. Some changes are unexpected, of course.
Noon Silk
@Pascal but generally, systems can be designed with a degree of flexibility. (You just updated to suggest I have a "waterfall" style of thinking.) Obviously, that's not true. Firstly, you don't know me, and to infer that you do from a single comment is a little strange. I find the implication weird, but you are welcome to it, I suppose. Nevertheless, my point remains: a logical person *can* predict what will occur, and can design appropriately. Obviously, this predictive capability doesn't mean that your development style is set in stone (that's a nonsensical conclusion).
Noon Silk
@silky First, I don't need to know you to claim that your comment sounds waterfallish and erroneous (Agile methods exist because most situations aren't predictable). Second, *"systems can be designed with a degree of flexibility"* is very different from being predictable as you wrote in your initial comment (myth, your word, is a very strong word, and is wrong). So my point remains, as a logical person, I believe in adaptability, not in predictability.
Pascal Thivent
@Pascal Believe in whatever you want, but don't go around accusing people of things you cannot be sure of. Sadly, it appears obvious that with you also, there is no chance of reasonable discussion on this matter (at least via this comment box). It may surprise you to learn that people are different, some with different abilities than others. Just because you may not be able to do something does not mean it is a general rule. Nevertheless, nothing either of us says will change the other, so there is no purpose to this. I leave you to your assumptions.
Noon Silk
+6  A: 

For me, this is a question with a "theoretical" answer and a "real world" answer.

In theory, you add a column as and when you need it, and you refactor your database as you go, because that's agile.

In the real world, your DBAs will kill you if they have to rebuild your test data every five minutes because you've changed the schema again. And in a smaller project, you'll get personally sick of having to spend half your time maintaining an unstable database.

As skaffman alluded to in a comment: database maintenance is generally more expensive than code maintenance. This is doubly true for rollout: you can roll out an entirely new application without a hitch, but try planning a live database upgrade without breaking your data.

It's a difficult discussion, because agile purists will insist that everything should be done "just in time." But, as in most things agile, the reality is that someone needs to be looking beyond the next release. Priorities do change, but if there's not at least a vague idea of what the product will look like in 6 months, then you've got bigger problems than development methodology...

The role of an architect (or tech lead, or chief DBA, or whatever flavour you have) is to look ahead those few months and plan for what you are 90% sure is coming, and part of that will be defining the data you're going to need and where it's likely to live.

So, perhaps instead of adding a column at a time, add a table at a time. Find the balance that suits your project and your development process, without doubling your workload.
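A minimal sketch of that "table at a time" idea, using SQLite's user_version pragma as the migration counter (names are illustrative; a real project would normally use a dedicated migration tool):

```python
import sqlite3

# Each migration step adds a whole table, per the advice above.
MIGRATIONS = [
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "CREATE TABLE invoice (id INTEGER PRIMARY KEY,"
    " customer_id INTEGER REFERENCES customer(id))",
]

def migrate(conn):
    # user_version starts at 0; we use it to record how far we've migrated.
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, ddl in enumerate(MIGRATIONS[current:], start=current + 1):
        conn.execute(ddl)
        conn.execute(f"PRAGMA user_version = {version}")  # record progress

conn = sqlite3.connect(":memory:")
migrate(conn)
migrate(conn)  # safe to re-run: already-applied steps are skipped
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['customer', 'invoice']
```

The point of the sketch is the shape, not the tooling: each release ships with the migrations it needs, and the schema version travels with the database.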

Dan Puzey
*In the real world, your DBAs will kill you if they have to rebuild your test data every five minutes because you've changed the schema again* I would fire such a DBA and get a real Agile DBA (or no DBA at all for a small project).
Pascal Thivent
Well, yes - on a small project, you don't need a DBA. On a large production system with millions or billions of rows, doing constant maintenance because the developers can't fix their schema for more than a fortnight is liable to cause frustration. My intention was to point out that the data migration required if you're changing schemas often is a real pain, and the cost/benefit of that gets much worse the larger the project is.
Dan Puzey
First, what you are describing in your answer is BDUF, that's just anti Agile. And it doesn't have to happen, even on a big project (I did it on a big financial project). Second, the team should have the control, not an architect, a tech lead, or a chief DBA. There is no such thing in Agile. The team decides.
Pascal Thivent
Hence my comment about Agile purists! What we see as "ideal" development practice is simply too expensive and impractical in some situations. Yes, ideally we'd do Lean and refactor constantly... but realistically, the business will expect and need you to be closer to "right" first time. And "the team decides" doesn't always work in Enterprise-size projects - because what you end up with is ten people doing a job that one could do in half the time. It's not "anti-agile," it's just practical truth.
Dan Puzey
The Database Agile purists, in my experience, undersell the complexity, downtime, and downstream costs of extensive database refactoring on large production systems. The database layout is, in many cases, the interface, and one has to tread judiciously when refactoring. When laying out a design, I ask the business analyst very clearly about the cardinality of the relationships -- can you ever imagine a case in which this relationship between entities is ever optional? Ever multiple? It's not BDUF, it's Big Requirements Up Front, and that's the cheapest time to gather them.
Adam Musch
Glad my comment inspired a bit of discussion, anyway :-) The challenge I always see with agile is fitting the theory of it to the practicality of a business, and I don't think that's something that's anywhere near solved yet.
Dan Puzey
Agile doesn't fit everywhere, especially not in already too big (too rigid, too much paperwork) companies, no doubts about that.
Pascal Thivent
I don't think it's about agile "not fitting," and I don't think it's anything to do with paperwork either. I do think your comment is the perfect example of the agile purism I mentioned though ;-)
Dan Puzey
@Dan It's your right to disagree. But, believe it or not, Agile organizations and businesses do exist and it's a pleasure to work in such environments. I wish you to taste it one day.
Pascal Thivent
Pascal: If you've ever wondered why some people ever refer to the Church of Agile, examine your own comments for their level of sanctimoniousness, and you won't have to wonder any longer. Agile is a technique, a toolset, a philosophy; your belief in its universal applicability (ever work in a software company where you have to ship a product not yet built by some date? Agile can't save you there) and scorn for those who have yet to adopt it work against Agile's uptake.
Adam Musch
Also @Pascal, to answer one of your comments specifically: I don't think agile businesses don't exist, I just think they're significantly rarer than traditional businesses with agile development teams. I would love to have the pleasure of working with one, but I find them few and far between. Maybe you're luckier than I, but my role is more about making an agile approach work for the business than the other way around.
Dan Puzey
@Adam Where did I say Agile was universally applicable? I explicitly wrote *Agile doesn't fit everywhere*. I can't be clearer. Now, I do indeed believe in Agile, in Lean, in "fast companies", in self-organized teams, in leadership vs command and control etc and I do indeed prefer to work in such environments. You don't, fine. Just don't make me say what I didn't. Thanks (and I'm done now).
Pascal Thivent
@Dan Yes, I do ultimately believe that Agile (as an organizational pattern for lean businesses) should change an organization and that the other way around is suboptimal. My job as a facilitator is to conduct change and to remove impediments to make it occur. If impediments can't be removed, at least people know why it's suboptimal. If this makes me a purist, then I'm proud of it, and it's my job.
Pascal Thivent
+3  A: 

If your tables are in Boyce-Codd Normal Form or better, then they should be quite easily used by any application without modification, assuming they actually store the data needed. The whole point of relational databases and relational modeling is to develop a data model independent of any application's search paths or commonly used queries.

And it is quite easy to design a properly normalized database up front, at least if you know up front what data is being managed.

The only reason you would need to "refactor" an RDBMS schema is if the original design was prima facie unacceptable to any competent eye. Now, some tablespaces or indexing might need to be tweaked, but that has nothing to do with the design.
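As a toy illustration of that point (all names invented): a single wide employee table in which the department determines the manager would violate BCNF; splitting it yields tables any application can reuse, with the original shape recoverable by a join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A wide employee(emp, dept, manager) table would mix two facts: who works
# where, and who manages a department. The dependency dept -> manager
# violates BCNF, so the two facts go in two tables (names are illustrative).
conn.executescript("""
CREATE TABLE department (
    dept    TEXT PRIMARY KEY,
    manager TEXT NOT NULL
);
CREATE TABLE employee (
    emp  TEXT PRIMARY KEY,
    dept TEXT NOT NULL REFERENCES department(dept)
);
INSERT INTO department VALUES ('R&D', 'Ada'), ('Sales', 'Bob');
INSERT INTO employee   VALUES ('Carol', 'R&D'), ('Dave', 'R&D');
""")

# Any application can still get the original "wide" shape with a join,
# and the manager of R&D is stated exactly once.
rows = conn.execute("""
    SELECT e.emp, e.dept, d.manager
    FROM employee e JOIN department d USING (dept)
    ORDER BY e.emp
""").fetchall()
print(rows)  # [('Carol', 'R&D', 'Ada'), ('Dave', 'R&D', 'Ada')]
```

Because the decomposition stores each fact once, changing R&D's manager is a one-row update, independent of any application's query patterns.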

Steven Huwig
Until the properly normalized BCNF database doesn't perform under load. All the tablespace and index tweaking in the world can't save you in some cases. The volume of data, its distribution, and its use is a dimension of proper design as well.
Adam Musch
I should have added materialized updateable views, of course. At the point you describe, there's no point in using a conventional RDBMS at all, so the question of how to evolve its schema is moot. It's also not a point that's easy to reach, unless you're doing bioinformatics or Facebook...
Steven Huwig
Yes, because the $CURRENCY cost for hardware -- and the cost of licensing an RDBMS -- is also not a consideration. Just throw money/hardware at it until it performs better, because budgets are infinite.
Adam Musch
Yes, because the $CURRENCY cost of inconsistent data -- and the cost of catering to incompetents -- is also not a consideration. Just put everything into one denormalized table and blame the DBMS engine, because excuses are free.
Steven Huwig
Normalization isn't just about not repeating yourself; it's also a space-time tradeoff. It's a tool or technique, and understanding design also means understanding when the traditional design isn't the right one. There are cases where trading space for time is prudent -- those cases tend to exist where there are strict response time SLAs and/or processing batch windows that must be met. When you've had to tune a system striped across nine mainframes processing billions of dollars in transactions per day, you'll understand just how foolish a comment like "BCNF, silly architect!" sounds.
Adam Musch
I have worked on those systems. I think an architect would realize that transaction volume isn't measured in dollars, though.
Steven Huwig
@Steven Huwig: Billions of dollars in transactions per day -> the aggregate value of the transactions being processed. Surely you understood that. Metacriticism of the sentence structure of comments limited to 600 characters is weak sauce.
Adam Musch
Irrelevant statistics cited in an argument against a strawman is weak sauce too.
Steven Huwig
If BCNF were the 42 of database normalization, no other normalization rules or practices would need to exist. I have experienced cases where BCNF did not scale to the requirements of the business. Your challenge is now to prove conclusively that BCNF for relational database implementations meets every business requirement for every system regardless of any external constraint, such as throughput, volume, or cost. "Prima facie unacceptable to any competent eye" *my* eye.
Adam Musch
You need to learn the difference between refactoring and optimization. A normalized database needs no refactoring. It may need optimization.
Steven Huwig
Introducing a calculated column is denormalization (violating 3NF, let alone BCNF), generally for performance optimization, yet it is listed by Ambler as a refactoring pattern/technique. http://www.agiledata.org/essays/databaseRefactoringCatalogStructural.html#IntroduceCalculatedColumn
Adam Musch
Ambler's not exactly the "definitive word" on refactoring; e.g. http://www.refactoring.com/: "Refactoring is a disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior." Your example alters the external behavior by adding a column (thus is not a refactoring) and should be a materialized view (thus not requiring denormalization of the logical schema).
Steven Huwig
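The distinction drawn in the comment above, keeping the derived value in a view over the normalized base table rather than in a stored column, can be sketched as follows (SQLite shown here has only plain views; engines such as Oracle can materialize them; names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_line (
    order_id   INTEGER NOT NULL,
    qty        INTEGER NOT NULL,
    unit_cents INTEGER NOT NULL
);
-- The derived total lives in a view, not in a stored column, so the
-- base table stays normalized and nothing can drift out of sync.
CREATE VIEW order_total AS
    SELECT order_id, SUM(qty * unit_cents) AS total_cents
    FROM order_line
    GROUP BY order_id;

INSERT INTO order_line VALUES (1, 2, 500), (1, 1, 250);
""")
print(conn.execute("SELECT * FROM order_total").fetchone())  # (1, 1250)
```

The logical schema is unchanged by the view; whether the engine precomputes it is an optimization decision, not a design one.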
+5  A: 

Can we use this incremental approach for creating the database structure, or should we work the structure out before we start writing code?

Yes, you can (have a look at Fowler's Evolutionary Database Design). And no, you shouldn't work the structure out up front (that's BDUF). Scott Ambler has also written a lot on this and on the techniques that allow you to apply it in real life. Check out Agile Database Techniques, Refactoring Databases: Evolutionary Database Design, and The Process of Database Refactoring: Strategies for Improving Database Quality, for example.

And as I said in a comment, if your DBA doesn't like it (if he treats the model and data like Gollum with the precious), get another DBA, one who understands the work of Fowler and Ambler. Period.
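One of the refactorings catalogued in those books, "Rename Column", can be sketched as an expand/contract sequence (illustrative names, Python with sqlite3):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, fname TEXT)")
conn.execute("INSERT INTO customer (fname) VALUES ('Ada')")

# Expand: add the new column and backfill it from the old one. Both
# columns coexist during a transition period so callers can migrate
# one at a time instead of in a single big-bang change.
conn.execute("ALTER TABLE customer ADD COLUMN first_name TEXT")
conn.execute("UPDATE customer SET first_name = fname")

# Contract: once no caller reads fname any more, drop it, e.g.
#   ALTER TABLE customer DROP COLUMN fname   (SQLite >= 3.35)
print(conn.execute("SELECT first_name FROM customer").fetchone())  # ('Ada',)
```

The transition period is what makes the refactoring survivable on a live system: old and new code can run side by side until the contract step.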

Pascal Thivent
I have. And those folks don't understand the difference between the cost of refactoring departmental-scale systems versus, say, a credit card system with billions of records and zero-downtime requirements. It's analogous to the difference between chemistry and chemical engineering. It's trivial to extend a normalized model; it's not trivial to remove columns in all database engines. In DB2 as of a few years ago, you'd have to drop the table, and even modern DB2 will eventually require taking the table offline. And fully normalized databases tend toward low performance and high complexity.
Adam Musch
Yeah, Fowler and Ambler are known for their misunderstanding of real life problems... And by the way, the question was about development, not a live system (and if your database isn't tolerant to change and doesn't support your methodology, well, what can I say except don't use it?).
Pascal Thivent
Databases, much more so than code, are live systems. Once you've done the first deployment, refactoring becomes much more complicated than changing the implementation behind the interfaces, because the implementation is the interface.
Adam Musch
A: 

Several approaches may be taken to reduce the difficulty of refactoring the database to match the code that TDD generates. One is to generate your database from the classes you create as part of the TDD process.

Another possibility is to generate your database, test data, and possibly even the basic repository code, from a conceptual database model using a tool like NORMA. The "ORM" here is Object-Role Modeling (the "other" ORM), and NORMA is a Visual Studio add-in that can generate DDL and code from a conceptual model.

The nice thing is, even if the conceptual model changes significantly (a relation becoming many-to-many, for instance), both the code and the DDL will change to reflect that.
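In the same spirit, here is a deliberately tiny sketch of generating DDL from a class model (the type mapping, class, and function names are all invented for illustration; tools like NORMA or an ORM do far more):

```python
from dataclasses import dataclass, fields
import sqlite3

# Hypothetical mapping from Python annotations to SQL column types.
SQL_TYPES = {int: "INTEGER", str: "TEXT", float: "REAL"}

@dataclass
class Customer:
    id: int
    name: str
    balance: float

def ddl_for(cls):
    """Derive a CREATE TABLE statement from a dataclass's fields."""
    cols = ", ".join(f"{f.name} {SQL_TYPES[f.type]}" for f in fields(cls))
    return f"CREATE TABLE {cls.__name__.lower()} ({cols})"

ddl = ddl_for(Customer)
print(ddl)  # CREATE TABLE customer (id INTEGER, name TEXT, balance REAL)

conn = sqlite3.connect(":memory:")
conn.execute(ddl)  # the generated schema is valid SQL
```

When the class changes, regenerating the DDL keeps model and schema in step, which is the property the answer is pointing at.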

John Saunders