views:

259

answers:

2

I'm interested in database refactoring. I deal with several databases that don't have a large amount of data, just a few GB with at most a few hundred thousand rows. However, they have hundreds -- sometimes many hundreds -- of tables, views, sprocs and functions. In some places a divide-and-rule strategy using schemas has been implemented which has helped some problems of seeing ownership/usage of tables. However, it hasn't really helped object coupling.

We all read that integration via shared database isn't A Good Thing, but we also know that it is, at least for a while , a very productive thing as everything is in the database. We just don't apply the Single Responsibility Principle to databases like we do to objects.

Edit: I should add that I have no database performance issues. The tables are not large, the biggest has only a few hundred thousand rows. There is no real database performance issue; except when the database schema/logic/implementation is grotesquely inefficient (say requiring a cursor to do a sproc execution for each row in a result set in order to pre-process data for a report). Before you say I should change these, that is the whole point: I can't because the database is no longer in a state where the impact of changes can be assessed.

Clearly at some point you say "Enough!" and divide into multiple databases connected by messages, ETL, application tiers etc etc

The question is: how many is too many? What is the absolute upper limit of the number of sprocs/tables/functions that you can have before you go insane?

A: 

I'm not sure there is a magical limit for any of the things you mentioned. I prefer to keep things in one place so I don't have to remember that some records are in place and other records are in another.

I'd be more interested to know if all this work is impacting your performance? And if it's not then why change it? Unless it's impacting performance in some horrible way your customers won't see any benefit from your work and then what's the point?

Your customers might be better served if you just bought a new machine or upgraded your database server software.

Jon
I don't have a performance issue in database terms. The only problem I face is technical debt. The database is not only complex but obscure with many fields that are no longer relevant.
No More Hacks
+1  A: 

First, stop trying to think of databases in object oriented terms. Principles of object oriented programming simply do NOT apply to relational databases.

Shared databases are a very good thing from a business perspective. Multiple databases storing information that has to be transferred between them quickly becomes way more complex than your piddly many hundreds of objects. Data that is consistent between enterprise applications is priceless. Trying to reconcile if GE Corp and General Electric Corporation are really the same entity between two databases can be a nightmare.

Refactoring datbases is a nice goal, but it is very complex in reality. Don't do it unless you have a major performance issue that needs to be addressed or unless you are willing to commit to a process of identifying all the code that might be affected by a change. Even then, consider if you can know all the code that might change (this is one reason why database people hate, hate, hate dynamic code!).

Often the best way to refactor is to add your change and start changing over to using your new field, sp etc while leaving the old one in place until a set expiration date. Since you are on an annual cycle, you will need to manage those dates over a long period of time. To see if sps are being used, you can identify the ones you aren't sure of and add some code to them to insert to a table everytime they are run. If after your whole year cycle, they haven't been run, you can safely eliminate them. The cycle may be shorter depending on the sp.

If I'm writing something that will only be run annually, I would normally put the word annual in the sp name. But that may not be true where you are, however, the function of the sp should give you an idea if it is something that should only be run periodically. I wouldn't expect usp_send email proc to only run once a year but I might expect that a usp_attendance_report might not be run often. Of course as I said, I would have named it something more like usp_annual_attendance_report and you can consider doing that sort of thing moving forward.

But be aware that any refactoring you do will have to take place on a long cycle to ensure that you don't delete something you need. If your code is in a source control system (and all database tables, sp, views, UDFs, triggers, etc should be), you can probably eliminate some things knowing that if they fail you can pretty instantly put them back. Again, I'd examine the object to determine the possible risk eliminating them would have.

Of course if you have good automated tests in place, eliminating something on dev and running the tests can help you find out if something is still being referenced.

If you are looking for an easy way to refactor, I don't know of one. Refactoring databses is a time-consuming, risky activity and one which may not show enough improvement for the powers that be to be willing to pay for it.

A good book on refactoring databases is:http://www.amazon.com/Refactoring-Databases-Evolutionary-Addison-Wesley-Signature/dp/0321293533

HLGEM
I know, I've read the book on database refactoring. I was looking for some guidance on what sort of level of pain is typical in production databases. i've only ever seen a few and they all seem painful, I was just wondering how painful is too painful.
No More Hacks
It's pretty painful usually. However if you are well-organized and work carefully, step by step and all your data access is controlled through stored procs and not dynamic queries, it is doable. I understand that ORM access is also doable but have no experience with that. One key is to make everything easy to rollback if need be and test test test. There is also no substitute for getting to know your database really really well. Start small with a couple of things you are sure are not critical and use those to get your system for refactoring up in place. Then do the biggest problem areas.
HLGEM