I'm looking for a framework for integrating a number of database consistency-checking rules into our system. It should basically be an automated test-case runner for our database checks.

Requirements:

  • Easy to write new rules or checks
  • Easy to run all rules, grouping of rule subsets would be a bonus
  • Accurate and simple reporting of rules as they are executed or after execution

I'm about to write something like this myself, but I thought I'd first see if something already exists. I googled for it but couldn't find anything.

Some rule examples:

  • Ensure that in a child table, for each record with [Rank] N, either N is 0 or there is a record with [Rank] N-1; i.e., for a given parent, child ranks form a contiguous sequence from 0 to MAX(Rank) (see the sketch after this list).
  • Our database uses a global "type/ID" system with a single MasterEntity table serving as a header table for every Entity in the system. Each entity type belongs in one or more specific Entity tables, and each Entity table allows only certain specific types. Check that all entities in the system have the correct records in their appropriate Entity tables.
  • Ensure that all securable types have an entry in our security descriptor table
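
For example, here is a minimal sketch of the first rule as a violation-listing query (the ChildTable, ParentID, and [Rank] names are hypothetical):

    -- Find child records whose [Rank] is neither 0 nor one more than a
    -- sibling's rank, i.e. gaps in the 0..MAX(Rank) sequence per parent.
    SELECT c.ParentID, c.[Rank]
    FROM ChildTable c
    WHERE c.[Rank] <> 0
      AND NOT EXISTS (
          SELECT 1
          FROM ChildTable p
          WHERE p.ParentID = c.ParentID
            AND p.[Rank] = c.[Rank] - 1
      );
    -- An empty result means the rule holds; returned rows are violations.
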
+1  A: 

Almost all of these can be dealt with using triggers for enhanced integrity constraints.
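
For instance, a trigger version of the rank rule from the question might look like this (a sketch only; ChildTable and its columns are hypothetical, and a complete version would also need to handle DELETEs):

    -- Reject any insert/update that leaves a gap in the rank sequence.
    CREATE TRIGGER trg_ChildTable_RankCheck
    ON ChildTable
    AFTER INSERT, UPDATE
    AS
    BEGIN
        IF EXISTS (
            SELECT 1
            FROM inserted i
            WHERE i.[Rank] <> 0
              AND NOT EXISTS (
                  SELECT 1
                  FROM ChildTable c
                  WHERE c.ParentID = i.ParentID
                    AND c.[Rank] = i.[Rank] - 1)
        )
        BEGIN
            RAISERROR ('Rank sequence violation', 16, 1);
            ROLLBACK TRANSACTION;
        END
    END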

I suggest getting familiar with Object Role Modeling as a way to design relational data with a more granular constraint set, including the check types you describe.

le dorfier
For a couple of reasons, performance and politics, writing these as triggers or constraint checks is not an option for now. I'm still fighting the good fight here, but in the meantime people are making mistakes and we're not catching them until much too late. We need something that catches these mistakes sooner.
Peter Oehlert
Then even more reason to consider Object Role Modeling (which doesn't explicitly imply using triggers for implementation). But it sure helps design and document the constraints.
le dorfier
Checking for errors after the fact means weak database design. Nearly all database systems support constraints and triggers, which ensure data consistency at the time of update. Performance can be fixed with an upgrade, which is likely to be cheaper than the cost of writing, documenting, and maintaining a test suite and then fixing errors after the fact.
SpliFF
Also, a properly written trigger will always perform better than some periodically run batch process, as well as ensuring that you don't open yourself up to not-yet-checked invalid data wrecking your online processes. Unfortunately, politics trumps technical validity every time.
Steve Broberg
A: 

It seems like you are looking for a system that allows you to declare database constraints of arbitrary complexity. I'm assuming that you want to catch violations of a constraint the moment they are made, and not in some post-factum verification procedure.

To do that, you need a relational DBMS, and alas, such a thing doesn't exist yet.

To do that in an SQL system, your only option is to write a great deal of constraint enforcement code and stash it in triggers and the like. If that is out of the question due to politics and such, there is currently no working solution to your problem.

Beware of those who guide you to Object solutions that cannot be tightly coupled to the DBMS itself. Eventually, someone will deploy some code that bypasses your outside-of-the-dbms constraint enforcement rules, leaving you exactly where you were when you decided to ask this question.

I agree with SpliFF. Post-factum constraint checking seems like a stupid idea to me too. But if that's what you really want, then here's a possible approach:

According to theory, all database constraints can be formulated as "a query whose result is required to be empty at all times". Such a query would, for example, give you "all the pairs of distinct rows that have the same key value"; generally speaking, each query gives you a list of "all the things that are wrong".

Write one query/script per constraint that could be violated, run them all overnight, and check whether any of them return non-empty results.
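
As an illustration, the "same key value" constraint mentioned above might be written like this (a sketch; the table T and column KeyCol are hypothetical):

    -- List every key value that appears on more than one row.
    -- The constraint holds exactly when this query returns no rows.
    SELECT KeyCol, COUNT(*) AS Occurrences
    FROM T
    GROUP BY KeyCol
    HAVING COUNT(*) > 1;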

I am in fact looking for a "post-factum" verification procedure. Think of this as DB unit tests, or Database.Assert. The key issue is that we have six different DB developers all changing their own little parts of the system; couple that with client developers reading and writing data, and it's fairly easy for someone to use a column in a way different from what some other sproc/view originally designed it for. We want a set of checks to make sure everyone's rules are being followed during development, but not something that will impact the runtime system.
Peter Oehlert
A: 

Well, you could mix the correct answer (constraints and triggers) with your wrong answer (test after the fact) by building a second, identical database WITH the constraints and then attempting to migrate the data. If the new database throws a wobbly when you try to INSERT, you've found an error. Once your "politics" problem is resolved you can just migrate the whole thing permanently.
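
A rough sketch of that idea, assuming the constrained copy already exists (AuditDB, SourceDB, and ChildTable are hypothetical names):

    -- Copy data into an identically structured database that DOES have
    -- the constraints; anything the INSERT rejects is a rule violation.
    BEGIN TRY
        INSERT INTO AuditDB.dbo.ChildTable (ParentID, [Rank])
        SELECT ParentID, [Rank]
        FROM SourceDB.dbo.ChildTable;
    END TRY
    BEGIN CATCH
        -- The constrained copy rejected the data: log which table failed.
        PRINT 'Constraint violation in ChildTable: ' + ERROR_MESSAGE();
    END CATCH;

Note that a set-based INSERT fails as a whole, so you would migrate row by row (or table by table) to pinpoint the individual offending records.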

SpliFF
Perhaps think of this more as a Database.Assert than an integrity issue. We have a complex schema, and we want to make sure that both DB developers and our client developers all use the system the way it was intended. I'd like some checks that I can run to see if anyone has violated any of my non-runtime rules.
Peter Oehlert
You just described constraints and triggers. Your data is either compliant (didn't get rejected by a rule) or not (got rejected). If you insist on doing this the wrong way, then why bother us with it? If you can't describe your schema using DB constraints then you've got some crazy rules, and you need to stop allowing clients direct DB writes. The solution you're looking for will ruin you. Never let the database become invalid, or the data is next to useless and your system full of bugs and unmaintainable.
SpliFF
Frankly, your justification about not "impacting the runtime system" is bogus. Write your constraints, triggers, and procedures, and then measure the impact. If it's too high, upgrade your hardware. It'll still be cheaper than chasing errors in the hodge-podge way you describe.
SpliFF
+1  A: 

I'm not aware of anything. This is probably because most DBAs are fairly conservative and would not allow an out-of-the-box toolset into their databases.

  1. Enforce as much as you can in the database with unique constraints, foreign keys, etc. Despite this, there are often many cases where you really can't encode everything into constraints. For instance, an empty table always satisfies its constraints, so you need an exception report to tell you when a table is unexpectedly empty.

  2. Encapsulate all the exception handling into individual, granular stored procedures with a consistent naming convention (e.g. usp_Integrity_XXXX).

  3. Create a master stored proc (usp_Integrity_Master) that either calls all those stored procs explicitly or uses dynamic SQL with a cursor over INFORMATION_SCHEMA to find the appropriate procedures and call them in sequence, logging all the results to an appropriate table and firing off an email, a report, or whatever (sketched below).
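
Here is a minimal sketch of that master procedure using the cursor approach (the IntegrityLog table is hypothetical, and each individual check is assumed to report its own violations):

    -- Find every usp_Integrity_* procedure via INFORMATION_SCHEMA and
    -- run each one in turn, logging that it was executed.
    CREATE PROCEDURE usp_Integrity_Master
    AS
    BEGIN
        DECLARE @proc sysname;
        DECLARE check_cursor CURSOR FOR
            SELECT ROUTINE_NAME
            FROM INFORMATION_SCHEMA.ROUTINES
            WHERE ROUTINE_TYPE = 'PROCEDURE'
              AND ROUTINE_NAME LIKE 'usp[_]Integrity[_]%'
              AND ROUTINE_NAME <> 'usp_Integrity_Master';

        OPEN check_cursor;
        FETCH NEXT FROM check_cursor INTO @proc;
        WHILE @@FETCH_STATUS = 0
        BEGIN
            INSERT INTO IntegrityLog (CheckName, RunAt)
            VALUES (@proc, GETDATE());
            EXEC @proc;  -- each check records/reports its own violations
            FETCH NEXT FROM check_cursor INTO @proc;
        END
        CLOSE check_cursor;
        DEALLOCATE check_cursor;
    END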

At a previous gig, I actually had an Agent job that called the master integrity check for each database and emailed me. (Actually, some of the individual problems were emailed to the appropriate staff responsible for tracking down and fixing them. Especially in integration scenarios, you cannot just stop outside information from coming in; you have to let it come in invalid and try to get it corrected later.)

Cade Roux
A: 

As you can see from the responses here, the idea of running non-inline periodic checks on data isn't very popular. However, if you reframe your question in the form of "What tools exist that allow me to monitor the state of data in our system, as expressed by rules we define," you're talking about a class of solutions that most people here probably use.

For example, I'm sure most DBAs here would advocate monitoring scripts that track the size of certain tables, or check for data that has been idle for some period of time, etc. A failure doesn't mean the schema is corrupt; rather, it's a signal to perform some kind of maintenance or investigation.

We use Nagios as a system monitoring solution. All it does is let you define rules, which are checked by running a script - in our case a .sql file run by the database's command line interface (sqlplus for Oracle). Scripts must return a value of pass, fail, or warn. You can set up how you get notified (mail, paging, etc.) and when (every failure, or just the first time it fails until it gets "cleared" by a success), and it keeps track of the history of events. As for the rules themselves, there is no "framework" in terms of point & click constraint enforcement - you'll have to express each rule by writing your own SQL (see the sketch below).
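
To give a flavor of it, a check script along these lines might look like the following (a sketch only; the PendingOrders table and the thresholds are invented, and a small wrapper would map the printed status to the monitoring system's pass/warn/fail result):

    -- Example check: warn/fail if rows have sat untouched for over a day.
    SET SERVEROUTPUT ON
    DECLARE
        stale_count NUMBER;
    BEGIN
        SELECT COUNT(*)
          INTO stale_count
          FROM PendingOrders
         WHERE last_updated < SYSDATE - 1;

        IF stale_count = 0 THEN
            DBMS_OUTPUT.PUT_LINE('PASS');
        ELSIF stale_count < 10 THEN
            DBMS_OUTPUT.PUT_LINE('WARN: ' || stale_count || ' stale rows');
        ELSE
            DBMS_OUTPUT.PUT_LINE('FAIL: ' || stale_count || ' stale rows');
        END IF;
    END;
    /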

It's a big field - just google around for "application monitoring" and find a product that suits your needs.

Steve Broberg