views:

136

answers:

5

Hi,

I'm interested in understanding if there is a common technique/pattern for checking data in a database conforms to a set of rules. I'm interested, perhaps, in running a service/job each night that churns through the data and raises exceptions to rules. I would quite like the system to be dynamic, i.e. it's quite easy to define new rules.

A very simple example could be:

 Customer Cash% 
 A        50 
 B        25 
 C        30

Rule: Tell me all customers that have < 50% cash.

I'm not really coupled to a particular tool or idea, however our expertise are C#, XML and Sql Server. All our data is in SQL SERVER Databases.

If there isn't anything I will go back to looking at a custom rule defining system, currently my idea is based around rules defined in XML (for the dynamic aspect) and a Windows Service process data against the rules.

Cheers.

EDIT: I should say that this could get complex, whereby I would like to check a bunch of tables and their data against a "Model" to see whether a particular object is following rules. For example there might be a "Model" customer and you want to check how many customers don't follow the model rules, e.g. Cash, Last Payment, Nationality etc.

EDIT: Following on from a couple of comments, I fully understand that "invalid" data should be prevented from been entered in the first place. My scenario deals with the fact that the data may be perfectly valid, however it might not follow "house rules", things that don't follow the rules might then need to be justified.

A: 

Your best bet might be to create a maintenance plan in SQL Server, with one step in the plan for each rule. Each rule would check the data and insert into an exception table if it found any nonconforming rules. This would allow for you to leverage the tools provided by SQL Server and maintain fairly easy maintenance of the rules themselves (adding, removing, and reordering).

Adam Robinson
+1  A: 

Create a script of SQL statements, with one statement being equal to a single rule. In your example, your statement might be:

INSERT INTO EXCEPTION 
   (RULE_NAME, DETAIL) 
VALUES 
("CASH_LEVEL_LOW", SELECT CUSTOMER_ID FROM CUSTOMER WHERE CUSTOMER.CASH < 50);

I'm not up-to-date on the syntax, but you should be able to get the gist of the idea from here. It would insert into another table one record per violation, with sufficient data so that you could locate the record easily.

Elie
I have had similar thoughts. There are two things that concern me about this approach, A: it doesn't seem very easy to maintain. Say for example I want to quickly change the percentage, and we've got 200 rules. B: It worries me having too much business logic in the database.
MrEdmundo
First, the logic is not in the database, but in a script file that runs on the database, which makes is easier to change. You can name your statements such that you would be able to find them later easily by just searching for the name.
Elie
I appreciate what you're saying. When I imagine the scenario I'm describing I imagine laying a blueprint over a load of data and looking where the data is outside the lines. I'm not sure how neat that could be using scripts like we're talking about
MrEdmundo
The scripts would be your blueprints... but you wouldn't need to keep them inside the database per se. I see your point, though.
Elie
A: 

I would quite like the system to be dynamic, i.e. it's quite easy to define new rules.

You already have such a system; it is your database. In particular, check constraints serve to prevent invalid data from being entered at all.

For a case like your example -- where you want to allow the value but flag it -- write a view, and have a client application issue an error if the view has any rows.

Here's an example: create view low_on_cash as select * from table where "Cash%" < 50 ;

In the client, you'd raise an error if "select count(*) from low_on_cash" didn't return 0;

If you established a convention that all such views were named with a prefix, e.g., "error_report", your client could select all such view names from the systables for the database in one query, then iterate that list by calling "select count(*) from " + viewname; logging an error for any that returned more than zero rows.

Since this would be data-driven, adding a new error report would consist of nothing more than creating a view with the proper name prefix; you'd not have to recompile the client.

The additional advantage is that adding any rule engine would require learning its Domain Specific Language for writing rules, training new staff on it, and even then inevitably there would be corner cases the rules didn't easily cover. Your coders already know SQL, and it's based on a 17 year old ANSI Standard based on 20 earler years of use, so most of the corner cases have been ironed out of the language.

tpdi
That could potentially result in hundreds of views, and no central grouping of all entries across all views. It would work, but try generating a report of all rule violations.
Elie
That is my point Elie, tpdi I don't disagree that you're idea would work, I just think it would be clunky and difficult to manage. Imagine 200 views. Ed
MrEdmundo
A: 

Personally I don't allow data to be entered in my database that doesn't follow the business rules. That is what check constraints, unique indexes, and triggers are for.

HLGEM
Good call! Anytime I hear about people using scheduled jobs in the DB to manipulate or validate data it usually points to someone barking up the wrong tree.
JohnFx
Who says it's invalid data? Maybe it's just a flag for data that may soon become invalid (approaching the invalid condition).
Elie
And who says it isn't invalid data? The first step is to make sure not to put bad data into your database ever. If you want to check if something is approaching a limit that has nothing to do with data validation but is just like any other query or proc you want to run.
HLGEM
I believe that what the question is about: looking for a way to report on certain bits of data which is not technically invalid, and hence can go into the database, but needs to be reviewed (e.g. if less than 50% cash then send a reminder). I agree that invalid data doesn't belong in the DB.
Elie
In my mind HLGEM, validation of a data against a rule doesn't mean that the data been checked is completely invalid, it is only invalid when checked with the particular rule. It may be that the rule is the rule for the database, hence it's existence is invalid, this is not the case for my scenario.
MrEdmundo
+2  A: 

As it isn't mentioned yet, I'd suggest you'd take a look at the data mining capabilities of SQLServer. One of them is the abbility to highlight exceptions in your data.

Take a look at this Highlight Exceptions Video Tutorial to get you started.

Lieven
I'm going to have a look at this when I get chance, a quick scan read of the manuscript suggests it might be a possible solution.
MrEdmundo