Furthering: http://stackoverflow.com/questions/3910250/database-design-for-dynamic-form-field-validation/

How would I model the database when a particular field can have 0 or more validation rules, and each validation rule is "related" to another rule via AND or OR?

For example, say I have field1 that needs to be minimum of 5 characters AND maximum 10 characters. These are 2 rules that apply to the same field and are related via an "AND." An example of how rules relate via an "OR" would be something like this: field1 should have exactly 5 characters OR exactly 10 characters.

The validation could get complex and have n-levels of nesting. How do I do this in a DB?

A: 

I don't think there's a simple answer to how to model this. The following discussion will hopefully get you started, and give you some sense of the issues involved.

So far as I can see, you have at least three types of entity: fields, simple rules, and complex rules (that is, rules made by combining other simple and/or complex rules).

The one piece of good news is that I'm pretty sure you just need two types of complex rule: an AND rule, and an OR rule, each of which applies a set of sub rules, and returns true or false based on the results returned by those subrules.

So you want to build a structure where each form has 1 or more fields, each field has 0 or more validation rules, and each rule has 0 or more sub-rules.

One challenge is just to keep track of the structure of each complex rule. What strikes me as the simplest way to do this is in a tree structure where each node has a parent. So you might have an OR rule with a parent of 0 (indicating that it's a top-level rule). There would then be 2 or more rules with the OR's ruleId as their parent. In turn, any of those might be an AND or OR rule which would be the parent of other rules. And so on down the tree.
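To make the parent-pointer idea concrete, here is a minimal Python sketch of turning flat rows into a nested tree. The row layout `(rule_id, kind, parent_id)` and the rule names are made up for illustration; the only thing taken from the description above is that a parent of 0 marks a top-level rule.

```python
from collections import defaultdict

# Hypothetical rows: (rule_id, kind, parent_id); parent_id 0 marks a top-level rule.
ROWS = [
    (1, "OR", 0),
    (2, "exact_length_5", 1),
    (3, "AND", 1),
    (4, "min_length_5", 3),
    (5, "max_length_10", 3),
]

def build_tree(rows):
    """Group rows by parent, then nest them into dicts starting from parent 0."""
    children = defaultdict(list)
    kinds = {}
    for rule_id, kind, parent_id in rows:
        kinds[rule_id] = kind
        children[parent_id].append(rule_id)

    def node(rule_id):
        return {
            "id": rule_id,
            "kind": kinds[rule_id],
            "children": [node(c) for c in children.get(rule_id, [])],
        }

    return [node(r) for r in children.get(0, [])]

tree = build_tree(ROWS)
```

The result is a list of top-level rules, each carrying its sub-rules in `children`, which is the shape a recursive validator would want.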

Another challenge is how to extract your structure from the db so you can validate a form. It's preferable to minimize the number of queries it takes to do this. In a straight tree, where the structure is only established by children nodes knowing their parents, you'd need a separate query to get each parent's immediate children. So it'd be nice if you could aggregate all the children together under a single ancestor.

If any rule can only be assigned to 1 field, then you can have a fieldId column in your rules table, and each rule will be assigned to a field. Then you can join a form to its fields, and those fields to their rules, and pull out everything in one query. Then the application logic would be responsible for turning the data into a functional tree structure.
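As a rough sketch of that single-query approach, here it is with Python's built-in `sqlite3` module. The table and column names (`forms`, `fields`, `rules`, `parent_id`, `kind`, `data`) are assumptions for illustration, not a tested design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE forms  (form_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fields (field_id INTEGER PRIMARY KEY,
                     form_id INTEGER REFERENCES forms(form_id),
                     name TEXT);
-- Each rule belongs to exactly one field; parent_id 0 marks a top-level rule.
CREATE TABLE rules  (rule_id INTEGER PRIMARY KEY,
                     field_id INTEGER REFERENCES fields(field_id),
                     parent_id INTEGER NOT NULL DEFAULT 0,
                     kind TEXT,
                     data TEXT);
INSERT INTO forms  VALUES (1, 'signup');
INSERT INTO fields VALUES (1, 1, 'field1'), (2, 1, 'field2');
INSERT INTO rules  VALUES
  (1, 1, 0, 'AND', NULL),
  (2, 1, 1, 'min_length', '5'),
  (3, 1, 1, 'max_length', '10');
""")

# One query returns every field of the form together with all of its rules.
# LEFT JOIN keeps fields that have no rules at all.
rows = conn.execute("""
SELECT f.name, r.rule_id, r.parent_id, r.kind, r.data
FROM fields AS f
LEFT JOIN rules AS r ON r.field_id = f.field_id
WHERE f.form_id = ?
ORDER BY f.field_id, r.rule_id
""", (1,)).fetchall()
```

The rows come back flat, so the application still has to assemble them into a tree using `parent_id`.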

However, if you want rules to be reusable, that's not going to work. For example, you might want an abstract zip code rule which combined several sub rules (rather than being a giant regex). And then you might want to make that a US zip code rule, and make another for Canada, and another for any of multiple countries, and then you might want to combine some or all of those depending on which field was being validated. So you might have, for example a US OR Canada zip rule applied to some fields, a US only rule applied to other fields, etc.

One way to do this is to remove the fieldId column from rules, and add a new field_rules junction table with fieldId and ruleId as its columns. However, removing fieldId from rules puts you back into not having a single-query means of extracting all the rules (including sub rules) for a field, never mind for a form. So you might add an origin column to the rules table, and all the subrules of a complex rule would have that top-level rule's id as their origin.
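Here is an untested sketch of the junction table plus origin column, again using `sqlite3`. It assumes origin holds the id of the top-level rule a sub-rule belongs to (with the top-level rule pointing at itself); the column names and sample rules are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fields (field_id INTEGER PRIMARY KEY, name TEXT);
-- origin = id of the top-level rule this row belongs to (assumption).
CREATE TABLE rules (rule_id INTEGER PRIMARY KEY,
                    parent INTEGER NOT NULL DEFAULT 0,
                    origin INTEGER NOT NULL,
                    kind TEXT,
                    data TEXT);
CREATE TABLE field_rules (field_id INTEGER REFERENCES fields(field_id),
                          rule_id  INTEGER REFERENCES rules(rule_id),
                          PRIMARY KEY (field_id, rule_id));
INSERT INTO fields VALUES (1, 'billing zip'), (2, 'shipping zip');
INSERT INTO rules VALUES
  (1, 0, 1, 'OR', NULL),
  (2, 1, 1, 'exact_length', '5'),
  (3, 1, 1, 'exact_length', '10');
-- The same top-level rule is reused by both fields.
INSERT INTO field_rules VALUES (1, 1), (2, 1);
""")

def rules_for_field(field_id):
    """Fetch a field's top-level rules and all their sub-rules in one query."""
    return conn.execute("""
        SELECT fr.field_id, r.rule_id, r.parent, r.kind, r.data
        FROM field_rules AS fr
        JOIN rules AS r ON r.origin = fr.rule_id
        WHERE fr.field_id = ?
        ORDER BY r.rule_id
    """, (field_id,)).fetchall()

shipping_rules = rules_for_field(2)
```

Note the limitation: a sub-rule row can only carry one origin, so this works when whole top-level rules are shared between fields, not when a complex rule is itself nested inside other complex rules.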

Now things might get even more complex if you want to allow overriding some of a reusable rule's data for specific fields. Then you might add either a new field_rule_data table, or just data columns to the field_rules table.

Implementing a tree structure means that your application logic for both building and applying complex rules is probably going to have to be recursive.
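For the applying side, the recursion can be quite small. This is a sketch only: the dict layout (`kind`, `children`, `check`) is an assumption, and the two rules are the examples from the question (5 to 10 characters, and exactly 5 or exactly 10 characters).

```python
def evaluate(rule, value):
    """Recursively apply a rule tree to a value.

    AND passes when every sub-rule passes, OR when at least one does;
    anything else is treated as a simple rule with a `check` callable.
    """
    kind = rule["kind"]
    if kind == "AND":
        return all(evaluate(sub, value) for sub in rule["children"])
    if kind == "OR":
        return any(evaluate(sub, value) for sub in rule["children"])
    return rule["check"](value)

def simple(check):
    """Wrap a plain predicate as a leaf rule (hypothetical helper)."""
    return {"kind": "simple", "check": check}

# field1 must be at least 5 AND at most 10 characters long.
between_5_and_10 = {"kind": "AND", "children": [
    simple(lambda v: len(v) >= 5),
    simple(lambda v: len(v) <= 10),
]}

# field1 must be exactly 5 OR exactly 10 characters long.
exactly_5_or_10 = {"kind": "OR", "children": [
    simple(lambda v: len(v) == 5),
    simple(lambda v: len(v) == 10),
]}
```

One thing to decide up front: with this code an AND with no sub-rules passes and an OR with no sub-rules fails, which may or may not be what you want for an empty rule.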

Having said all that, I suspect your real challenge is going to be at the UI level.

Edit

I thought about this some more, and it's seeming even more complicated. I'm sure the following is also inadequate, but I hope it will facilitate figuring out a full answer.

I'm now thinking you have 5 tables: rules, rule_defs, rule_defs_index, fields, field_rules. They go something like this:

Rules
rule_id (PK)
name
data (can be null)

Rule_Defs
rule_def_id (PK)
rule_id (FK to rule_id)
parent (FK to rule_def_id)
origin (FK to rule_def_id: optional convenience field)

Rule_Defs_Index
rule_id (FK)
rule_def_id (FK)

Fields
field_id (PK)
name

Field_Rules
field_id (FK and part of PK)
rule_id (FK and part of PK)

Just making stuff up here in a vaguely plausible way, here's some sample data:

Rules  
id  name               data
1   AND
2   OR
3   5 digits           /^\d{5,5}$/
4   5-4 pattern        /^\d{5,5}-\d{4,4}$/
5   US Zip
6   6 alphanumerics    /^[A-Za-z0-9]{6,6}$/
7   US or Canada Zip

Rule_Defs  
id rule_id parent origin  
1     5       0     1
2     2       1     1
3     3       2     1
4     4       2     1
5     7       0     5
6     2       5     5
7     5       6     5
8     6       6     5   

Rule_Defs_Index (just data for US Canada Zip since that's biggest)
rule_id rule_def_id
7           2
7           3
7           4
7           5
7           6
7           7

Fields  
field_id name
1         billing zip
2         shipping zip

Field_Rules  
field_id rule_id
1           7
2           7

Note that the assumption here is that creating and editing rules will happen rarely relative to applying rules. Thus creating and editing will be fairly cumbersome and relatively slow activities. To avoid this being the case for the far more common application of rules, the Rule_Defs_Index should make it possible to extract everything needed to build a rule structure for a field (or a form) with a single query. Of course, once it's retrieved, the application will have to do a fair amount of work to turn the data into a useful structure.
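To give a feel for that application-side work, here is a rough Python sketch that applies the sample Rules and Rule_Defs rows above to a value. It works on in-memory copies of the rows (however they were fetched) and makes a few assumptions of its own: the `/.../` delimiters are stripped from the patterns, a rule def with no children and no data is treated as a reference to the reusable rule of the same rule_id, and any named rule other than OR requires all of its children to pass.

```python
import re

# rule_id -> (name, data), copied from the sample Rules rows.
RULES = {
    1: ("AND", None),
    2: ("OR", None),
    3: ("5 digits", r"^\d{5,5}$"),
    4: ("5-4 pattern", r"^\d{5,5}-\d{4,4}$"),
    5: ("US Zip", None),
    6: ("6 alphanumerics", r"^[A-Za-z0-9]{6,6}$"),
    7: ("US or Canada Zip", None),
}

# (rule_def_id, rule_id, parent), copied from the sample Rule_Defs rows.
RULE_DEFS = [
    (1, 5, 0), (2, 2, 1), (3, 3, 2), (4, 4, 2),
    (5, 7, 0), (6, 2, 5), (7, 5, 6), (8, 6, 6),
]

def children_of(def_id):
    return [d for d in RULE_DEFS if d[2] == def_id]

def top_def(rule_id):
    """The top-level (parent 0) definition row of a reusable rule."""
    return next(d for d in RULE_DEFS if d[1] == rule_id and d[2] == 0)

def check_def(rule_def, value):
    def_id, rule_id, parent = rule_def
    name, data = RULES[rule_id]
    if data is not None:                      # simple rule: apply its pattern
        return re.search(data, value) is not None
    subs = children_of(def_id)
    if not subs and parent != 0:              # reference to a reusable rule
        return check_def(top_def(rule_id), value)
    results = (check_def(sub, value) for sub in subs)
    return any(results) if name == "OR" else all(results)

def validate(rule_id, value):
    return check_def(top_def(rule_id), value)
```

With this, `validate(7, "12345")`, `validate(7, "12345-6789")` and `validate(7, "A1B2C3")` all pass, while `validate(7, "1234")` fails. Scanning RULE_DEFS on every lookup is fine for a sketch; real code would index the rows by parent first.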

Note that you might want to cache the constructed data in serialized form, rebuilding the cache in the relatively rare instances when a rule is edited or created.

Sid_M
Sid_M, let me digest your suggestions. I've had problems with parent-child relationships in the past (probably just due to my own limitations with DBs, though). If you take a look at MS InfoPath, they implement the idea I am talking about (see: http://monjurul.files.wordpress.com/2009/04/step3.jpg). Thanks.
StackOverflowNewbie
Yeah, parent-child is challenging. To be clear, my answer isn't something I've tested, but is just me trying to figure out how I'd approach this. I suspect there's a better solution, but I obviously don't know what it is.
Sid_M
Thanks again. More stuff to digest.
StackOverflowNewbie