I'm curious about this:

In Microsoft's Outlook Express (or Outlook, I don't remember well, I'm a Mac user), they have something really cool. Generic rules:

You can configure a set of rules to automatically sort or delete your emails, for instance. It's incredibly powerful and easy to use.

These rules looked pretty much like this:

"If email in inbox has subject which contains 'foo', or 'bar', or 'foobar' delete it"

I need to code something similar for a powerful form validation system. The developer should simply be able to create rules like this:

rule: [password_1] is_not_equal_with [password_2]
consequence: show_error '2921'

rule: [firstName] has_less_characters_than '2'
consequence: show_error '1211'

rule: [age] is_numeric, is_smaller_than '13', is_greater_than '130'
consequence: show_error '1522'

rule: [gender] is_equal_with 'female'
consequence: show_group [female_questions]

rule: [termsAndConditionsAccepted] is_not_checked
consequence: show_error '482'

rule: [age] is_less_than 21
consequence: hide_group [income_questions]

Well, I have some ideas how this could be done, and I will post them here as an answer. But before I reinvent the wheel: Are there any written concepts I can use as a foundation to develop a rule-based validation system similar to this? Or if not, do you have any suggestions how this could be done?

In the example above, everything in square brackets is the name of an HTML form element. Everything in apostrophes '' is a "hard coded" value to compare against.
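
To make the notation concrete, here is a rough sketch of how one rule line could be split into a subject field and a list of checks. The function names, the regular expressions, and the tuple layout are all my own assumptions, not an existing library, and it does not handle commas inside quoted values.

```python
import re

# Hypothetical token patterns for the rule notation shown above.
FIELD = re.compile(r"\[(\w+)\]")            # [fieldName] -> form element name
LITERAL = re.compile(r"'([^']*)'|(\d+)")    # 'value' or a bare number like 21


def parse_operand(text):
    """Return ('field', name) or ('literal', value) for one operand."""
    text = text.strip()
    m = FIELD.fullmatch(text)
    if m:
        return ("field", m.group(1))
    m = LITERAL.fullmatch(text)
    if m:
        value = m.group(1) if m.group(1) is not None else m.group(2)
        return ("literal", value)
    raise ValueError(f"unrecognised operand: {text!r}")


def parse_rule(line):
    """Parse 'rule: [field] op operand, op operand, ...' into a dict."""
    body = line.split(":", 1)[1].strip()
    m = FIELD.match(body)
    if not m:
        raise ValueError("a rule must start with a [field]")
    checks = []
    for part in body[m.end():].split(","):
        pieces = part.strip().split(None, 1)
        if not pieces:
            continue
        # Operators such as is_numeric take no operand at all.
        operand = parse_operand(pieces[1]) if len(pieces) > 1 else None
        checks.append((pieces[0], operand))
    return {"subject": m.group(1), "checks": checks}


parsed = parse_rule("rule: [password_1] is_not_equal_with [password_2]")
# parsed["checks"] == [("is_not_equal_with", ("field", "password_2"))]
```

The resulting structure is plain data, so the same parsed rule could in principle be handed to a PHP generator and a JavaScript generator.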

The defined rules are translated into PHP code AND JavaScript code to do both client- and server side validation.

Features this must be capable of:

  • Conditional rules: Something A depends on something B
  • Value comparisons: For integers, floats, strings
  • Enable some form control logic as well, like in the "[gender] is_equal_with 'female'" example above.

How could this be done? What are the entities I must consider, from a scientific point of view?

I think the theoretical concept of this is platform independent. Although I will implement this in PHP and JavaScript, there's no reason why a C++ dev should not respond ;-) (I'm an Objective-C guy, btw)

+1  A: 

In an object-oriented design, one approach would be to implement the command pattern or, for more complex needs, the interpreter pattern. You'd typically create several classes for different categories of rules, and you can compose them for more complex scenarios (by building a CompositeRule, for example); all of them support an interface like Execute() or Execute(context).

You build up a queue of rule instances, and call Execute(context) on each of them for each object acted on. The context would contain an instance of the object (message, or form, or whatever) you are acting upon.
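
A minimal sketch of that idea in Python (the language is only for illustration; the class names `FieldEquals` and `ShorterThan`, the dict-based context, and the consequence strings are assumptions I made up for this example):

```python
class Rule:
    """Common interface: every rule answers execute(context) with True/False."""

    def execute(self, context):
        raise NotImplementedError


class FieldEquals(Rule):
    def __init__(self, field, expected):
        self.field = field
        self.expected = expected

    def execute(self, context):
        return context.get(self.field) == self.expected


class ShorterThan(Rule):
    def __init__(self, field, limit):
        self.field = field
        self.limit = limit

    def execute(self, context):
        return len(str(context.get(self.field, ""))) < self.limit


class CompositeRule(Rule):
    """Holds only if every child rule holds (a logical AND)."""

    def __init__(self, *children):
        self.children = children

    def execute(self, context):
        return all(child.execute(context) for child in self.children)


def run_rules(queue, context):
    """queue is a list of (rule, consequence) pairs; returns triggered consequences."""
    return [consequence for rule, consequence in queue if rule.execute(context)]


form = {"firstName": "A", "gender": "female"}
queue = [
    (ShorterThan("firstName", 2), "show_error 1211"),
    (FieldEquals("gender", "female"), "show_group female_questions"),
]
triggered = run_rules(queue, form)
# triggered == ["show_error 1211", "show_group female_questions"]
```

Here the context is just the submitted form as a dict; a real system would probably wrap it in its own class.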

JasonTrue
Sounds interesting. So internally those rules would be stored as a string? Like a regular expression, for example? I had a quick look into my book "Design Patterns - Elements of Reusable Object-Oriented Software" for the Interpreter pattern. I see two issues: 1) making it easy to use for the developer (it should work similarly to Outlook Express, i.e. I need something like a Rule Builder), and 2) building up the object tree out of the rule string.
openfrog
Storage is "just" an issue of translation. In one project, we used XML to store rules and implemented a "command translator" that transformed command names and parameters into actual objects; an earlier version did the same except that it used Excel. Personally, if I were doing this in PHP, JavaScript or Ruby, I'd use something like YAML to store the rules. You just need to translate the parameters into instances.
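
As a rough illustration of the "translation" step, the sketch below loads stored rules and maps check names onto callables. It uses JSON from the Python standard library as a stand-in for YAML or XML, and the key names (`field`, `check`, `value`, `consequence`) and the `CHECKS` registry are assumptions invented for this example.

```python
import json

# Stored rules; the same structure could live in a YAML or XML file.
RULES_JSON = """
[
  {"field": "firstName", "check": "has_less_characters_than", "value": 2,
   "consequence": "show_error 1211"},
  {"field": "gender", "check": "is_equal_with", "value": "female",
   "consequence": "show_group female_questions"}
]
"""

# Hypothetical registry that translates a stored check name into behaviour.
CHECKS = {
    "has_less_characters_than": lambda actual, value: len(str(actual)) < value,
    "is_equal_with": lambda actual, value: actual == value,
}


def load_rules(text):
    """Translate stored entries into (field, check, value, consequence) tuples."""
    rules = []
    for entry in json.loads(text):
        check = CHECKS[entry["check"]]  # raises KeyError for an unknown check name
        rules.append((entry["field"], check, entry["value"], entry["consequence"]))
    return rules


def apply_rules(rules, form):
    """Return the consequences of every rule whose check holds for the form."""
    return [
        consequence
        for field, check, value, consequence in rules
        if check(form.get(field, ""), value)
    ]


consequences = apply_rules(load_rules(RULES_JSON), {"firstName": "Al", "gender": "female"})
# consequences == ["show_group female_questions"]
```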
JasonTrue
As a developer, I find YAML or XML easier to work with than clunky user interfaces. But the UI part shouldn't be very difficult.
JasonTrue
+2  A: 

You might want to check out some of the open-source rules engines, or even a commercial one.

Examples include:

Commercial: InRule, Business Rules Engine, ASA Business Rules Engine

Open source: OpenRules, Drools

There are a lot more, including some built into Java (Java Rule Engine API (JSR94)) and .NET (Windows Workflow Foundation Rules Engine).

Not sure about straight PHP though.

As a side note, I've used a couple of engines, like Haley Rules (before they were bought by Oracle), to drive web UIs. Be aware that execution speed is absolutely critical. We had Haley processing about 2000 rules per page load (mortgage app), and it was executing in under 40 ms (not a typo). We used it to decide what fields were on the page as well as determine whether the entered data was consistent, met legal standards, and even whether it was entered correctly.

Some of the other engines were much much slower even on much smaller rule sets due to how long it took to simply instantiate the engines.

I've also gone down the path of writing my own for smaller systems. In my case I used JavaScript and simply set up variables with data from the posted page prior to executing the scripts that were saved with the forms.

This was also performant on a smaller scale, but I limited it to only giving simple go / no go responses.

Chris Lively
+1  A: 

Chain the rules in a chain-of-responsibility.

baskin
That's a good point... The Microsoft rules engines (for Outlook, Exchange) actually process "all" rules, unless you specifically add a "then stop processing other rules" requirement to the rule. However, for many applications, you'd probably prefer to process rules until one of the items in the chain has taken responsibility for an item. You'd have to look at the specific business problem to decide whether you need a chain of responsibility, but it may be more suitable for some cases.
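
The difference between the two behaviours can be sketched with a per-rule "stop" flag. The tuple layout `(predicate, action, stop_after)` and the action names are assumptions for illustration only, not how Outlook or Exchange actually store rules.

```python
def process(rules, item):
    """Run rules in order; a matching rule with stop_after=True ends the chain."""
    fired = []
    for predicate, action, stop_after in rules:
        if predicate(item):
            fired.append(action)
            if stop_after:
                break  # this rule took responsibility; later rules are skipped
    return fired


message = {"subject": "foobar offer"}

# "Process all rules" behaviour: no rule stops the chain.
process_all = [
    (lambda m: "foo" in m["subject"], "move_to_junk", False),
    (lambda m: "bar" in m["subject"], "flag", False),
]

# Chain-of-responsibility behaviour: the first matching rule stops processing.
first_match_wins = [
    (lambda m: "foo" in m["subject"], "move_to_junk", True),
    (lambda m: "bar" in m["subject"], "flag", False),
]

all_actions = process(process_all, message)         # ["move_to_junk", "flag"]
first_action = process(first_match_wins, message)   # ["move_to_junk"]
```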
JasonTrue
+1  A: 

For a small number of rules and messages you can apply a brute-force algorithm: take each rule and each message and check whether they match. This gives O(r*m) complexity, where r is the number of rules and m is the number of messages, not taking into consideration that a rule can have multiple conditions.
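
A minimal sketch of the brute-force approach, assuming each rule is just a named list of condition functions and each message is a dict (both are choices made for this example):

```python
def match_all(rules, messages):
    """Check every rule against every message: r * m rule/message pairs."""
    matches = []
    for rule_name, conditions in rules.items():
        for index, message in enumerate(messages):
            # A rule matches only if all of its conditions hold.
            if all(condition(message) for condition in conditions):
                matches.append((rule_name, index))
    return matches


rules = {
    "spam": [lambda m: "foo" in m["subject"]],
    "big_spam": [lambda m: "foo" in m["subject"], lambda m: m["size"] > 1000],
}
messages = [
    {"subject": "foo sale", "size": 50},
    {"subject": "hello", "size": 5000},
    {"subject": "foo again", "size": 2000},
]

result = match_all(rules, messages)
# result == [("spam", 0), ("spam", 2), ("big_spam", 2)]
```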

For a huge number of rules or messages you can implement a Rete network (http://en.wikipedia.org/wiki/Rete_algorithm). This takes some memory but is much much faster in practice. Depending on the way you design your rules you will get different complexities.

The first approach is simple and I don't think I need to explain it. However if you need help let me know and I will detail that idea. Let me explain the second approach:

Read a little about the Rete algorithm before going further.

In the alpha part of the Rete network you will store distinct conditions that appear in your rules. Some rules might share some conditions. Like:

Rule1: IF (message.date equals 24.10.2009) AND (message.title contains "hello") THEN do something1

Rule2: IF (message.hasAttachement is TRUE) AND (message.date equals 24.10.2009) THEN do something2

So the Alpha part of the network will have 3 elements

  • C1: (message.date equals 24.10.2009)
  • C2: (message.title contains "hello")
  • C3: (message.hasAttachement is TRUE)

In the Beta net you will have two join nodes that link C1-C2 and C3-C1.

The production nodes that end the beta network will contain the series of actions that must be performed when a message satisfies all the conditions of the rule (in the alpha part) and all the consistency checks (in the beta part).
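
To show just the sharing idea from the example above, here is a heavily simplified sketch. It is not a real Rete network: there are no incremental memories and no variable bindings, and the "join" is only a check that all referenced conditions are true. The dictionaries and the `fire` function are my own illustrative names.

```python
# "Alpha" part: each distinct condition is stored, and evaluated, only once.
CONDITIONS = {
    "C1": lambda m: m["date"] == "24.10.2009",
    "C2": lambda m: "hello" in m["title"],
    "C3": lambda m: m["hasAttachement"] is True,
}

# "Beta" part, reduced to its simplest form: each rule joins condition IDs
# with a logical AND and names the action of its production node.
RULES = {
    "Rule1": (("C1", "C2"), "something1"),
    "Rule2": (("C3", "C1"), "something2"),
}


def fire(message):
    """Evaluate shared conditions once, then collect actions of satisfied rules."""
    alpha = {cid: test(message) for cid, test in CONDITIONS.items()}
    return [
        action
        for condition_ids, action in RULES.values()
        if all(alpha[cid] for cid in condition_ids)
    ]


message = {"date": "24.10.2009", "title": "hello world", "hasAttachement": False}
actions = fire(message)
# actions == ["something1"]
```

C1 is tested once even though both rules use it, which is where the saving comes from as the number of rules grows.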

The most complicated part is the beta network. If you want just the logical AND in your rules (no other logical operators or parentheses), then it is trivial. However, if you want more complicated constructs, you'll have to write a lot of code and do a lot of tests.

For more info about Rete:

  • Production Matching for Large Learning Systems, Robert B. Doorenbos (1995)
  • On the Efficient Implementation of Production Systems, Charles L. Forgy (1979)
Victor Hurdugaci