I have a set of objects with attributes and a bunch of rules that, when applied to the set of objects, provides a subset of those objects. To make this easier to understand I'll provide a concrete example.

My objects are persons and each has three attributes: country of origin, gender and age group (all attributes are discrete). I have a bunch of rules, like "all males from the US", which correspond with subsets of this larger set of objects.

I'm looking for either an existing Java "inference engine" or something similar, which will be able to map from the rules to a subset of persons, or advice on how to go about creating my own. I have read up on rule engines, but that term seems to be exclusively used for expert systems that externalize the business rules, and usually doesn't include any advanced form of inferencing. Here are some examples of the more complex scenarios I have to deal with:

  1. I need the conjunction of rules. So when presented with both "include all males" and "exclude all US persons in the 10 - 20 age group," I'm only interested in the males outside of the US, and the males within the US that are outside the 10 - 20 age group.

  2. Rules may have different priorities (explicitly defined). So a rule saying "exclude all males" will override a rule saying "include all US males."

  3. Rules may be conflicting. So I could have both an "include all males" and an "exclude all males" in which case the priorities will have to settle the issue.

  4. Rules are symmetric. So "include all males" is equivalent to "exclude all females."

  5. Rules (or rather subsets) may have meta rules (explicitly defined) associated with them. These meta rules will have to be applied in any case that the original rule is applied, or if the subset is reached via inferencing. So if a meta rule of "exclude the US" is attached to the rule "include all males", and I provide the engine with the rule "exclude all females," it should be able to inference that the "exclude all females" subset is equivalent to the "include all males" subset and as such apply the "exclude the US" rule additionally.

I can in all likelihood live without item 5, but I do need all the other properties mentioned. Both my rules and objects are stored in a database and may be updated at any stage, so I'd need to instantiate the 'inference engine' when needed and destroy it afterward.

A: 

This sounds very much like description logic and knowledge bases to me. You have concepts, roles and individuals.

If you want to treat your problem as description logic-based reasoning, you should be fine modeling it as a knowledge base and running a reasoner on it.

There are some free reasoners available; a list can be found here.

Note, however, that this is a powerful but rather complex approach.

You might want to have a special look at KAON2 and DIG when using Java.

PartlyCloudy
I had a look at description logics and reasoners, but I have a few qualms. The first is the complexity of the solution. The second is the need for a fairly expressive description logic, which will affect reasoning time or may even make reasoning impossible. The last is that the priority requirement isn't met out of the box by any description logic I know of, so I would have to develop my own description logic variant. Care to comment?
Zecrates
Description logics don't seem to do a very good job with "fuzzy". Your requirements state that there can be conflicting rules. An approach based on something like OWL is supposed to be "open-world", which means that it can represent the conflict. The question is, what does the reasoner (Pellet, etc.) do with the conflict? Offhand I think you could add additional statements to the model that would disambiguate. Maybe someone who's done it knows the exact answer.
Ross Judson
A: 

I believe that you could use some variant of the ID3 algorithm to extract a set of rules from the initial state of your objects. I don't know of any concrete Java implementation, although Wikipedia points to implementations in languages from Ruby to C (I can't post more than one hyperlink :-)), but it's not a hard algorithm to learn.

Once it builds the decision tree, which can be expressed in rule format, you could use it to see which class each of your objects belongs to: all males from the US, all females between 10 and 20, and so on. When someone updates your objects in the database, you can rebuild the decision tree.

Vicente Reig
Problems I've noticed: this will only help with classification, not with the inverse (given a set of rules, find the matching population). It is based on machine learning and as such is not 100% accurate, whereas the rules are of such a nature that 100% accurate inferences are possible. There is also no initial population; I'd have to manually create and classify persons to serve as training data.
Zecrates
A: 
Plínio Pantaleão
A: 

One of the most powerful Java-based production rule engines (inference engines) is JBoss Drools.

http://jboss.org/drools

I'll be honest though: unless your application gets a LOT more complicated, using a rules engine is WAY overkill. On the other hand, if your application gets too big and has too many conflicting rules, then the engine will fail to provide a result.

If you can control your customer or problem domain better, it would be better to avoid inference engines altogether.
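To make the "no engine" option concrete, here is a minimal plain-Java sketch of scenarios 1 to 3 from the question. It rests on assumptions that are not in the question: priorities are integers where a higher number wins, a person matched by no rule is left out, and ties are settled by the order of the rule list. All class, field and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

class PriorityRules {
    enum Gender { MALE, FEMALE }

    // Hypothetical person type with the three discrete attributes from the question.
    static final class Person {
        final String name;
        final String country;
        final Gender gender;
        final String ageGroup;

        Person(String name, String country, Gender gender, String ageGroup) {
            this.name = name;
            this.country = country;
            this.gender = gender;
            this.ageGroup = ageGroup;
        }
    }

    // An include or exclude rule with an explicit priority (assumption: higher wins).
    static final class Rule {
        final int priority;
        final boolean include;
        final Predicate<Person> matches;

        Rule(int priority, boolean include, Predicate<Person> matches) {
            this.priority = priority;
            this.include = include;
            this.matches = matches;
        }
    }

    // A person is selected when the highest-priority rule that matches
    // them is an include rule. Persons matched by no rule are left out.
    static List<Person> select(List<Person> people, List<Rule> rules) {
        List<Rule> ordered = new ArrayList<>(rules);
        // Stable sort: rules with equal priority keep their list order.
        ordered.sort(Comparator.comparingInt((Rule r) -> r.priority).reversed());

        List<Person> selected = new ArrayList<>();
        for (Person p : people) {
            for (Rule r : ordered) {
                if (r.matches.test(p)) {
                    if (r.include) {
                        selected.add(p);
                    }
                    break; // the first matching rule decides
                }
            }
        }
        return selected;
    }
}
```

With "include all males" at priority 1 and "exclude all US persons in the 10 - 20 age group" at priority 2, `select` keeps the males outside the US and the US males outside that age group. Since the rules live in a database, the rule list can simply be rebuilt and thrown away per request.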

HDave
+1  A: 

For the case you're describing I think you'll want to use backward chaining rather than forward chaining (RETE systems like Drools are forward-chaining in their default behavior).

Check out tuProlog. Easy to bind with Java, 100% pure Java, and can definitely do the inferencing you want. You'll need to understand enough about Prolog to characterize your rule set.

Prova can also do inferencing and handle complex rule systems.

Ross Judson
+1, embedded SLD solver. tu-Prolog seems to be the accepted answer on http://stackoverflow.com/questions/1817010/embedded-prolog-interpreter-compiler-for-java
Charles Stewart
Thanks, I'll have a look. I'm away for training currently, but I'll respond as soon as I have enough time to evaluate it properly.
Zecrates
A: 

There are a bunch of embedded Prolog-like SLD solvers for Java; my favourite approach is to use mini-Kanren for Scala, since that is clean and allows you to use Scala to lazily handle the results of queries, but I have not used it in any depth. See Embedded Prolog Interpreter/Compiler for Java for other options, as well as Ross' answer.

SLD solvers handle all of your criteria, provided they have some extra features that Prolog has:

  1. Conjunction of rules: Basic SLD goal processing;
  2. Rules may have different priorities: Prolog's cut rule allows representation of negation, provided the queries are decidable;
  3. Rules may be conflicting: Again, with cut you can ensure that lower priority clauses are not applied if higher priority goals are satisfied. There are a few ways to go about doing this.
  4. Rules are symmetric: With cut, this is easily ensured for decidable predicates.
  5. Rules (or rather subsets) may have meta rules (explicitly defined) associated with them: your example seems to suggest this is equivalent to 4, so I'm not sure I get what you are after here.
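Because every attribute in the question is discrete, the equivalence in item 4 can also be checked without a solver, by enumerating all attribute combinations and comparing what two rules select. The following is only a rough Java sketch under one assumption: an exclude rule on its own means "everyone except the matches". The class names and the tiny attribute domains are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class RuleEquivalence {
    enum Gender { MALE, FEMALE }

    // Hypothetical, deliberately tiny attribute domains.
    static final List<String> COUNTRIES = Arrays.asList("US", "FR");
    static final List<String> AGE_GROUPS = Arrays.asList("0-10", "10-20", "20+");

    // One combination of the three discrete attributes.
    static final class Profile {
        final String country;
        final Gender gender;
        final String ageGroup;

        Profile(String country, Gender gender, String ageGroup) {
            this.country = country;
            this.gender = gender;
            this.ageGroup = ageGroup;
        }
    }

    static final class Rule {
        final boolean include;
        final Predicate<Profile> matches;

        Rule(boolean include, Predicate<Profile> matches) {
            this.include = include;
            this.matches = matches;
        }

        // Assumption: an include rule selects its matches,
        // an exclude rule selects everything it does not match.
        boolean selects(Profile p) {
            return include == matches.test(p);
        }
    }

    // Enumerate every combination of attribute values.
    static List<Profile> allProfiles() {
        List<Profile> profiles = new ArrayList<>();
        for (String country : COUNTRIES) {
            for (Gender gender : Gender.values()) {
                for (String ageGroup : AGE_GROUPS) {
                    profiles.add(new Profile(country, gender, ageGroup));
                }
            }
        }
        return profiles;
    }

    // Two rules are equivalent when they select exactly the same profiles.
    static boolean equivalent(Rule a, Rule b) {
        for (Profile p : allProfiles()) {
            if (a.selects(p) != b.selects(p)) {
                return false;
            }
        }
        return true;
    }
}
```

Under this reading, "include all males" and "exclude all females" compare as equivalent, so a meta rule attached to one could in principle be looked up for the other. Enumeration only stays cheap while the product of the attribute domains is small.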

The advantages and disadvantages of SLD solvers over description logic-based tools are:

  1. Programmatic power, flexibility: you can generally find programming solutions to modelling difficulties, where description logics might require you to rethink your models. But of course absence of duct-tape means that description logic solutions force you to be clean, which might be a good discipline.
  2. Robustness: SLD solvers are a very well understood technology, while description logic tools are often not many steps from their birth in a PhD thesis.
  3. Absence of semantic tools: description logic has nice links with first-order logic and modal logic, and gives you a very rich set of techniques to reason about them. The flexibility of Prolog typically makes this very hard.

If you do not have special expertise in description logic, I'd recommend an SLD solver.

Charles Stewart