I'll use a university library system to explain my use case. Students register in the library system and provide their profile: gender, age, department, previously completed courses, currently registered courses, books already borrowed, etc. Each book in the library system defines borrowing rules based on the student's profile. For example, a computer algorithms textbook can only be borrowed by students currently registered for that class; another textbook may only be borrowed by students in the math department; there could also be rules limiting a student to at most 2 computer networking books. As a result of the borrowing rules, when a student searches or browses the library system, they only see the books they are allowed to borrow. So the requirement really comes down to efficiently generating the list of books that a student is eligible to borrow.

Here is how I envision the design using Drools: each book will have a rule with a few field constraints on the student profile as the LHS; the RHS of the book rule simply adds the book id to a global result list; then all the book rules are loaded into a RuleBase. When a student searches or browses the library system, a stateless session is created from the RuleBase and the student's profile is asserted as the fact. Every book that the student can borrow then fires its book rule, and the global result list ends up holding the complete list of books the student can borrow.

A few assumptions: the library will handle millions of books; I don't expect the book rules to be too complicated, on average 3 simple field constraints per rule at most; the number of students the system needs to handle is in the range of 100K, so the load is fairly heavy. My questions are: how much memory will Drools take if loaded with a million book rules? How fast will it be for all those million rules to fire? If Drools is the right fit, I'd like to hear some best practices for designing such a system from experienced users. Thanks.

A: 

My experience with Drools (or a rules engine in general) is that it is a good fit in three cases: when user visibility into the rules is important; when the rules must change quickly without each change becoming a coding project; or when the set of rules is so large that it is hard to manage, think about and analyze in code (so you would have business people asking technical people to go read the code and tell them what happens in situation X).

That being said, rules engines can be a bottleneck. They don't run at anything close to the performance of code, so you do need to manage that up front architecturally. In this specific case there is certainly a database behind the system, which adds to the performance concern: the database will answer a query a whole lot faster than you can analyze the whole set in code.

I would absolutely not implement that by making a million rule objects. Rather, I would make a book type that multiple books can be assigned to, run the rules against the book types, and then only show books that are in an allowed type. This way you could load the types, pass them through the rules engine, and then push the allowed types into a database query that pulls the list of books in those types.

Types get a bit complicated by the fact that, in practice, a book is likely to belong to two types (allowed if you are taking a certain course, or in general if you are part of the department), but the approach should still hold.
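To make the book-type idea concrete, here is a minimal sketch in plain Java. It is not Drools code: each type's rule is a `Predicate` standing in for whatever the rules engine would evaluate, and the class names, the fields on `Student`, and the `book` / `book_type_assignment` tables are all assumptions made up for illustration. A join table is one way to handle a book that belongs to two types, since the book shows up if any of its types is allowed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical student profile; the field names are illustrative only.
final class Student {
    final String department;
    final Set<String> registeredCourses;

    Student(String department, Set<String> registeredCourses) {
        this.department = department;
        this.registeredCourses = registeredCourses;
    }
}

// A book type carries the eligibility rule shared by every book assigned to it.
// The Predicate stands in for a rule evaluated by the rules engine.
final class BookType {
    final String id;
    final Predicate<Student> rule;

    BookType(String id, Predicate<Student> rule) {
        this.id = id;
        this.rule = rule;
    }
}

final class BookTypeFilter {
    // Evaluate each type's rule once for the student and keep the ids that pass.
    static List<String> allowedTypeIds(Student student, List<BookType> types) {
        List<String> allowed = new ArrayList<>();
        for (BookType type : types) {
            if (type.rule.test(student)) {
                allowed.add(type.id);
            }
        }
        return allowed;
    }

    // Build a parameterized query with one placeholder per allowed type id.
    // Table and column names are assumptions, not a real schema.
    static String booksQuery(int allowedTypeCount) {
        if (allowedTypeCount == 0) {
            return "SELECT b.id FROM book b WHERE 1 = 0"; // no allowed types, no books
        }
        String placeholders = String.join(", ", Collections.nCopies(allowedTypeCount, "?"));
        return "SELECT DISTINCT b.id FROM book b"
                + " JOIN book_type_assignment a ON a.book_id = b.id"
                + " WHERE a.type_id IN (" + placeholders + ")";
    }
}
```

The rules run over a handful of types rather than a million books, and the database does the heavy lifting of expanding types into books.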

Yishai
A: 

I would be worried about the need to make the number of rules a function of the number of students. That could really make things tricky, and it sounds like the biggest problem.

Michael Neale
+3  A: 
Michael Deardeuff
+1 for redefining the problem - I agree that you don't need a million books as rule session facts, only the ones the student put in the 'basket' on the way to 'checkout'
Peter Hilton
A: 

My questions are: how much memory will Drools take if loaded with a million book rules? How fast will it be for all those million rules to fire?

How fast is your computer and how much memory have you got? In one sense you can only find out by building a proof of concept and filling it with the right quantity of (randomly-generated) test data. My experience is that Drools is faster than you expect, and that you have to have very good knowledge of what's under the hood to be able to predict what is going to make it slow.
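For the test data side of such a proof of concept, a minimal sketch of a seeded generator might look like the following. The `SyntheticBook` class, its fields, and the department and category values are assumptions invented for illustration; real data would need to mirror whatever attributes your rules actually constrain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical fact class for load testing; the fields are illustrative only.
final class SyntheticBook {
    final int id;
    final String department;
    final String category;

    SyntheticBook(int id, String department, String category) {
        this.id = id;
        this.department = department;
        this.category = category;
    }
}

final class SyntheticCatalog {
    private static final String[] DEPARTMENTS = {"math", "cs", "physics", "history"};
    private static final String[] CATEGORIES = {"algorithms", "networking", "calculus", "general"};

    // A fixed seed makes every run produce the same catalog,
    // so timings from different runs are comparable.
    static List<SyntheticBook> generate(int count, long seed) {
        Random random = new Random(seed);
        List<SyntheticBook> books = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            String department = DEPARTMENTS[random.nextInt(DEPARTMENTS.length)];
            String category = CATEGORIES[random.nextInt(CATEGORIES.length)];
            books.add(new SyntheticBook(i, department, category));
        }
        return books;
    }
}
```

You would then insert the generated objects into a session and measure elapsed time (for example with `System.nanoTime()`) and heap usage at the sizes you care about.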

Note that you are talking about a million rule session facts (i.e. Book objects), not a million rules. There are only a handful of rules, which won't take long to fire. The potentially slow part is inserting the million objects, because Drools needs to decide which rules to put on the Agenda for each new fact.

It's a shame that none of us has an answer for some particular set-up with a million facts.

As for the implementation, my approach would be to insert a Book object for each book that the student wants to check out, retract the ones that are not allowed, and then use one query to get the remaining (allowed) Book objects and another query to get the list of reasons. Alternatively, use RequestedBook objects that have additional boolean allowed and String reasonDisallowed properties that you can set in your rules.
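The RequestedBook alternative could be sketched roughly as below. This is plain Java rather than Drools: the `CheckoutRules.apply` method stands in for rules that would set the properties on the fact, and the `requiredCourse` field and the single course-registration check are assumptions added only to give the example something to evaluate.

```java
import java.util.List;
import java.util.Set;

// A requested book that records the outcome instead of being retracted.
final class RequestedBook {
    final String bookId;
    final String requiredCourse; // hypothetical field; null means no course restriction
    private boolean allowed = true;
    private String reasonDisallowed;

    RequestedBook(String bookId, String requiredCourse) {
        this.bookId = bookId;
        this.requiredCourse = requiredCourse;
    }

    // What a rule's consequence would do when the request is rejected.
    void disallow(String reason) {
        this.allowed = false;
        this.reasonDisallowed = reason;
    }

    boolean isAllowed() {
        return allowed;
    }

    String getReasonDisallowed() {
        return reasonDisallowed;
    }
}

final class CheckoutRules {
    // Stand-in for the rule set: reject any book whose required course
    // the student is not currently registered for.
    static void apply(Set<String> registeredCourses, List<RequestedBook> requests) {
        for (RequestedBook request : requests) {
            if (request.requiredCourse != null
                    && !registeredCourses.contains(request.requiredCourse)) {
                request.disallow("not registered for " + request.requiredCourse);
            }
        }
    }
}
```

Keeping the rejected requests around, with a reason attached, means one pass over the facts yields both the allowed list and the explanations.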

Peter Hilton