So I am doing research on how to infer knowledge from reports (which have no specific format); after preprocessing, I should have some kind of formatted data.

A fairly basic inference would be: "Retailer has X in stock." and "X is sellable." -> "Retailer sells X". The knowledge I focus on is retail-domain oriented, and if possible I should improve its efficiency with each iteration.

Is this sci-fi (some of my friends think it is)? The related things I find online are "expert systems" that find anomalies, fuzzy inference systems, and some rants about "easy knowledge".

Can you give me some points to focus on, or orient me toward some research directions?

blueomega

A: 

What you have written reminds me of a "rule". Rules like this one (where the variables are all nominal) are the result of what is called association rule mining. Maybe this is one of the approaches you should consider.

You could use the open-source machine learning software Weka, or, if you prefer the R environment, the rattle GUI may come in handy.
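
To make that concrete, here is a minimal, self-contained Python sketch of what association rule mining computes: support and confidence over co-occurring nominal facts. The transactions and thresholds are invented purely for illustration; in practice Weka or rattle would mine the rules for you at scale.

    from itertools import combinations

    # Toy "transactions": each is the set of nominal facts extracted from one report.
    # The data below is made up purely for illustration.
    transactions = [
        {"has_stock(X)", "sellable(X)", "sells(X)"},
        {"has_stock(X)", "sellable(X)", "sells(X)"},
        {"has_stock(X)", "sellable(X)"},
        {"has_stock(Y)"},
    ]

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / len(transactions)

    # Mine simple rules of the form {a, b} -> c and report support and confidence.
    items = set().union(*transactions)
    for antecedent in combinations(sorted(items), 2):
        ante = set(antecedent)
        for consequent in items - ante:
            both = ante | {consequent}
            if support(ante) and support(both) >= 0.5:
                confidence = support(both) / support(ante)
                if confidence >= 0.5:
                    print(sorted(ante), "->", consequent,
                          f"(support={support(both):.2f}, confidence={confidence:.2f})")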

gd047
+1  A: 

You're certainly not talking about 'sci-fi', but a lot of this falls outside the standard material software engineers are typically exposed to. I have spent the last eight years building and using rule engines to do inference over semi-structured data in the retail world.

Doing inference over data is a well-established field. There are basically four classes of problems associated with it:

  • Knowledge acquisition (getting rules out of people's heads and into code/rules)
  • Knowledge representation, 'KR' (how to represent your data and rules)
  • Efficient pattern matching (matching rules from a large rule set against a large number of facts/data)
  • Inference / reasoning (drawing further conclusions from rule matches, i.e. rules triggering more rules)

For knowledge acquisition, look at Ripple Down Rules and decision trees; they go a long way and are easy to understand. Alternatively, the vast field of machine learning offers a variety of approaches to deriving models from data.
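
As a rough illustration of the machine-learning route (deriving rules from labelled examples), here is a sketch using scikit-learn's decision tree. The library choice is mine rather than anything prescribed above, and the toy retail data is invented.

    from sklearn.tree import DecisionTreeClassifier, export_text

    # Invented toy examples: features [has_stock, is_sellable] -> does the retailer sell it?
    X = [[1, 1], [1, 1], [1, 0], [0, 1], [0, 0]]
    y = [1, 1, 0, 0, 0]

    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

    # export_text prints the learned tree as readable if/then rules,
    # which is one crude way to bootstrap a rule base from example data.
    print(export_text(tree, feature_names=["has_stock", "is_sellable"]))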

For knowledge representation, look at RDF and OWL, and to a lesser degree Conceptual Graphs. In terms of expressiveness, RDF and CG are roughly equivalent. The basic concept behind both is a serialisation-independent graph (triple) representation of the data.
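
For a feel of what the triple representation looks like in code, here is a small sketch using the Python library rdflib (my choice of tool; the namespace and fact names are invented):

    from rdflib import Graph, Namespace

    EX = Namespace("http://example.org/retail/")   # invented namespace, purely illustrative
    g = Graph()

    # Each fact is a (subject, predicate, object) triple in the graph.
    g.add((EX.retailer1, EX.hasStock, EX.widget))
    g.add((EX.widget, EX.isSellable, EX.true))

    # The same graph can be serialised as Turtle, RDF/XML, N-Triples, etc.
    print(g.serialize(format="turtle"))

    # Simple pattern query: what does retailer1 have in stock?
    for s, p, o in g.triples((EX.retailer1, EX.hasStock, None)):
        print(s, "has stock:", o)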

For pattern matching, the classic algorithm is Rete, by Charles Forgy.

For inference, there are two typical strategies: forward chaining and backward chaining. Forward chaining works over a rule set like this:

The data setup:

 Rule 1:  If A Then B
 Rule 2:  If B Then C

 Facts:  A

The execution:

Do {
     NewFacts = Eval(RuleSet, Facts) - Facts   // keep only facts we did not already have
     Facts = Facts + NewFacts
} while (NewFacts.Count > 0)

Feed the data A to this little algorithm and you will 'infer' (discover) fact C from the data, thanks to the rule base. Note that there are a lot of gotchas with inference, especially around things like non-monotonic reasoning (not just adding facts, but changing or removing facts, possibly giving rise to contradictions or loops in the inference).
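
Here is the same fixpoint loop as runnable Python, with the two rules and the fact A encoded as plain strings (just one possible encoding):

    # Rules as (antecedent, consequent) pairs; facts as a set of strings.
    rules = [("A", "B"), ("B", "C")]
    facts = {"A"}

    # Forward chaining: fire rules until no new facts appear (a fixpoint).
    while True:
        new_facts = {then for (cond, then) in rules if cond in facts} - facts
        if not new_facts:
            break
        facts |= new_facts

    print(facts)   # {'A', 'B', 'C'} -- fact C has been inferred from A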

A simplistic and naive way to get some kind of inference going would be to use a database and use joins to match up facts (statements). This may be enough for some applications. When it comes to reasoning, it's easy to get sucked into a world of complications and not-quite-there technologies. Keep it simple.
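
As a rough sketch of that database approach, the example below stores each statement as a row in SQLite (the table and fact names are invented) and uses a join as the 'pattern match' for the retailer rule from the question:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facts (subject TEXT, predicate TEXT, object TEXT)")
    conn.executemany(
        "INSERT INTO facts VALUES (?, ?, ?)",
        [("retailer1", "hasStock", "widget"),
         ("widget", "isSellable", "true")],
    )

    # "R hasStock X" joined with "X isSellable true"  ->  "R sells X"
    rows = conn.execute("""
        SELECT a.subject, 'sells', a.object
        FROM facts a
        JOIN facts b ON a.object = b.subject
        WHERE a.predicate = 'hasStock'
          AND b.predicate = 'isSellable'
          AND b.object = 'true'
    """).fetchall()
    print(rows)   # [('retailer1', 'sells', 'widget')]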

FrederikB