ansaurus

Question

Aggregating automatically-generated feature vectors

Answer 1


You could try a neural network approach, trained via backpropagation, assuming you have (or can randomly generate from the old rule set) a large set of data that covers all of your classes. A hidden layer of appropriate size will let you approximate arbitrary discriminant functions in your feature space. This is more or less the same idea as clustering, but thanks to the training paradigm it should have no trouble with your discrete inputs.
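
A minimal pure-Python sketch of the idea, assuming binary features (multi-valued features would first need one-hot encoding) and a hypothetical old rule set whose result is 1 exactly when A and B are both 1. `TinyMLP` and its methods are illustrative names, not a library API; in practice you would reach for an existing neural-network library.

```python
import math
import random
from itertools import product

def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class TinyMLP:
    """One hidden layer of sigmoid units and one sigmoid output unit,
    trained by full-batch backpropagation on binary cross-entropy."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def loss(self, data):
        eps = 1e-12  # keeps log() finite if an output saturates
        total = 0.0
        for x, t in data:
            _, y = self.forward(x)
            total -= t * math.log(y + eps) + (1 - t) * math.log(1 - y + eps)
        return total / len(data)

    def train(self, data, epochs=2000, lr=0.5):
        for _ in range(epochs):
            gw1 = [[0.0] * len(row) for row in self.w1]
            gb1 = [0.0] * len(self.b1)
            gw2 = [0.0] * len(self.w2)
            gb2 = 0.0
            for x, t in data:
                h, y = self.forward(x)
                d_out = y - t  # gradient at the output pre-activation
                gb2 += d_out
                for j, hj in enumerate(h):
                    gw2[j] += d_out * hj
                    # Backpropagate through w2[j] and the hidden sigmoid
                    d_hid = d_out * self.w2[j] * hj * (1 - hj)
                    gb1[j] += d_hid
                    for i, xi in enumerate(x):
                        gw1[j][i] += d_hid * xi
            step = lr / len(data)  # average the gradient over the batch
            self.b2 -= step * gb2
            for j in range(len(self.w2)):
                self.w2[j] -= step * gw2[j]
                self.b1[j] -= step * gb1[j]
                for i in range(len(self.w1[j])):
                    self.w1[j][i] -= step * gw1[j][i]

# Hypothetical old rule set: result is 1 exactly when A and B are both 1.
data = [((a, b, c), int(a == 1 and b == 1)) for a, b, c in product((0, 1), repeat=3)]
net = TinyMLP(n_in=3, n_hidden=4)
before = net.loss(data)
net.train(data)
after = net.loss(data)
```

After training, whatever the network learned lives only in `w1`, `w2` and the biases, which is the opacity the next paragraph warns about.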

This may, however, be a little too "black box" for your case, particularly if you have zero tolerance for false positives and false negatives (although, since this is a one-off process, you can reach an arbitrary degree of confidence by checking against a gargantuan validation set).
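
To put a number on "an arbitrary degree of confidence": if n validation samples are drawn independently from the same distribution as the real inputs and none of them is misclassified, the one-sided upper confidence bound on the true error rate p solves (1 - p)^n = 1 - confidence, which is roughly 3/n at 95% (the "rule of three"). The function names below are hypothetical, and the independence assumption is the important caveat.

```python
import math

def zero_error_upper_bound(n, confidence=0.95):
    """Upper bound on the true error rate after n independent validation
    samples with zero observed errors: solve (1 - p)**n = 1 - confidence."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n)

def samples_needed(max_error, confidence=0.95):
    """Smallest n for which zero errors in n samples bounds the error rate by max_error."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - max_error))

bound = zero_error_upper_bound(1_000_000)  # close to 3 / 1_000_000
n = samples_needed(1e-6)                   # roughly three million samples
```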

ezod 2010-01-19 19:24:04

Unfortunately, we need to be able to introspect the exact rules, although your idea would be excellent for many other use cases.

rjh 2010-01-20 12:22:38

Answer 2


Twenty-five million rules? How many features? How many values per feature? Is it possible to iterate through all combinations in practical time? If so, you could begin by separating the rules into groups by result.
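
A minimal sketch of the first two steps, assuming (hypothetically) that each rule maps a complete feature vector, a tuple of discrete values for A, B, C and so on, to a result label. The number of combinations to iterate over is the product of each feature's value count, which answers the "practical time" question before any work is done.

```python
import math
from collections import defaultdict

def combination_count(domains):
    """Number of cells to enumerate: the product of each feature's value count."""
    return math.prod(len(values) for values in domains)

def group_by_result(rules):
    """Split {feature_vector: result} into {result: set of feature_vectors}."""
    groups = defaultdict(set)
    for vector, result in rules.items():
        groups[result].add(vector)
    return dict(groups)

# Hypothetical toy rule set over three binary features (A, B, C):
# the result is "accept" exactly when A and B are both 1; C is noise.
domains = [(0, 1), (0, 1), (0, 1)]
rules = {(a, b, c): "accept" if a and b else "reject"
         for a in (0, 1) for b in (0, 1) for c in (0, 1)}

groups = group_by_result(rules)
```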

Then, for each result, do the following. Treating each feature as a dimension, and the allowed values of that feature as the positions along that dimension, construct a huge Karnaugh map of the entire rule set, marking the cells that yield that result.
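
A minimal sketch of building the map for one result, reusing the hypothetical `{feature_vector: result}` representation. It stores the cells as a flat table keyed by the feature-value tuple rather than drawing the Gray-coded grid of a printed Karnaugh map; the cells are the same. Combinations that no rule covers become "don't care" cells.

```python
from itertools import product

def build_map(rules, domains, target):
    """Return {combination: True/False/None} for one result value.

    True  -> a rule maps this combination to `target`
    False -> a rule maps it to a different result
    None  -> no rule covers it (a "don't care" cell)
    """
    cells = {}
    for combo in product(*domains):
        if combo not in rules:
            cells[combo] = None
        else:
            cells[combo] = rules[combo] == target
    return cells

domains = [(0, 1), (0, 1), (0, 1)]
rules = {(a, b, c): "accept" if a and b else "reject"
         for a, b, c in product(*domains)}
del rules[(0, 0, 1)]  # pretend the generator never produced this vector
accept_map = build_map(rules, domains, "accept")
```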

The map has two uses. One: research automated implementations of the Quine-McCluskey algorithm, the tabular method that performs the same minimization as a Karnaugh map but is suited to a computer. A lot of work has been done in this area, and there are even a few programs available, although probably none of them will handle a Karnaugh map of the size you're going to make.
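
A minimal sketch of the prime-implicant step of Quine-McCluskey, assuming every feature has already been encoded as bits (multi-valued features would need a binary encoding first). Terms are strings over '0', '1' and '-' in feature order A, B, C. `prime_implicants` is a hypothetical helper; it compares every pair of terms instead of grouping them by their count of 1s as the textbook version does, so it shows the logic without the usual speedups. Choosing a minimal cover from the primes (for example with Petrick's method) is not shown.

```python
from itertools import combinations

def combine(a, b):
    """Merge two terms that differ in exactly one fixed (non-'-') position."""
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) == 1 and "-" not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + "-" + a[i + 1:]
    return None

def prime_implicants(minterms, n_bits, dont_cares=()):
    """Repeatedly merge adjacent terms; terms that never merge are prime implicants."""
    terms = {format(m, f"0{n_bits}b") for m in (*minterms, *dont_cares)}
    primes = set()
    while terms:
        merged, used = set(), set()
        for a, b in combinations(sorted(terms), 2):
            c = combine(a, b)
            if c is not None:
                merged.add(c)
                used.update((a, b))
        primes |= terms - used
        terms = merged
    return primes

# Cells where A AND B holds, with bits ordered A, B, C: 110 (6) and 111 (7).
print(prime_implicants([6, 7], 3))  # {'11-'}: C is irrelevant
```

Don't-care cells from the map can be passed as `dont_cares`: they may be merged into larger terms but never need to be covered.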

Two: once you have created your final reduced rule set, iterate over all combinations of all values for all features again and construct another Karnaugh map from the reduced rule set. If the two maps match, the rule sets are equivalent.
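
A minimal sketch of that verification pass, assuming the reduced rule set is an ordered list of `(pattern, result)` pairs in which `None` means "any value" and the first matching pattern wins. That representation and the function names are hypothetical. Cells that no original rule covers are treated as don't-cares and skipped, so only cells with a defined answer are compared.

```python
from itertools import product

def matches(pattern, combo):
    """True if every non-wildcard position of the pattern equals the combo value."""
    return all(p is None or p == v for p, v in zip(pattern, combo))

def evaluate(reduced_rules, combo):
    """Result of the first matching reduced rule, or None if nothing matches."""
    for pattern, result in reduced_rules:
        if matches(pattern, combo):
            return result
    return None

def find_mismatches(original, reduced_rules, domains):
    """List every covered cell where the two rule sets disagree."""
    return [combo for combo in product(*domains)
            if combo in original and evaluate(reduced_rules, combo) != original[combo]]

domains = [(0, 1), (0, 1), (0, 1)]
original = {combo: "accept" if combo[0] and combo[1] else "reject"
            for combo in product(*domains)}
reduced = [((1, 1, None), "accept"), ((None, None, None), "reject")]
too_loose = [((1, None, None), "accept"), ((None, None, None), "reject")]
```

An empty list from `find_mismatches` means the maps match; anything else pinpoints the cells to inspect.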

-Al.

A. I. Breveleri 2010-01-19 23:07:49

Answer 3


Check out the Weka machine learning library for Java. The API is a little crufty, but it's very useful. Overall, what you seem to want is an off-the-shelf machine learning algorithm, which is exactly what Weka provides. You're apparently looking for something relatively easy to interpret (you mention that you want it to deduce the relationship between A and B and to tell you that C is just noise). You could try a decision tree such as J48, Weka's implementation of C4.5, since decision trees are usually easy to visualize and interpret.
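
As a rough illustration of why a tree can expose "C is just noise": C4.5 (and so J48) chooses splits by gain ratio, and the sketch below computes the simpler ID3 information gain for each feature on a hypothetical rule set whose result depends only on A and B. It is not Weka code, and the function names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy reduction from splitting on one feature (ID3 criterion)."""
    n = len(rows)
    split = {}
    for row, label in zip(rows, labels):
        split.setdefault(row[feature], []).append(label)
    remainder = sum(len(part) / n * entropy(part) for part in split.values())
    return entropy(labels) - remainder

# Hypothetical data: "accept" exactly when A and B are both 1; C is noise.
rows = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
labels = ["accept" if a and b else "reject" for a, b, c in rows]
gains = {name: information_gain(rows, labels, i) for i, name in enumerate("ABC")}
```

A and B each get a positive gain while C's gain is zero, so a tree built on this data never needs to split on C.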

dsimcha 2010-01-19 23:32:04

Accepting: I have implemented a simple classification algorithm that takes advantage of relationships and implications I discovered by using Weka. Thanks.

rjh 2010-02-14 21:03:12
