Hi,

I'm getting strange results while using a J48 tree. I need to classify a vector of 48 features, which works very well, but when I tried to "optimize" the code, the results changed.

I have a method classify:

    public boolean classify(double feature1, double feature2, double[] featureVec ) {
        Instance toBeClassified = new Instance(2+featureVec.length);
        toBeClassified.setValue(0, feature1);
        toBeClassified.setValue(1, feature2);
        for (int i = 2; i < featureVec.length + 2; ++i) {
            toBeClassified.setValue(i, featureVec[i - 2]);
        }
        toBeClassified.setDataset(dataset);

        try {
            double _class = tree.classifyInstance(toBeClassified);
            return _class > 0;
        } catch (Exception e1) {
            if (Logging.active) {
                logger.error(e1.getMessage(), e1.getCause());
            }
        }
        return false;
    }
}

It works quite well, and I hope I'm doing things right. But I wanted to avoid creating the instance on every method call, so I moved the line `Instance toBeClassified = new Instance(48);` into the class body, so that it is created only once. That works too, except that I get slightly different results compared with the original version: out of 400 classifications, say, one is different (not to say incorrect). I don't see a reason for this. I hope some people here use Weka and can help me understand what is going wrong. (Yes, 2+featureVec.length is 48.)

Thanks and regards.

+2  A: 

It's very unlikely that anything is wrong with J48. Both building the classifier and classification itself are deterministic. I'd recommend posting a bigger part of your code, because this part looks fine (no obvious bugs).

As for your test with 400 classifications: it should definitely produce the same results every time, no exceptions. Two thoughts:

  • Add an assert that checks whether the values of the reused instance are the same as those of the reference one (the instance created on every call). That would rule out any bug in the handling of Instance.

  • Does classification run in multiple threads? Are there any shared data objects?
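
The first check could look roughly like the sketch below. It is only an illustration: a plain `double[]` stands in for the Weka `Instance`, and the names `InstanceCheck`, `fill` and `matches` are made up for this example. The idea is to verify, right before classifying, that the reused buffer still holds exactly the values that were just written into it.

```java
class InstanceCheck {
    // Hypothetical stand-in for filling a Weka Instance: copies the inputs into a buffer.
    static void fill(double[] buffer, double feature1, double feature2, double[] featureVec) {
        buffer[0] = feature1;
        buffer[1] = feature2;
        System.arraycopy(featureVec, 0, buffer, 2, featureVec.length);
    }

    // Returns true only if the buffer holds exactly the values that were passed in.
    static boolean matches(double[] buffer, double feature1, double feature2, double[] featureVec) {
        if (buffer.length != featureVec.length + 2) {
            return false;
        }
        if (Double.compare(buffer[0], feature1) != 0 || Double.compare(buffer[1], feature2) != 0) {
            return false;
        }
        for (int i = 0; i < featureVec.length; i++) {
            if (Double.compare(buffer[i + 2], featureVec[i]) != 0) {
                return false;
            }
        }
        return true;
    }
}
```

If `matches` ever returns false between filling the buffer and classifying it, something else has modified the buffer in the meantime.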

Rekin
@rekin I'm sorry, but I can't show more code since that would violate what I have signed in my contract. Hahahaha, you're my man. Just as I read your "multi-threading manner": yes, this function is called from different threads concurrently. Of course this is no problem if every method call creates its own instance, but if I use only one, I have to synchronize to get things done correctly. I will rework this tomorrow and then, if successful, accept your answer.
InsertNickHere
Thanks for the quick reply. Correct me if I'm wrong: let's say you have N threads, N instances and... one instance of the classifier?
Rekin
@rekin Yes, I have.
InsertNickHere
So the classifier instance is shared between threads. I don't know if that is the problem, but as far as I know, in Java even reads of shared mutable data have to be synchronized to be safe. Try cloning the instance for each thread. A `ThreadLocal` object (from the `java.lang` package) could help.
Rekin
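
For illustration, the `ThreadLocal` idea could be sketched as below. This is a sketch under assumptions, not the asker's code: a plain `double[]` stands in for the Weka `Instance`, the class name `PerThreadClassifier` is made up, and `model` is a hypothetical stand-in for `tree.classifyInstance(...)` that is assumed to be safe to call from several threads. Each thread gets its own reusable buffer, so the allocation is saved without two threads ever writing to the same object.

```java
import java.util.function.ToDoubleFunction;

class PerThreadClassifier {
    private final int numFeatures;
    // Hypothetical stand-in for the shared classifier; assumed safe for concurrent calls.
    private final ToDoubleFunction<double[]> model;
    // One buffer per thread, created lazily on the first get() in that thread.
    private final ThreadLocal<double[]> buffer;

    PerThreadClassifier(int numFeatures, ToDoubleFunction<double[]> model) {
        this.numFeatures = numFeatures;
        this.model = model;
        this.buffer = ThreadLocal.withInitial(() -> new double[numFeatures]);
    }

    boolean classify(double feature1, double feature2, double[] featureVec) {
        if (featureVec.length + 2 != numFeatures) {
            throw new IllegalArgumentException("expected " + (numFeatures - 2) + " vector features");
        }
        // The buffer belongs to the calling thread only, so no locking is needed here.
        double[] values = buffer.get();
        values[0] = feature1;
        values[1] = feature2;
        System.arraycopy(featureVec, 0, values, 2, featureVec.length);
        return model.applyAsDouble(values) > 0;
    }
}
```

The alternative is to keep a single shared buffer and mark `classify` as `synchronized`, which is simpler but makes the threads wait for each other.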