views: 120
answers: 2
I'm using PyML to construct a multiclass linear support vector machine (SVM). After training the SVM, I would like to be able to save the classifier, so that on subsequent runs I can use the classifier right away without retraining. Unfortunately, the .save() function is not implemented for that classifier, and attempting to pickle it (with both standard pickle and cPickle) yields the following error message:

pickle.PicklingError: Can't pickle &lt;type 'PySwigObject'&gt;: it's not found as __builtin__.PySwigObject

Does anyone know of a way around this or of an alternative library without this problem? Thanks.
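
For reference, the failing attempt looks roughly like this (a minimal sketch; "train.data" stands in for my actual dataset, and the PyML.classifiers.multi import path is my assumption from the 0.7.x layout):

import cPickle

from PyML.classifiers import multi              # 0.7.x layout (assumed)
from PyML.classifiers.svm import SVM
from PyML.containers.vectorDatasets import SparseDataSet

dataset_pyml = SparseDataSet("train.data")      # any labeled dataset
mc = multi.OneAgainstRest(SVM())
mc.train(dataset_pyml)

# fails: the trained SVM wraps a SWIG object (PySwigObject),
# which the pickle protocol cannot serialize
cPickle.dump(mc, open("classifier.pkl", "wb"))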

Edit/Update
I am now training and attempting to save the classifier with the following code:

mc = multi.OneAgainstRest(SVM())
mc.train(dataset_pyml, saveSpace=False)
for i, classifier in enumerate(mc.classifiers):
    filename = os.path.join(prefix, labels[i] + ".svm")
    classifier.save(filename)

Notice that I am now saving with the PyML save mechanism rather than with pickling, and that I have passed "saveSpace=False" to the training function. However, I am still getting an error:

ValueError: in order to save a dataset you need to train as: s.train(data, saveSpace = False)

However, I am passing saveSpace=False... so, how do I save the classifier(s)?

P.S.
The project I am using this in is pyimgattr, in case you would like a complete, testable example. The program is run with "./pyimgattr.py train"; that will reproduce this error. Also, a note on version information:

[michaelsafyan@codemage /Volumes/Storage/classes/cse559/pyimgattr]$ python
Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import PyML
>>> print PyML.__version__
0.7.0
+1  A: 

In multi.py, line 96 calls "self.classifiers[i].train(datai)" without passing "**args", so if you call "mc.train(data, saveSpace=False)", the saveSpace argument gets lost. This is why you get an error message when you try to save the classifiers in your multiclass classifier individually. If you change that line to pass all arguments through, you can save each classifier individually:

#!/usr/bin/python

import numpy

from PyML.utils import misc
from PyML.evaluators import assess
from PyML.classifiers.svm import SVM, loadSVM
from PyML.containers.labels import oneAgainstRest
from PyML.classifiers.baseClassifiers import Classifier
from PyML.containers.vectorDatasets import SparseDataSet
from PyML.classifiers.composite import CompositeClassifier

class OneAgainstRestFixed(CompositeClassifier) :

    '''A one-against-the-rest multi-class classifier'''

    def train(self, data, **args) :
        '''train k classifiers'''

        Classifier.train(self, data, **args)

        numClasses = self.labels.numClasses
        if numClasses <= 2:
            raise ValueError, 'Not a multi class problem'

        self.classifiers = [self.classifier.__class__(self.classifier)
                            for i in range(numClasses)]

        for i in range(numClasses) :
            # make a copy of the data; this is done in case the classifier modifies the data
            datai = data.__class__(data, deepcopy = self.classifier.deepcopy)
            datai = oneAgainstRest(datai, data.labels.classLabels[i])

            # the fix: forward **args (e.g. saveSpace=False) to each binary
            # classifier; the stock OneAgainstRest drops them here
            self.classifiers[i].train(datai, **args)

        self.log.trainingTime = self.getTrainingTime()

    def classify(self, data, i):

        r = numpy.zeros(self.labels.numClasses, numpy.float_)
        for j in range(self.labels.numClasses) :
            r[j] = self.classifiers[j].decisionFunc(data, i)

        return numpy.argmax(r), numpy.max(r)

    def preproject(self, data) :

        for i in range(self.labels.numClasses) :
            self.classifiers[i].preproject(data)

    test = assess.test

train_data = """
0 1:1.0 2:0.0 3:0.0 4:0.0
0 1:0.9 2:0.0 3:0.0 4:0.0
1 1:0.0 2:1.0 3:0.0 4:0.0
1 1:0.0 2:0.8 3:0.0 4:0.0
2 1:0.0 2:0.0 3:1.0 4:0.0
2 1:0.0 2:0.0 3:0.9 4:0.0
3 1:0.0 2:0.0 3:0.0 4:1.0
3 1:0.0 2:0.0 3:0.0 4:0.9
"""
file("foo_train.data", "w").write(train_data.lstrip())

test_data = """
0 1:1.1 2:0.0 3:0.0 4:0.0
1 1:0.0 2:1.2 3:0.0 4:0.0
2 1:0.0 2:0.0 3:0.6 4:0.0
3 1:0.0 2:0.0 3:0.0 4:1.4
"""
file("foo_test.data", "w").write(test_data.lstrip())

train = SparseDataSet("foo_train.data")
mc = OneAgainstRestFixed(SVM())
mc.train(train, saveSpace=False)

test = SparseDataSet("foo_test.data")
print [mc.classify(test, i) for i in range(4)]

for i, classifier in enumerate(mc.classifiers):
    classifier.save("foo.model.%d" % i)

# reload the four binary SVMs from disk
classifiers = []
for i in range(4):
    classifiers.append(loadSVM("foo.model.%d" % i))

# reassemble a multiclass classifier by hand: loadSVM only restores the
# binary classifiers, so the label metadata must be copied from a dataset
mcnew = OneAgainstRestFixed(SVM())
mcnew.labels = misc.Container()
mcnew.labels.addAttributes(test.labels, ['numClasses', 'classLabels'])
mcnew.classifiers = classifiers
print [mcnew.classify(test, i) for i in range(4)]
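
The second print re-runs classification with the reloaded models; if the round trip worked, it should produce the same (label, decision value) pairs as the first print.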
ephes
@ephes, I'm sorry, could you clarify a little bit? What should I pass to train: saveSpace=True or saveSpace=False? Also, what about loading the classifiers... if I load them individually as you are suggesting, how do I put them back into the single multiclass classifier?
Michael Aaron Safyan
saveSpace=False (weird stuff...). PyML's abstractions are really leaky. OK, I changed the sample source to reread the models, build a new multiclass classifier, and recompute the scores for the test data.
ephes
@ephes, Thank you. It looks like your OneAgainstRestFixed is identical to the original OneAgainstRest, except that you use "self.classifiers[i].train(datai, **args)" while the original accidentally omits the "**args" parameter. Things are saving now, but the loading isn't working correctly. I will create a follow-up question for that.
Michael Aaron Safyan
@ephes, you wouldn't happen to have an answer to this one, would you?: http://stackoverflow.com/questions/2687981/loading-a-pyml-multiclass-classifier-why-isnt-this-working
Michael Aaron Safyan
A: 

Get a newer version of PyML. Since version 0.7.4 it has been possible to save the OneAgainstRest classifier (with .save() and .load()); prior to that version, saving and loading the classifier is non-trivial and error-prone.
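
For example (a sketch assuming PyML 0.7.4 or newer and the 0.7.x import layout used above; the in-place load() behavior is my assumption, so check the PyML docs):

from PyML.classifiers import multi
from PyML.classifiers.svm import SVM
from PyML.containers.vectorDatasets import SparseDataSet

train = SparseDataSet("foo_train.data")
mc = multi.OneAgainstRest(SVM())
mc.train(train, saveSpace=False)
mc.save("multiclass.classifier")      # supported since 0.7.4

# on a later run, restore without retraining:
mcnew = multi.OneAgainstRest(SVM())
mcnew.load("multiclass.classifier")   # assumption: load() restores in place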

Michael Aaron Safyan