views: 120
answers: 2
I'm using PyML to construct a multiclass linear support vector machine (SVM). After training the SVM, I would like to be able to save the classifier, so that on subsequent runs I can use the classifier right away without retraining. Unfortunately, the .save() function is not implemented for that classifier, and attempting to pickle it (with both standard pickle and cPickle) yields the following error message:

pickle.PicklingError: Can't pickle &lt;type 'PySwigObject'&gt;: it's not found as __builtin__.PySwigObject

Does anyone know of a way around this or of an alternative library without this problem? Thanks.
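
For reference, the failing attempt looks roughly like this (a minimal sketch; "train.data" stands in for my actual dataset, and the PyML.classifiers.multi import path is my assumption from the 0.7.x layout):

import cPickle

from PyML.classifiers import multi              # 0.7.x layout (assumed)
from PyML.classifiers.svm import SVM
from PyML.containers.vectorDatasets import SparseDataSet

dataset_pyml = SparseDataSet("train.data")      # any labeled dataset
mc = multi.OneAgainstRest(SVM())
mc.train(dataset_pyml)

# fails: the trained SVM wraps a SWIG object (PySwigObject),
# which the pickle protocol cannot serialize
cPickle.dump(mc, open("classifier.pkl", "wb"))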

Edit/Update
I am now training and attempting to save the classifier with the following code:

mc = multi.OneAgainstRest(SVM())
mc.train(dataset_pyml, saveSpace=False)
for i, classifier in enumerate(mc.classifiers):
    filename = os.path.join(prefix, labels[i] + ".svm")
    classifier.save(filename)

Notice that I am now saving with the PyML save mechanism rather than with pickling, and that I have passed "saveSpace=False" to the training function. However, I am still getting an error:

ValueError: in order to save a dataset you need to train as: s.train(data, saveSpace = False)

However, I am passing saveSpace=False... so, how do I save the classifier(s)?

P.S.
The project I am using this in is pyimgattr, in case you would like a complete, testable example. The program is run with "./pyimgattr.py train"; that will reproduce this error. Also, a note on version information:

[michaelsafyan@codemage /Volumes/Storage/classes/cse559/pyimgattr]$ python
Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import PyML
>>> print PyML.__version__
0.7.0
+1  A: 

In multi.py, line 96 calls "self.classifiers[i].train(datai)" without passing "**args", so if you call "mc.train(data, saveSpace=False)", the saveSpace argument gets lost. This is why you get an error message when you try to save the classifiers in your multiclass classifier individually. If you change that line to pass all arguments through, you can save each classifier individually:

#!/usr/bin/python

import numpy

from PyML.utils import misc
from PyML.evaluators import assess
from PyML.classifiers.svm import SVM, loadSVM
from PyML.containers.labels import oneAgainstRest
from PyML.classifiers.baseClassifiers import Classifier
from PyML.containers.vectorDatasets import SparseDataSet
from PyML.classifiers.composite import CompositeClassifier

class OneAgainstRestFixed(CompositeClassifier) :

    '''A one-against-the-rest multi-class classifier'''

    def train(self, data, **args) :
        '''train k classifiers'''

        Classifier.train(self, data, **args)

        numClasses = self.labels.numClasses
        if numClasses <= 2:
            raise ValueError, 'Not a multi class problem'

        self.classifiers = [self.classifier.__class__(self.classifier)
                            for i in range(numClasses)]

        for i in range(numClasses) :
            # make a copy of the data; this is done in case the classifier modifies the data
            datai = data.__class__(data, deepcopy = self.classifier.deepcopy)
            datai = oneAgainstRest(datai, data.labels.classLabels[i])

            # the fix: forward **args (e.g. saveSpace=False) to each binary
            # classifier; the stock OneAgainstRest drops them here
            self.classifiers[i].train(datai, **args)

        self.log.trainingTime = self.getTrainingTime()

    def classify(self, data, i):

        r = numpy.zeros(self.labels.numClasses, numpy.float_)
        for j in range(self.labels.numClasses) :
            r[j] = self.classifiers[j].decisionFunc(data, i)

        return numpy.argmax(r), numpy.max(r)

    def preproject(self, data) :

        for i in range(self.labels.numClasses) :
            self.classifiers[i].preproject(data)

    test = assess.test

train_data = """
0 1:1.0 2:0.0 3:0.0 4:0.0
0 1:0.9 2:0.0 3:0.0 4:0.0
1 1:0.0 2:1.0 3:0.0 4:0.0
1 1:0.0 2:0.8 3:0.0 4:0.0
2 1:0.0 2:0.0 3:1.0 4:0.0
2 1:0.0 2:0.0 3:0.9 4:0.0
3 1:0.0 2:0.0 3:0.0 4:1.0
3 1:0.0 2:0.0 3:0.0 4:0.9
"""
file("foo_train.data", "w").write(train_data.lstrip())

test_data = """
0 1:1.1 2:0.0 3:0.0 4:0.0
1 1:0.0 2:1.2 3:0.0 4:0.0
2 1:0.0 2:0.0 3:0.6 4:0.0
3 1:0.0 2:0.0 3:0.0 4:1.4
"""
file("foo_test.data", "w").write(test_data.lstrip())

train = SparseDataSet("foo_train.data")
mc = OneAgainstRestFixed(SVM())
mc.train(train, saveSpace=False)

test = SparseDataSet("foo_test.data")
print [mc.classify(test, i) for i in range(4)]

for i, classifier in enumerate(mc.classifiers):
    classifier.save("foo.model.%d" % i)

# reload the four binary SVMs from disk
classifiers = []
for i in range(4):
    classifiers.append(loadSVM("foo.model.%d" % i))

# reassemble a multiclass classifier by hand: loadSVM only restores the
# binary classifiers, so the label metadata must be copied from a dataset
mcnew = OneAgainstRestFixed(SVM())
mcnew.labels = misc.Container()
mcnew.labels.addAttributes(test.labels, ['numClasses', 'classLabels'])
mcnew.classifiers = classifiers
print [mcnew.classify(test, i) for i in range(4)]
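
The second print re-runs classification with the reloaded models; if the round trip worked, it should produce the same (label, decision value) pairs as the first print.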
ephes
@ephes, I'm sorry, could you clarify a little bit? What should I pass to train: saveSpace=True or saveSpace=False? Also, what about loading the classifiers... if I load them individually as you are suggesting, how do I put them back into the single multiclass classifier?
Michael Aaron Safyan
saveSpace=False (weird stuff...). PyML's abstractions are really leaky. OK, I changed the sample source to reread the models, build a new multiclass classifier, and recompute the scores for the test data.
ephes
@ephes, Thank you. It looks like your OneAgainstRestFixed is identical to the original OneAgainstRest, except that you use "self.classifiers[i].train(datai, **args)" while the original accidentally omits the "**args" parameter. Things are saving now, but the loading isn't working correctly. I will create a follow-up question for that.
Michael Aaron Safyan
@ephes, you wouldn't happen to have an answer to this one, would you?: http://stackoverflow.com/questions/2687981/loading-a-pyml-multiclass-classifier-why-isnt-this-working
Michael Aaron Safyan
A: 

Get a newer version of PyML. Since version 0.7.4 it has been possible to save the OneAgainstRest classifier (with .save() and .load()); prior to that version, saving and loading the classifier is non-trivial and error-prone.
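
For example (a sketch assuming PyML 0.7.4 or newer and the 0.7.x import layout used above; the in-place load() behavior is my assumption, so check the PyML docs):

from PyML.classifiers import multi
from PyML.classifiers.svm import SVM
from PyML.containers.vectorDatasets import SparseDataSet

train = SparseDataSet("foo_train.data")
mc = multi.OneAgainstRest(SVM())
mc.train(train, saveSpace=False)
mc.save("multiclass.classifier")      # supported since 0.7.4

# on a later run, restore without retraining:
mcnew = multi.OneAgainstRest(SVM())
mcnew.load("multiclass.classifier")   # assumption: load() restores in place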

Michael Aaron Safyan