Hi

I have a design question about the evolution of an object (and its state) after some sequence of methods complete. I'm having trouble articulating what I mean so I may need to clean up the question based on feedback.

Consider an object called Classifier. It has the following methods:

void initialise()
void populateTrainingSet(TrainingSet t)
void populateTestingSet(TestingSet t)
void train()
void test()
Result predict(Instance i)

My problem is that these methods need to be called in a certain order. Further, some methods are invalid until a previous method has been called, and some methods are invalid after a method has been called. For example, it would be invalid to call predict() before test() has been called, and it would be invalid to call train() after test() has been called.

My approach so far has been to maintain a private enum that represents the current state of the object:

private static enum STATE{ NEW, TRAINED, TESTED, READY};

But this seems a bit kludgy. Is there a design pattern for this type of problem? Maybe something related to the template method pattern?

+2  A: 

I think the State design pattern can help you. For each state, you create a class that implements the methods that are valid in that state; the methods that are not valid there can throw an exception or do nothing. Your main class then holds a state object and swaps it for another one as the state changes. Would that work for you?
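
A minimal sketch of that idea, in case it helps. All names here (ClassifierState, NewState, StatefulClassifier and so on) are made up for illustration and are not from the question. It also assumes only three states (NEW, TRAINED, TESTED) and uses a String as a stand-in for Instance and Result:

```java
// Hypothetical sketch of the State pattern; every name here is illustrative.
interface ClassifierState {
    String name();

    // By default every operation is invalid; each state overrides only what it allows.
    default ClassifierState train() {
        throw new IllegalStateException("train() is not allowed in state " + name());
    }

    default ClassifierState test() {
        throw new IllegalStateException("test() is not allowed in state " + name());
    }

    default String predict(String instance) {
        throw new IllegalStateException("predict() is not allowed in state " + name());
    }
}

final class NewState implements ClassifierState {
    public String name() { return "NEW"; }
    // Training moves the classifier on to the TRAINED state.
    @Override public ClassifierState train() { return new TrainedState(); }
}

final class TrainedState implements ClassifierState {
    public String name() { return "TRAINED"; }
    // Testing moves the classifier on to the TESTED state.
    @Override public ClassifierState test() { return new TestedState(); }
}

final class TestedState implements ClassifierState {
    public String name() { return "TESTED"; }
    // Placeholder prediction; a real classifier would consult its trained model.
    @Override public String predict(String instance) { return "label-for-" + instance; }
}

// The main class only delegates to the current state object.
final class StatefulClassifier {
    private ClassifierState state = new NewState();

    void train() { state = state.train(); }
    void test() { state = state.test(); }
    String predict(String instance) { return state.predict(instance); }
    String stateName() { return state.name(); }
}
```

With this layout, calling predict() on a fresh StatefulClassifier throws an IllegalStateException, and so does calling train() after test(), without any switch on an enum in the main class.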

vodkhang
+2  A: 

Yes, a Finite State Machine. It models the states the object can be in and, for each state, which actions are allowed and which state each action leads to.

There are some very good examples in this Wikipedia article.
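
As a rough sketch of how that could look with the enum you already have, here is a table-driven version. The names Phase, Event and ClassifierFsm are made up for this example, and the set of transitions is my assumption based on the ordering described in the question:

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical names: Phase plays the role of the STATE enum in the question.
enum Phase { NEW, TRAINED, TESTED }

// The actions that can be attempted on the classifier.
enum Event { TRAIN, TEST, PREDICT }

final class ClassifierFsm {
    // Transition table: current phase -> (event -> next phase).
    // A missing entry means the event is invalid in that phase.
    private static final Map<Phase, Map<Event, Phase>> TRANSITIONS =
            new EnumMap<Phase, Map<Event, Phase>>(Phase.class);

    static {
        for (Phase p : Phase.values()) {
            TRANSITIONS.put(p, new EnumMap<Event, Phase>(Event.class));
        }
        // Assumed ordering: train, then test, then any number of predictions.
        TRANSITIONS.get(Phase.NEW).put(Event.TRAIN, Phase.TRAINED);
        TRANSITIONS.get(Phase.TRAINED).put(Event.TEST, Phase.TESTED);
        TRANSITIONS.get(Phase.TESTED).put(Event.PREDICT, Phase.TESTED);
    }

    private Phase current = Phase.NEW;

    // Applies the event, or throws if the table has no entry for it.
    Phase fire(Event event) {
        Phase next = TRANSITIONS.get(current).get(event);
        if (next == null) {
            throw new IllegalStateException(event + " is not valid in state " + current);
        }
        current = next;
        return current;
    }

    Phase current() { return current; }
}
```

Each public method of the classifier would call fire(...) first and only do its work if no exception is thrown. The advantage over scattered if-checks is that all the legal orderings live in one table.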

Longpoke
+2  A: 

Well, for this particular case, I think you are over-designing here. For example, should you really make a distinction between the training dataset and the testing dataset in terms of their type?

My suggestion would be to go with a factory pattern; you should have a MachineLearningAlgorithm factory with a "train" function that returns a Hypothesis object on which you can perform "test" or "predict". The "train" function should take, as its parameter, the training data set, while the "test" function should take, as its parameter, the testing data set. Both data sets should probably be the same type, since their form/structure is identical, even though the data contained therein is different.

As for populating the dataset, that really should not be the concern of your machine learning algorithm; whoever uses the algorithm should be responsible for providing those data sets. If you want to have some sort of example data sets, though, I would suggest factories for various different train/test data set pairs.

public interface Result
{
    public double getDecisionValue();
    public String getPredictedLabel();
}

public interface TestResult extends Result
{
    public String getActualLabel();
}

public interface TestResults extends Iterable<TestResult>
{
    public int getErrorCount();
    public double getErrorRate();
}

public interface Hypothesis
{
    public TestResults test(Iterable<DataPoint> dataset, Iterable<String> labels);
    public Result predict(DataPoint datapoint);
}

public interface MachineLearningAlgorithm
{
    public Hypothesis train(Iterable<DataPoint> trainset, Iterable<String> trainlabels);
}
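
To show how this removes the ordering problem, here is a small self-contained sketch. It uses simplified stand-ins for the interfaces above (SimpleHypothesis and SimpleAlgorithm, with a String in place of DataPoint and a plain error rate in place of TestResults), and a toy learner, MajorityLabelAlgorithm, that I made up purely for illustration:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for Hypothesis and MachineLearningAlgorithm (assumptions).
interface SimpleHypothesis {
    String predict(String datapoint);
    double errorRate(List<String> dataset, List<String> labels);
}

interface SimpleAlgorithm {
    SimpleHypothesis train(List<String> trainset, List<String> trainlabels);
}

// Toy learner: always predicts the most frequent label seen during training.
final class MajorityLabelAlgorithm implements SimpleAlgorithm {
    @Override
    public SimpleHypothesis train(List<String> trainset, List<String> trainlabels) {
        if (trainlabels.isEmpty()) {
            throw new IllegalArgumentException("no training labels");
        }
        // Count the labels and remember the one with the highest count so far.
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        for (String label : trainlabels) {
            int n = counts.merge(label, 1, Integer::sum);
            if (best == null || n > counts.get(best)) {
                best = label;
            }
        }
        final String majority = best;

        // The trained model is a separate, immutable object.
        return new SimpleHypothesis() {
            @Override
            public String predict(String datapoint) {
                return majority;
            }

            @Override
            public double errorRate(List<String> dataset, List<String> labels) {
                if (dataset.size() != labels.size()) {
                    throw new IllegalArgumentException("dataset and labels differ in size");
                }
                if (labels.isEmpty()) {
                    return 0.0;
                }
                int errors = 0;
                for (int i = 0; i < dataset.size(); i++) {
                    if (!predict(dataset.get(i)).equals(labels.get(i))) {
                        errors++;
                    }
                }
                return (double) errors / labels.size();
            }
        };
    }
}
```

Since the only way to get a SimpleHypothesis is to call train(), there is no way to call predict() on an untrained model, and no state enum is needed. Note that this does not force test() to be called before predict(), so if that ordering really matters you would still need one of the state-based approaches.
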
Michael Aaron Safyan