I am preparing a task for a computer vision class that involves training a simple classifier after extracting features from images. Since machine learning is not the main topic here, I don't want students to implement a learning algorithm from scratch, so I have to recommend some reference implementations. I believe a decision tree classifier is suitable for that.

The problem is that the variety of languages allowed in the class is quite large: C++, C#, Delphi. Also, I don't want students to spend a lot of time on technical issues like linking a library. WEKA is great for Java. We could also use OpenCV with all its wrappers, but it is quite big and clumsy, while I want something simple and sweet.

So, do you know any simple C++/C#/Delphi libraries for learning decision trees?

+1  A: 

I know of two such libraries, only one of which I have used recently: Waffles and the Tilburg Memory-Based Learner (TiMBL). Both are free and open source (LGPL and GNU GPL, respectively). In addition, both are stable, mature libraries. Waffles was created and is currently maintained by a single developer, while TiMBL, I believe, is an academic project (directed at the field of linguistics).

Of these two, I have only used the decision tree module in Waffles (in the class GDecisionTree; see the documentation here). Waffles might be the library of choice here because it includes a decent set of functions for descriptive statistics as well as plotting functions for diagnostics, for visualizing the solution space, and so on. The library's author (Mike Gashler) also included a set of demo apps, though I don't recall whether one of them is a decision tree.

I have used several of the classes in the Waffles library (including the decision tree class), and I can certainly recommend it. I'm unable to say anything more about the Tilburg Memory-Based Learner, though, because I have never used its decision tree class.

doug
We ended up recommending Waffles and WEKA.
overrider
+1  A: 

The programming language is not the real problem. It is very hard to find a decision tree implementation for each language, and nearly impossible to guarantee that all the versions are the same implementation.

Since a decision tree is a black-box method, you can write the training and testing data to standard files (e.g., the ARFF format in Weka; OpenCV also has its own format) and use the command line to invoke the tree learner and tester. This way, all the students use the same decision tree. Otherwise, if student A uses a good tree learner and student B uses a bad one, and their results differ, you don't know whether the difference comes from the decision tree or from the CV part (e.g., feature processing). In that situation you are forced to care about the details and implementation quality of the tree learners.

Yin Zhu
Of course your point makes sense. But it would be simpler to check the submissions if there were just one application the user works with (the class is taken by 300 students). Also, we want them to think of their program as an end product, not just a piece of research code. :) On the other hand, minor differences in performance are not critical: we don't rank students according to that.
overrider
+1  A: 

Have you looked at the "decision forest" implementation in Alglib? It's free for academic use. The webpage claims support for C++/C# and (maybe) Delphi. It's not a decision tree implementation, but random forests tend to be better classifiers than single decision trees on many problems, and they don't take much longer to train. My guess is that it will be hard to find a consistent decision tree implementation across multiple languages because there are so many different types of decision tree algorithms.

There are a number of other open-source random forest libraries listed in the Wikipedia article if the Alglib one is not what you need. Caveat: the Alglib implementation claims not to be a traditional random forest.

qdjm
Thank you for the hint! We recommend that students use decision trees first, since they provide sufficient quality on the basic dataset and are conceptually simple, i.e., one can look at the tree structure for debugging or methodological purposes. However, we suggest that students implement a random forest in the advanced part of the task.
overrider