This is a proposal for a MultiBoost Summer of Code project. The objective is to port MultiBoost to Python while preserving its efficiency. The original PDF document can be downloaded from here.


AdaBoost (Freund and Schapire 1997) is one of the best off-the-shelf learning methods developed in the last fifteen years. It constructs a classifier incrementally by adding simple classifiers to a pool, and uses their weighted “vote” to determine the final classification. AdaBoost was later extended to multi-class classification problems (Schapire and Singer 1999). Although various other attempts have been made to handle the multi-class setting, AdaBoost.MH remains the gold standard of multi-class and multi-label¹ boosting due to its simplicity and versatility (Kégl 2014).


Despite the simplicity and practical success of AdaBoost, there are relatively few off-the-shelf implementations available as free software. Whereas binary AdaBoost with decision stumps is easy to code, multi-class AdaBoost.MH and complex base learners are not straightforward to implement efficiently.

Among the available implementations, MultiBoost (Benbouzid et al. 2012) offers a modular implementation of AdaBoost.MH. It is developed in C++ and includes a wide palette of weak learners as well as other strong learners. The modular architecture of MultiBoost has proven a great advantage for quickly implementing new boosting algorithms or new weak learners without having to modify the rest of the code. In particular, the base learner decomposition proposed by Kégl (2014), already implemented in MultiBoost, makes it possible to turn any binary classifier into a multi-class/multi-label classifier. Beyond its practical convenience, this setup has also shown outstanding performance in comparison with other boosting algorithms, and even with the standard implementation of AdaBoost.MH.

In recent years, the data science community has largely adopted Python as a common programming language for machine learning and data analysis, which naturally led to the flourishing of rich and versatile machine learning toolboxes such as Scikit-learn (Pedregosa et al. 2011) and MDP². Furthermore, data analysis often requires combining many learning algorithms into a workflow that tends to be automated. In order to make MultiBoost part of such workflows, we would like MultiBoost to be “interfaceable” with the Python software ecosystem, while preserving its efficiency.


Language and building blocks

MultiBoost.py will be implemented mainly in Python; however, some critical parts (the hot spots) will need to be implemented in Cython, after careful code profiling.

In terms of dependencies, MultiBoost.py will rely on the standard data analysis tools in Python, i.e., NumPy, SciPy and Pandas. Furthermore, all the classifiers (weak and strong) will follow the interfaces defined in Scikit-learn, so as to make their usage fully accessible and compliant with the other tools provided by this library.
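As a rough illustration of what following the Scikit-learn interface means in practice, here is a minimal sketch of the planned estimator shape; the class name `AdaBoostMH`, its parameters, and the placeholder prediction logic are all hypothetical:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin


class AdaBoostMH(BaseEstimator, ClassifierMixin):
    """Hypothetical sketch of the planned Scikit-learn-compatible interface."""

    def __init__(self, base_learner=None, n_iterations=100):
        # Hyper-parameters are stored verbatim, as Scikit-learn requires.
        self.base_learner = base_learner
        self.n_iterations = n_iterations

    def fit(self, X, y):
        # A real implementation would run the AdaBoost.MH loop here.
        self.classes_ = np.unique(y)
        return self  # fit returns self, per the Scikit-learn convention

    def predict(self, X):
        # Placeholder: always predicts the first class.
        return np.full(len(X), self.classes_[0])
```

Because the estimator follows the `fit`/`predict` convention, it would compose directly with Scikit-learn tools such as `cross_val_score` or `Pipeline`.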

Library / Standalone

MultiBoost.py is meant to be a general boosting library as well as a turn-key, standalone program. As a library, it should allow the quick implementation of new algorithms. At the same time, the package will provide a command-line interface allowing it to read datasets in specific formats.


The main asset of MultiBoost.py over other implementations of boosting algorithms lies in its multi-level modularity. The first level of modularity is inherent to ensemble methods, which separate the ensemble learner (or strong learner, in the case of boosting) from the base (or weak) learner. Furthermore, inspired by the C++ implementation of MultiBoost, a specific class of base learners allows any binary classifier to be extended to multi-class classification. For a K-class problem, the base classifier is decomposed as

h(·) = α φ(·) v,

where α is the base learner coefficient, φ ∈ {±1} is a binary classifier, and v ∈ {±1}^K is a vote vector that is optimized separately.

This decomposition, which learns the binary classifier φ and sets the values of the vote vector v separately, greatly eases the implementation of new multi-class base learners: one only needs to focus on the φ function.

As a third and final level of modularity, MultiBoost.py will also implement the AnyBoost framework (Mason et al. 2000) in order to easily derive boosting algorithms from proper loss functions.
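In the AnyBoost view, boosting is gradient descent in function space: the example weights at each iteration point along the negative gradient of the loss evaluated at the current margins. A minimal sketch of that idea (the function name and the margin values are made up for illustration):

```python
import numpy as np


def anyboost_weights(margins, loss_grad):
    """Hypothetical sketch: derive example weights from a loss gradient,
    following the AnyBoost view of boosting as functional gradient descent."""
    w = -loss_grad(margins)  # steepest-descent direction over the examples
    return w / w.sum()       # normalize into a weight distribution


# The exponential loss L(m) = exp(-m) recovers AdaBoost's weighting scheme
exp_loss_grad = lambda m: -np.exp(-m)
margins = np.array([2.0, 0.0, -1.0])  # current ensemble margins
w = anyboost_weights(margins, exp_loss_grad)
# misclassified examples (negative margin) receive the largest weight
```

Swapping in the gradient of another proper loss function would yield a different boosting algorithm with the same outer loop, which is exactly the modularity the framework provides.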


Strong learners

  • AdaBoost.MH.

  • AnyBoost.

Base learners

  • Decision stump learner (for continuous features)

  • Indicator learner (for nominal features)

  • The Hamming Trees (MultiBoost) meta-algorithm.

  • Mixed-types features (continuous, nominal)
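To give a flavor of the simplest planned base learner, here is an illustrative weighted decision stump search for continuous features; the function name `best_stump` and the exhaustive threshold scan are a sketch, not the planned implementation:

```python
import numpy as np


def best_stump(X, y, w):
    """Exhaustively search thresholds on each feature for the stump with
    the lowest weighted error (illustrative sketch; y takes values in {+1, -1})."""
    best = (np.inf, 0, 0.0, 1)  # (error, feature index, threshold, polarity)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (+1, -1):
                # Predict pol above the threshold, -pol below it
                pred = np.where(X[:, j] > thr, pol, -pol)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, thr, pol)
    return best
```

A production version would sort each feature once and sweep thresholds in a single pass, but the weighted-error criterion is the same.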

I/O modes

  • Input

    • Numpy arrays

    • Command-line arguments

    • Configuration file

  • Output

    • Learning curve (iteration-wise metrics)

    • Classification scores

    • A flexible output system allowing new metrics to be implemented and reported easily.
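As an example of what an iteration-wise metric could look like, the sketch below computes a learning curve of 0-1 errors from the cumulative per-class scores after each boosting iteration; the function name and input layout are assumptions for illustration:

```python
import numpy as np


def learning_curve(staged_scores, y):
    """Hypothetical sketch of an iteration-wise metric: given the cumulative
    per-class score matrix after each boosting iteration, record the 0-1 error."""
    errors = []
    for scores in staged_scores:  # one (n_samples, K) array per iteration
        pred = scores.argmax(axis=1)
        errors.append(float(np.mean(pred != y)))
    return errors
```

New metrics would plug in by replacing the error computation inside the loop, which is the kind of extension the flexible output system is meant to support.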

Documentation and tests

The classes and functions will be carefully documented with docstrings. The development will be fully test-driven.


Project starting

June 1, 2014. The first tasks would be to:

  • implement the AdaBoost outer loop in Python,

  • port the Hamming trees of MultiBoost to Python,

  • test the combination of the two and compare the results with MultiBoost,

  • profile the code and potentially rewrite some portions in Cython.

Benbouzid, D., R. Busa-Fekete, N. Casagrande, F.-D. Collin, and B. Kégl. 2012. “MultiBoost: a Multi-Purpose Boosting Package.” Journal of Machine Learning Research 13: 549–553.

Freund, Y., and R. E. Schapire. 1997. “A Decision-Theoretic Generalization of on-Line Learning and an Application to Boosting.” Journal of Computer and System Sciences 55: 119–139.

Kégl, B. 2014. “The Return of AdaBoost.MH: multi-Class Hamming Trees.” In International Conference on Learning Representations. http://arxiv.org/abs/1312.6086.

Mason, L., P. Bartlett, J. Baxter, and M. Frean. 2000. “Boosting Algorithms as Gradient Descent.” In Advances in Neural Information Processing Systems, 12:512–518. The MIT Press.

Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, et al. 2011. “Scikit-Learn: Machine Learning in Python.” Journal of Machine Learning Research 12: 2825–2830.

Schapire, R. E., and Y. Singer. 1999. “Improved Boosting Algorithms Using Confidence-Rated Predictions.” Machine Learning 37 (3): 297–336.

  1. In the multi-label setting, one observation can belong to more than one class.

  2. http://mdp-toolkit.sourceforge.net/

Djalel Benbouzid,
Apr 29, 2014, 6:20 AM