Using the one-versus-rest strategy with samples balancing to improve pairwise coupling classification

W. Chmielnicki; K. Stąpor

Download PDF - Using the one-versus-rest strategy with samples balancing to improve pairwise coupling classification, Opens in new tab

ArticleOriginal scientific text

Title

Using the one-versus-rest strategy with samples balancing to improve pairwise coupling classification

Authors ¹, ²

Affiliations

Faculty of Physics, Astronomy and Applied Computer Science, Jagiellonian University, ul. prof. Stanisława Lojasiewicza 11, 30-348 Kraków, Poland
Institute of Computer Science, Silesian University of Technology, ul. Akademicka 16, 44-100 Gliwice, Poland

Abstract

The simplest classification task is to divide a set of objects into two classes, but most of the problems we find in real life applications are multi-class. There are many methods of decomposing such a task into a set of smaller classification problems involving two classes only. Among the methods, pairwise coupling proposed by Hastie and Tibshirani (1998) is one of the best known. Its principle is to separate each pair of classes ignoring the remaining ones. Then all objects are tested against these classifiers and a voting scheme is applied using pairwise class probability estimates in a joint probability estimate for all classes. A closer look at the pairwise strategy shows the problem which impacts the final result. Each binary classifier votes for each object even if it does not belong to one of the two classes which it is trained on. This problem is addressed in our strategy. We propose to use additional classifiers to select the objects which will be considered by the pairwise classifiers. A similar solution was proposed by Moreira and Mayoraz (1998), but they use classifiers which are biased according to imbalance in the number of samples representing classes.

Keywords

ENG

POL

multiclass classification, pairwise coupling, problem decomposition, support vector machine (SVM)

Allwein, E., Schapire, R. and Singer, Y. (2001). Reducing multiclass to binary: A unifying approach for margin classifiers, Journal of Machine Learning Research 1: 113–141.
Beyan, C. and Fisher, R. (2015). Classifying imbalanced data sets using similarity based hierarchical decomposition, Pattern Recognition 48(5): 1653–1672.
Breiman, L. (1996). Bagging predictors, Machine Learning 24(2): 123–140.
Cateni, S., Colla, V. and Vannucci, M. (2014). A method for resampling imbalanced datasets in binary classification tasks for real-world problems, Neurocomputing 135: 32–41.
Chang, C. and Lin, C. (2001). LIBSVM: A library for support vector machines, http://www.csie.ntu.edu.tw/jlin/libsvm.
Chawla, N., Bowyer, K., Hall, L. and Kegelmeyer, W.P. (2002). SMOTE: Synthetic minority over-sampling technique, Journal of Artificial Intelligence Research 16: 321–357.
Chmielnicki,W., Roterman-Konieczna, I. and Stąpor, K. (2012). An improved protein fold recognition with support vector machines, Expert Systems 20(2): 200–211.
Chmielnicki, W. and Stąpor, K. (2010). Protein fold recognition with combined SVM-RDA classifier, in M.G. Romay and E. Corchado (Eds.), Hybrid Artificial Intelligence Systems, Lecture Notes in Artificial Intelligence, Vol. 6076, Springer, Berlin, pp. 162–169.
Chmielnicki, W. and Stąpor, K. (2012). A hybrid discriminative/generative approach to protein fold recognition, Neurocomputing 75(1): 194–198.
Demsar, J. (2006). Statistical comparisons of classifiers over multiple data sets, Journal of Machine Learning Research 7: 1–30.
Dietterich, T. (1998). Approximate statistical tests for comparing supervised classification learning algorithms, Neural Computation 10: 1895–1924.
Dietterich, T.G. and Bakiri, G. (1995). Solving multiclass problems via error-correcting output codes, Journal of Artificial Intelligence Research 2: 263–286.
Ding, C. and Dubchak, I. (2001). Multi-class protein fold recognition using support vector machines and neural networks, Bioinformatics 17(4): 349–358.
Fei, B. and Liu, J. (2006). Binary tree of SVM: A new fast multiclass training and classification algorithm, IEEE Transactions on Neural Networks 17(3): 696–704.
Friedman, J. (1996). Another approach to polychotomous classification, Technical report, Stanford University, Stanford, CA.
Galar, M., Fernandez, A., Barrenechea, E., Bustince, H. and Herrera, F. (2011). An overview of ensemble methods for binary classifiers in multi-class problems: Experimental study on one-vs-one and one-vs-all schemes, Pattern Recognition 44(8): 1761–1776.
Galar, M., Fernandez, A., Barrenechea, E., Bustince, H. and Herrera, F. (2013). Dynamic classifier selection for one-vs-one strategy: Avoiding non-competent classifiers, Pattern Recognition 46(12): 3412–3424.
Glomb, P., Romaszewski, M., Opozda, S. and Sochan, A. (2011). Choosing and modeling hand gesture database for natural user interface, Proceedings of the 9th International Conference on Gesture and Sign Language in Human-Computer Interaction and Embodied Communication, Athens, Greece, pp. 24–35.
Hastie, T. and Tibshirani, R. (1998). Classification by pairwise coupling, The Annals of Statistics 26(1): 451–471.
He, H. and Garcia, E. (2009). Learning from imbalanced data, IEEE Transactions on Knowledge and Data Engineering 21(9): 1263–1284.
Hollander, M. and Wolfe, D. (1973). Nonparametric Statistical Methods, John Wiley and Sons, New York, NY.
Iman, R. and Davenport, J. (1980). Approximations of the critical region of the Friedman statistics, Communications in Statistics—Theory and Methods 9(6): 571–595.
Kahsay, L., Schwenker, F. and Palm, G. (2005). Comparison of multiclass SVM decomposition schemes for visual object recognition, in W. Kropatsch et al. (Eds.), Pattern Recognition, Lecture Notes in Computer Science, Vol. 3663, Springer, Berlin, pp. 334–341.
Kijsirikul, B. and Ussivakul, N. (2002). Multiclass support vector machines using adaptive directed acyclic graph, Proceedings of the International Joint Conference on Neural Networks, Honolulu, HI, USA, pp. 980–985.
Krawczyk, B., Wozniak, M. and Cyganek, B. (2014). Clusterting-based ensembles for one-class classification, Information Sciences 264: 182–195.
Krzysko, M. and Wolynski,W. (2009). New variants of pairwise classification, European Journal of Operational Research 199(2): 512–519.
LeCun, Y., Cortes, C. and Burges, Ch.J.C. (2014). The MNIST database of handwritten digits, http://yann.lecun.com/ exdb/mnist/.
Liu, C. and Fujisava, H. (2005). Classification and learning for character recognition: Comparison of methods and remaining problems, International Workshop on Neural Networks and Learning in Document Analysis and Recognition, Seoul, Korea, pp. 1–7.
Liu, X., Wu, J. and Zhou, Z.H. (2008). Exploratory undersampling for class-imbalance learning, IEEE Transactions on Systems, Man and Cybernetics B 39(2): 539–550.
Lorena, A. and Carvalho, A. (2010). Building binary-tree-based multiclass classifiers using separability measures, Neurocomputing 73(16–18): 2837–2845.
Lorena, A., Carvalho, A. and Gama, J. (2008). A review on the combination of binary classifiers in multiclass problems, Artificial Intelligence Review 30(1–4): 19–37.
Moreira, M. and Mayoraz, E. (1998). Improved pairwise coupling classification with correcting classifiers, Proceedings of the 10th European Conference on Machine Learning, ECML 1998, Chemnitz, Germany, pp. 160–171.
Nadeau, C. and Bengio, Y. (2003). Inference for the generalization error, Advances in Neural Information Processing Systems 52(3): 239–281.
Nemenyi, P. (1963). Distribution-free Multiple Comparisons, Ph.D. thesis, Princeton University, Princeton, NJ.
Ou, G. and Murphey, Y. (2006). Multi-class pattern classification using neural networks, Pattern Recognition 40(1): 4–18.
Platt, J., Cristianini, N. and Shawe-Taylor, J. (2000). Large margin DAGs for multiclass classification, Neural Information Processing Systems, NIPS’99, Breckenridge, CO, USA, pp. 547–553.
Saez, J.A., Galar, M., Luengo, J. and Herrera, F. (2012). A first study on decomposition strategies with data with class noise using decision trees, Proceedings of the 7th International Conference on Hybrid Artificial Intelligent Systems, Salamanca, Spain, Part II, pp. 25–35.
UCIMLR (2014). UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/datasets.html.
Vapnik, V. (1995). The Nature of Statistical Learning Theory, Springer, New York, NY.
Vural, V. and Dy, J. (2004). A hierarchical method for multi-class support vector machines, Proceedings of the 21st International Conference on Machine Learning, St. Louis, MO, USA, pp. 831–838.
Wilcoxon, F. (1945). Individual comparisons by ranking methods, Biometrics 1(6): 80–83.

Title

Using the one-versus-rest strategy with samples balancing to improve pairwise coupling classification

Affiliations

Abstract

Keywords

Bibliography