Nonparametric statistical analysis for multiple comparison of machine learning regression algorithms

Trawiński, Bogdan; Smętek, Magdalena; Telec, Zbigniew; Lasota, Tadeusz

doi:10.2478/v10006-012-0064-z

Article - details

Journal

International Journal of Applied Mathematics and Computer Science

2012 | 22 | 4 | 867-881

Article title

Nonparametric statistical analysis for multiple comparison of machine learning regression algorithms

Authors

Bogdan Trawiński, Magdalena Smętek, Zbigniew Telec, Tadeusz Lasota

Content

Full texts:

http://matwbn.icm.edu.pl/ksiazki/amc/amc22/amc2247.pdf [remote]

Title variants

Publication languages

EN

Abstracts

EN

In the paper we present some guidelines for the application of nonparametric statistical tests and post-hoc procedures devised to perform multiple comparisons of machine learning algorithms. We emphasize that it is necessary to distinguish between pairwise and multiple comparison tests. We show that the pairwise Wilcoxon test, when applied to multiple comparisons, leads to overoptimistic conclusions. We carry out an extensive normality examination employing ten different tests, showing that the output of machine learning algorithms for regression problems does not satisfy normality requirements. We conduct experiments on nonparametric statistical tests and post-hoc procedures designed for multiple 1 × N and N × N comparisons with six different neural regression algorithms over 29 benchmark regression data sets. Our investigation demonstrates the usefulness and strength of multiple comparison statistical procedures for analysing and selecting machine learning algorithms.
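The abstract's warning about repeated pairwise testing can be illustrated with a minimal sketch. It is not the authors' code: the p-values are hypothetical, the family-wise error formula assumes independent tests, and Holm's step-down adjustment (Holm, 1979, listed in the bibliography) stands in for the wider family of post-hoc procedures the paper examines.

```python
from math import comb


def familywise_error(alpha: float, n_comparisons: int) -> float:
    """Probability of at least one false rejection, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_comparisons


def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity of the adjusted values.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


alpha = 0.05
n_algorithms = 6                      # as in the abstract
n_pairs = comb(n_algorithms, 2)       # all N x N pairwise comparisons
fwer = familywise_error(alpha, n_pairs)

# Hypothetical unadjusted p-values for a 1 x N comparison (control vs. 5 others).
raw = [0.001, 0.012, 0.020, 0.040, 0.300]
adjusted = holm_adjust(raw)

print(f"pairwise comparisons: {n_pairs}, family-wise error: {fwer:.3f}")
print("rejected without adjustment:", sum(p < alpha for p in raw))
print("rejected after Holm:", sum(p < alpha for p in adjusted))
```

With these illustrative numbers, four of the five unadjusted p-values fall below 0.05, but only two survive the Holm adjustment. That gap is the kind of overoptimism the abstract attributes to unadjusted pairwise testing.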

Keywords

EN

machine learning, nonparametric statistical tests, statistical regression, neural networks, multiple comparison tests

Publisher

University of Zielona Gora Press

Year

2012

Volume

22

Issue

4

Pages

867-881

Physical description

Dates

published

2012

received

2011-08-08

revised

2012-04-04

Contributors

author

Bogdan Trawiński

Institute of Informatics, Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland

author

Magdalena Smętek

Institute of Informatics, Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland

author

Zbigniew Telec

Institute of Informatics, Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland

author

Tadeusz Lasota

Department of Spatial Management, Wrocław University of Environmental and Life Sciences, ul. Norwida 25/27, 50-375 Wrocław, Poland

Bibliography

Alcalá-Fdez, J., Fernandez, A., Luengo, J., Derrac, J., García, S., Sánchez, L. and Herrera, F. (2011). KEEL data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework, Journal of Multiple-Valued Logic and Soft Computing 17(2-3): 255-287.
Alcalá-Fdez, J., Sánchez, L., García, S., del Jesus, M., Ventura, S., Garrell, J., Otero, J., Romero, C., Bacardit, J., Rivas, V., Fernández, J. and Herrera, F. (2009). KEEL: A software tool to assess evolutionary algorithms to data mining problems, Soft Computing 13(3): 307-318.
Anderson, T. and Darling, D. (1954). A test of goodness-of-fit, Journal of the American Statistical Association 49(268): 765-769.
Anscombe, F. and Glynn, W. (1983). Distribution of the kurtosis statistic b2 for normal samples, Biometrika 70(1): 227-234.
Baruque, B., Porras, S. and Corchado, E. (2011). Hybrid classification ensemble using topology-preserving clustering, New Generation Computing 29(3): 329-344.
Bergmann, G. and Hommel, G. (1988). Improvements of general multiple test procedures for redundant systems of hypotheses, in P. Bauer, G. Hommel and E. Sonnemann (Eds.), Multiple Hypotheses Testing, Springer-Verlag, Berlin, pp. 100-115.
Broomhead, D. and Lowe, D. (1998). Multivariable functional interpolation and adaptive networks, Complex Systems 11: 321-355.
Czarnowski, I. and Jędrzejowicz, P. (2011). Application of agent-based simulated annealing and tabu search procedures to solving the data reduction problem, International Journal of Applied Mathematics and Computer Science 21(1): 57-68, DOI: 10.2478/v10006-011-0004-3.
D'Agostino, R. (1970). Transformation to normality of the null distribution of g1, Biometrika 57(3): 679-681.
D'Agostino, R., Belanger, A. and D'Agostino Jr., R. (1990). A suggestion for using powerful and informative tests of normality, The American Statistician 44(4): 316-321.
Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets, Journal of Machine Learning Research 7: 1-30.
Derrac, J., García, S., Molina, D. and Herrera, F. (2011). A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms, Swarm and Evolutionary Computation 1: 3-18.
Dunn, O. (1961). Multiple comparisons among means, Journal of the American Statistical Association 56(238): 52-64.
Finner, H. (1993). On a monotonicity problem in step-down multiple test procedures, Journal of the American Statistical Association 88(423): 920-923.
Friedman, M. (1937). The use of ranks to avoid the assumption of normality implicit in the analysis of variance, Journal of the American Statistical Association 32(200): 675-701.
García, S., Fernández, A., Luengo, J. and Herrera, F. (2009). A study of statistical techniques and performance measures for genetics-based machine learning: Accuracy and interpretability, Soft Computing 13(10): 959-977.
García, S., Fernández, A., Luengo, J. and Herrera, F. (2010). Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power, Information Sciences 180: 2044-2064.
García, S. and Herrera, F. (2008). An extension on “Statistical comparisons of classifiers over multiple data sets” for all pairwise comparisons, Journal of Machine Learning Research 9: 2677-2694.
Graczyk, M., Lasota, T., Telec, Z. and Trawiński, B. (2010). Nonparametric statistical analysis of machine learning algorithms for regression problems, in R. Setchi, I. Jordanov, R.J. Howlett and L.C. Jain (Eds.), KES 2010, Lecture Notes in Artificial Intelligence, Vol. 6276, Springer, Heidelberg, pp. 111-120.
Graczyk, M., Lasota, T. and Trawiński, B. (2009). Comparative analysis of premises valuation models using KEEL, RapidMiner, and WEKA, in N.T. Nguyen, R. Kowalczyk and S.-M. Chen (Eds.), ICCCI 2009, Lecture Notes in Artificial Intelligence, Vol. 5796, Springer, Heidelberg, pp. 800-812.
Hill, T. and Lewicki, P. (2007). Statistics: Methods and Applications, StatSoft, Tulsa.
Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance, Biometrika 75(4): 800-802.
Hodges, J. and Lehmann, E. (1962). Rank methods for combination of independent experiments in analysis of variance, Annals of Mathematical Statistics 33: 482-497.
Holland, B. and Copenhaver, M. (1987). An improved sequentially rejective Bonferroni test procedure, Biometrics 43(2): 417-423.
Holm, S. (1979). A simple sequentially rejective multiple test procedure, Scandinavian Journal of Statistics 6: 65-70.
Hommel, G. (1988). A stagewise rejective multiple test procedure based on a modified Bonferroni test, Biometrika 75(2): 383-386.
Hommel, G. and Bernhard, G. (1994). A rapid algorithm and a computer program for multiple test procedures using logical structures of hypotheses, Computer Methods and Programs in Biomedicine 43: 213-216.
Igel, C. and Hüsken, M. (2003). Empirical evaluation of the improved RPROP learning algorithm, Neurocomputing 50: 105-123.
Iman, R. and Davenport, J. (1980). Approximations of the critical region of the Friedman statistic, Communications in Statistics 18: 571-595.
Jackowski, K. and Woźniak, M. (2010). Method of classifier selection using the genetic approach, Expert Systems 27(2): 114-128.
Jarque, C. and Bera, A. (1987). A test for normality of observations and regression residuals, International Statistical Review 55(2): 163-172.
Kajdanowicz, T. and Kazienko, P. (2011). Boosting-based sequential output prediction, New Generation Computing 29(3): 293-307.
Keskin, S. (2006). Comparison of several univariate normality tests regarding type I error rate and power of the test in simulation based small samples, Journal of Applied Science Research 2(5): 296-300.
Król, D., Lasota, T., Trawiński, B. and Trawiński, K. (2008). Investigation of evolutionary optimization methods of TSK fuzzy model for real estate appraisal, International Journal of Hybrid Intelligent Systems 5(3): 111-128.
Krzystanek, M., Lasota, T. and Trawiński, B. (2009). Comparative analysis of evolutionary fuzzy models for premises valuation using KEEL, in N.T. Nguyen, R. Kowalczyk and S.-M. Chen (Eds.), ICCCI 2009, Lecture Notes in Artificial Intelligence, Vol. 5796, Springer, Heidelberg, pp. 838-849.
Lasota, T., Mazurkiewicz, J., Trawiński, B. and Trawiński, K. (2010). Comparison of data driven models for the valuation of residential premises using KEEL, International Journal of Hybrid Intelligent Systems 7(1): 3-16.
Lasota, T., Telec, Z., Trawiński, B. and Trawiński, K. (2011). Investigation of the eTS evolving fuzzy systems applied to real estate appraisal, Journal of Multiple-Valued Logic and Soft Computing 17(2-3): 229-253.
Li, J. (2008). A two-step rejection procedure for testing multiple hypotheses, Journal of Statistical Planning and Inference 138(6): 1521-1527.
Lilliefors, H. (1967). On the Kolmogorov-Smirnov test for normality with mean and variance unknown, Journal of the American Statistical Association 62(318): 399-402.
Luengo, J., García, S. and Herrera, F. (2009). A study on the use of statistical tests for experimentation with neural networks: Analysis of parametric test conditions and non-parametric tests, Expert Systems with Applications 36: 7798-7808.
Lughofer, E., Trawiński, B., Trawiński, K., Kempa, O. and Lasota, T. (2011). On employing fuzzy modeling algorithms for the valuation of residential premises, Information Sciences 181: 5123-5142.
Moller, F. (1990). A scaled conjugate gradient algorithm for fast supervised learning, Neural Networks 6: 525-533.
Motulsky, H. (2010). Intuitive Biostatistics: A Nonmathematical Guide to Statistical Thinking, 2nd Edn., Oxford University Press, New York, NY.
Nemenyi, P.B. (1963). Distribution-free Multiple Comparisons, Ph.D. thesis, Princeton University, Princeton, NJ.
Plackett, R. (1983). Karl Pearson and the chi-squared test, International Statistical Review 51(1): 59-72.
Platt, J. (1991). A resource allocating network for function interpolation, Neural Computation 3(2): 213-225.
Quade, D. (1979). Using weighted rankings in the analysis of complete blocks with additive block effects, Journal of the American Statistical Association 74: 680-683.
Romão, X., Delgado, R. and Costa, A. (2010). An empirical power comparison of univariate goodness-of-fit tests for normality, Journal of Statistical Computation and Simulation 80(5): 545-591.
Rom, D. (1990). A sequentially rejective test procedure based on a modified Bonferroni inequality, Biometrika 77(3): 663-665.
Royston, P. (1993). A pocket-calculator algorithm for the Shapiro-Francia test for non-normality: An application to medicine, Statistics in Medicine 12(2): 181-184.
Salzberg, S. (1997). On comparing classifiers: Pitfalls to avoid and a recommended approach, Data Mining and Knowledge Discovery 1: 317-327.
Shaffer, J. (1986). Modified sequentially rejective multiple test procedures, Journal of the American Statistical Association 81(395): 826-831.
Shapiro, S. and Wilk, M. (1965). An analysis of variance test for normality (complete samples), Biometrika 52(3/4): 591-611.
Sheskin, D. (2011). Handbook of Parametric and Nonparametric Statistical Procedures, 5th Edn., Chapman & Hall/CRC, Boca Raton, FL.
Smętek, M. and Trawiński, B. (2011). Investigation of genetic algorithms with self-adaptive crossover, mutation, and selection, in E. Corchado, M. Kurzyński and M. Woźniak (Eds.), HAIS 2011, Lecture Notes in Artificial Intelligence, Vol. 6678, Springer, Heidelberg, pp. 116-123.
Smotroff, I., Friedman, D. and Connolly, D. (1991). Self organizing modular neural networks, IEEE International Joint Conference on Neural Networks, IJCNN'91, Seattle, WA, USA, pp. 187-192.
Székely, G.J. and Rizzo, M. (2005). A new test for multivariate normality, Journal of Multivariate Analysis 93(1): 58-80.
Tanweer-Ul-Islam (2011). Normality testing: A new direction, International Journal of Business and Social Science 2(3): 115-118.
Thode, H. (2002). Testing for Normality, Marcel Dekker, New York, NY.
Troć, M. and Unold, O. (2010). Self-adaptation of parameters in a learning classifier system ensemble machine, International Journal of Applied Mathematics and Computer Science 20(1): 157-174, DOI: 10.2478/v10006-010-0012-8.
Wilcoxon, F. (1945). Individual comparisons by ranking methods, Biometrics 1: 80-83.
Wright, S. (1992). Adjusted p-values for simultaneous inference, Biometrics 48: 1005-1013.
Yazici, B. and Yolacan, S. (2007). A comparison of various tests of normality, Journal of Statistical Computation and Simulation 77(2): 175-183.
Zaman, M. and Hirose, H. (2011). Classification performance of bagging and boosting type ensemble methods with small training sets, New Generation Computing 29(3): 277-292.
Zar, J. (2009). Biostatistical Analysis, 5th Edn., Prentice Hall, Upper Saddle River, NJ.

Document type

Bibliography

Identifiers

DOI

10.2478/v10006-012-0064-z

YADDA identifier

bwmeta1.element.bwnjournal-article-amcv22z4p867bwm
