Real-life data sets often contain missing values. The neuro-rough-fuzzy systems proposed so far often cannot handle such data. This paper presents a complete neuro-fuzzy system for data sets with missing values. The system builds a rough fuzzy model from the presented data, both complete and incomplete, and can elaborate an answer for examples with full or missing attribute values. The paper also describes a dedicated clustering algorithm and reports the results of numerical experiments.
Institute of Informatics, Silesian University of Technology, ul. Akademicka 16, 44-100 Gliwice, Poland