Rough set-based dimensionality reduction for supervised and unsupervised learning
The curse of dimensionality is a limiting factor for numerous potentially powerful machine learning techniques. Widely accepted and otherwise elegant methodologies, used for tasks ranging from classification to function approximation, exhibit relatively high computational complexity with respect to dimensionality. This severely limits the applicability of such techniques to real-world problems. Rough set theory is a formal methodology that can be employed to reduce the dimensionality of datasets as a preprocessing step before training a learning system on the data. This paper investigates the utility of the Rough Set Attribute Reduction (RSAR) technique in both supervised and unsupervised learning, in an effort to probe RSAR's generality. FuREAP, a Fuzzy-Rough Estimator of Algae Populations, is an existing integration of RSAR and a fuzzy Rule Induction Algorithm (RIA); it is used as an example of a supervised learning system with dimensionality reduction capabilities. A similar framework integrating the Multivariate Adaptive Regression Splines (MARS) approach with RSAR is taken to represent unsupervised learning systems. The paper describes the three techniques in question, discusses how RSAR can be employed with a supervised or an unsupervised system, and uses experimental results to draw conclusions on the relative success of the two integration efforts.
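The abstract does not spell out how rough set attribute reduction works. The following is a minimal sketch, assuming a greedy forward selection driven by the rough set dependency degree (the fraction of objects whose equivalence class under the chosen attributes maps to a single decision value). This is a common formulation and not necessarily the exact procedure used in the paper; the function names and the toy dataset are illustrative assumptions.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Dependency degree: share of rows lying in label-consistent classes."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, attrs):
        # A class belongs to the positive region if all its rows share a label.
        if len({labels[i] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def greedy_reduct(rows, labels):
    """Add attributes one at a time until the full-set dependency is reached."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    chosen = []
    while dependency(rows, labels, chosen) < target:
        remaining = [a for a in all_attrs if a not in chosen]
        best = max(remaining, key=lambda a: dependency(rows, labels, chosen + [a]))
        chosen.append(best)
    return sorted(chosen)


# Toy dataset (hypothetical): only attribute 0 determines the label.
rows = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
labels = ["a", "a", "b", "b"]
print(greedy_reduct(rows, labels))  # → [0]
```

The reduced attribute list can then be used to project the dataset before it is passed to the downstream learner, which is the preprocessing role the abstract assigns to RSAR.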