2012 | 49 | 2 | 135-147
Article title

A Global Approach to the Comparison of Clustering Results

Abstract
The discovery of knowledge in the case of Hierarchical Cluster Analysis (HCA) depends on many factors, such as the clustering algorithms applied and the strategies developed in the initial stage of Cluster Analysis. We present a global approach for evaluating the quality of clustering results and making a comparison among different clustering algorithms using the relevant information available (e.g. the stability, isolation and homogeneity of the clusters). In addition, we present a visual method to facilitate evaluation of the quality of the partitions, allowing identification of the similarities and differences between partitions, as well as the behaviour of the elements in the partitions. We illustrate our approach using a complex and heterogeneous dataset (real horse data) taken from the literature. We apply HCA based on the generalized affinity coefficient (similarity coefficient) to the case of complex data (symbolic data), combined with 26 (classic and probabilistic) clustering algorithms. Finally, we discuss the obtained results and the contribution of this approach to gaining better knowledge of the structure of data.
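As a rough illustration of what comparing partitions produced by different clustering algorithms can look like, the sketch below computes the plain Rand index (the share of element pairs on which two partitions agree) for every pair of partitions. This is not the authors' method: the paper relies on the generalized affinity coefficient and on stability, isolation and homogeneity information, none of which is reproduced here. The function names `rand_index` and `compare_partitions`, the algorithm names, and the label vectors are all hypothetical and chosen only for the example.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of element pairs on which two partitions agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same elements")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two elements")
    agreements = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]  # pair together in partition A?
        same_b = labels_b[i] == labels_b[j]  # pair together in partition B?
        if same_a == same_b:
            agreements += 1
    return agreements / len(pairs)


def compare_partitions(partitions):
    """Rand index for every pair of named partitions."""
    return {
        (name_a, name_b): rand_index(partitions[name_a], partitions[name_b])
        for name_a, name_b in combinations(sorted(partitions), 2)
    }


# Hypothetical cluster labels for six elements (illustrative data only).
partitions = {
    "single": [0, 0, 0, 1, 1, 1],
    "complete": [1, 1, 1, 0, 0, 0],  # same grouping, different label names
    "average": [0, 0, 1, 1, 2, 2],
}
scores = compare_partitions(partitions)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

The index depends only on which elements are grouped together, not on the label values, so two algorithms that find the same clusters under different label names still score 1.0.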
Author affiliations
  • University of Azores, Department of Mathematics, CMATI, 9501-855-Ponta Delgada, Portugal
  • University of Lisbon, Faculty of Psychology, Laboratory of Statistics and Data Analysis, 1649-013-Lisboa, Portugal, and DataScience
  • New University of Lisbon, FCT, Department of Mathematics, 2829-516-Caparica, Portugal, and DataScience