Biometrical Letters Vol. 49(2), 2012, pp. 135-147


A GLOBAL APPROACH TO THE COMPARISON OF CLUSTERING RESULTS

Osvaldo Silva¹, Helena Bacelar-Nicolau², Fernando C. Nicolau³

¹University of Azores, Department of Mathematics, CMATI, 9501-855-Ponta Delgada,
Portugal, osilva@uac.pt
²University of Lisbon, Faculty of Psychology, Laboratory of Statistics and Data Analysis
1649-013-Lisboa, Portugal, and DataScience, hbacelar@fp.ul.pt
³New University of Lisbon, FCT, Department of Mathematics, 2829-516-Caparica, Portugal,
and DataScience, geral@datascience.org


The discovery of knowledge in Hierarchical Cluster Analysis (HCA) depends on many factors, such as the clustering algorithms applied and the strategies adopted in the initial stage of the analysis. We present a global approach for evaluating the quality of clustering results and comparing different clustering algorithms using the relevant information available (e.g. the stability, isolation and homogeneity of the clusters). In addition, we present a visual method that facilitates evaluation of the quality of the partitions, allowing identification of the similarities and differences between partitions, as well as the behaviour of the elements within them. We illustrate our approach using a complex and heterogeneous dataset (real horse data) taken from the literature. We apply HCA based on the generalized affinity coefficient (a similarity coefficient) to complex (symbolic) data, combined with 26 classical and probabilistic clustering algorithms. Finally, we discuss the results obtained and the contribution of this approach to a better understanding of the structure of the data.
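The similarity measure behind the approach is the affinity coefficient. As a rough illustration only, the sketch below computes the basic (non-generalized) affinity coefficient between two non-negative profiles, assuming the classical form a = Σᵢ √(pᵢ qᵢ), where p and q are the profiles normalized to sum 1. The generalized version for symbolic data used in the paper is not reproduced here, and the function name `affinity` is hypothetical.

```python
import math
from typing import Sequence


def affinity(x: Sequence[float], y: Sequence[float]) -> float:
    """Basic affinity coefficient between two non-negative profiles (hypothetical helper).

    Each profile is normalized to sum 1; the coefficient is sum_i sqrt(p_i * q_i),
    which lies in [0, 1] and equals 1 exactly when the normalized profiles coincide.
    This is NOT the generalized coefficient for symbolic data.
    """
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    if any(v < 0 for v in (*x, *y)):
        raise ValueError("profiles must be non-negative")
    sx, sy = sum(x), sum(y)
    if sx <= 0 or sy <= 0:
        raise ValueError("profiles must have positive sums")
    # Sum of geometric means of the normalized components.
    return sum(math.sqrt((a / sx) * (b / sy)) for a, b in zip(x, y))


if __name__ == "__main__":
    print(affinity([1, 2, 1], [2, 4, 2]))  # proportional profiles: maximal affinity
    print(affinity([1, 1], [1, 3]))        # partially overlapping profiles
```

Because the coefficient depends only on the normalized profiles, proportional vectors receive the maximal value 1, while profiles with disjoint support receive 0.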


Key words: cluster analysis, VL methodology, affinity coefficient, comparing partitions, cluster stability, cluster validation