Listy Biometryczne - Biometrical Letters Vol. 38(2001), No. 1, 11-31


A SEMI-STOCHASTIC GRAND TOUR FOR IDENTIFYING OUTLIERS
AND FINDING A CLEAN SUBSET


Anna Bartkowiak

Institute of Computer Science, University of Wroclaw,
Przesmyckiego 20, 51-151 Wroclaw, Poland


The grand tour method has proved very efficient at detecting outliers. The present paper proposes further modifications of the grand tour algorithm, based on constructing robust concentration ellipses. It is also emphasized that the same method can be used to obtain a "clean" subset of the data. Such a subset may serve as the starting point for robust multivariate procedures. The method is simple and can easily be implemented on parallel computers; as such, it may be used in data mining of large data sets. The considerations are illustrated with two benchmark data sets and one real medical data set.


Key words: multivariate outlier, graphical methods, grand tour, linked plots, ellipse of concentration.