Biometrical Letters Vol. 55(2), 2018, pp. 233-243


ENTROPY AS A MEASURE OF DEPENDENCY FOR CATEGORIZED DATA

Ewa Skotarczak, Anita Dobek, Krzysztof Moliński

Department of Mathematical and Statistical Methods, Poznań University of Life Sciences,
Wojska Polskiego 28, Poznań, Poland, efalsa@up.poznan.pl


Data arranged in a two-way contingency table arise in many experiments in the life sciences. In some cases a categorized trait is in fact determined by an unobservable continuous variable, called the liability. In this paper a known measure of dependency for such data, based on the concept of entropy, is applied. It is of interest to know how the Pearson correlation coefficient between the two underlying continuous variables relates to the entropy function measuring the corresponding relation in the categorized data. Based on many simulation trials, a linear regression was estimated between the Pearson correlation coefficient and the normalized mutual information (both on a logarithmic scale). The estimated regression coefficients were found to depend neither on the number of observations classified on the categorical scale nor on the continuous distribution used for the latent variable, but they are influenced by the number of columns in the contingency table.


contingency table, correlation, entropy, liability
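
As an illustration of the entropy-based measure named in the abstract, the following minimal sketch computes a normalized mutual information from a two-way contingency table of counts. The function names are hypothetical, and the normalization by the smaller of the two marginal entropies is an assumption made here for illustration; the abstract does not state which normalization the authors use.

```python
import numpy as np


def entropy(p):
    """Shannon entropy (natural log) of a probability vector; zero cells are skipped."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


def normalized_mutual_information(table):
    """Normalized mutual information of a two-way contingency table of counts.

    Assumption: the mutual information is divided by min(H(rows), H(cols)),
    one common normalization; other choices exist.
    """
    counts = np.asarray(table, dtype=float)
    joint = counts / counts.sum()      # joint relative frequencies p_ij
    row = joint.sum(axis=1)            # row marginals p_i.
    col = joint.sum(axis=0)            # column marginals p_.j
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    mi = entropy(row) + entropy(col) - entropy(joint.ravel())
    denom = min(entropy(row), entropy(col))
    return mi / denom if denom > 0 else 0.0


independent = [[10, 20], [30, 60]]     # rows proportional: no association
diagonal = [[5, 0], [0, 5]]            # perfect association
nmi_independent = normalized_mutual_information(independent)
nmi_diagonal = normalized_mutual_information(diagonal)
```

Under this normalization, a table with proportional rows gives a value of 0 up to rounding error, and a table with all counts on the diagonal gives 1.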