A geometric view on Pearson’s correlation coefficient and a generalization of it to non-linear dependencies

Priyantha Wijayatunga

Abstract


Measuring the strength or degree of statistical dependence between two random variables is a common problem in many domains. Pearson's correlation coefficient ρ is an accurate measure of linear dependence. We show that ρ is a normalized, Euclidean-type distance between the joint probability distribution of the two random variables and the distribution obtained by assuming their independence while keeping their marginal distributions. The normalizing constant is the geometric mean of two maximal distances, each taken between the joint probability distribution under full linear dependence, with the respective marginal distribution preserved, and the distribution under independence. The use of ρ is restricted to linear dependence because it is based on Euclidean-type distances, which are generally not metrics, and because the full dependence it considers is linear. We therefore argue that if a suitable distance metric is used and all possible maximal dependences are considered, the resulting measure can capture any non-linear dependence. This, however, requires defining all the full dependences. The Hellinger distance, which is a metric, can be used as the distance between probability distributions to obtain a generalization of ρ for the discrete case.
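As a rough illustration of the unnormalized ingredient mentioned above, the following sketch computes the Hellinger distance between a discrete joint distribution and the product of its marginals. The function name `hellinger_to_independence` and the two example tables are assumptions made for illustration. The sketch does not implement the paper's normalization by maximal distances, so its output is not the generalized coefficient itself.

```python
import math

def hellinger_to_independence(joint):
    """Hellinger distance between a discrete joint pmf (list of rows)
    and the product of its marginals. Hypothetical helper, not the
    paper's normalized measure."""
    row = [sum(r) for r in joint]           # marginal of the first variable
    col = [sum(c) for c in zip(*joint)]     # marginal of the second variable
    total = 0.0
    for i, r in enumerate(joint):
        for j, p in enumerate(r):
            q = row[i] * col[j]             # probability under independence
            total += (math.sqrt(p) - math.sqrt(q)) ** 2
    return math.sqrt(total / 2.0)

# Illustrative 2x2 tables (assumed data, not taken from the paper).
independent = [[0.25, 0.25], [0.25, 0.25]]
dependent = [[0.5, 0.0], [0.0, 0.5]]

d_independent = hellinger_to_independence(independent)
d_dependent = hellinger_to_independence(dependent)
print(d_independent, d_dependent)
```

The distance is zero when the joint distribution factorizes and grows as it moves away from independence. Turning it into a coefficient comparable to ρ would additionally require the maximal distances that the abstract describes.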

Keywords


metric/distance; probability simplex; normalization


DOI: http://dx.doi.org/10.23755/rm.v30i1.5


Copyright (c) 2016 Priyantha Wijayatunga

This work is licensed under a Creative Commons Attribution 4.0 International License.

Ratio Mathematica - Journal of Mathematics, Statistics, and Applications. ISSN 1592-7415; e-ISSN 2282-8214.