gmcmtxBlk {generalCorr} | R Documentation |
Matrix R* of generalized correlation coefficients captures nonlinearities using blocks.
Description
The algorithm uses two auxiliary functions, getSeq and NLhat. The latter uses the kern function to kernel regress x on y, and conversely y on x. It needs the package ‘np’, which reports residuals and allows one to compute fitted values (xhat, yhat). Unlike gmcmtx0, this function considers blocks of blksiz=10 (default) pairs of data points separately, with distinct bandwidths for each block, usually creating superior local fits.
Usage
gmcmtxBlk(mym, nam = colnames(mym), blksiz = 10)
Arguments
mym |
A matrix of data on selected variables arranged in columns |
nam |
Column names of the variables in the data matrix |
blksiz |
Block size (default 10). If the chosen blksiz exceeds n, the number of rows in the matrix, then blksiz is set to n; that is, no blocking is done |
Details
This function does pairwise checks of missing data for all pairs. Assume that there are n rows in the input matrix ‘mym’, some of which contain missing values. If the columns of mym are denoted (X1, X2, ..., Xp), we consider all pairs (Xi, Xj), treated as (x, y), with ‘nv’ valid (non-missing) rows. Note that each x and y is an (nv by 1) vector. This function further splits these (x, y) vectors into as many subgroups or blocks as are needed to cover the nv paired valid data points for the chosen block length (blksiz).
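The pairwise missing-data check and the block split described above can be sketched in base R. This is an illustration only, not the package's internal code: the toy matrix, the small blksiz, and the names ok, block_id and blocks are all assumptions made for the example, and the package may treat a short final block differently.

```r
# Toy data with one missing value in each column (hypothetical example).
mym <- cbind(X1 = c(1, 2, NA, 4, 5, 6, 7, 8, 9, 10, 11, 12),
             X2 = c(2, NA, 3, 5, 4, 7, 6, 9, 8, 11, 10, 13))

# Keep only the rows where both members of the pair (X1, X2) are present.
ok <- complete.cases(mym[, c("X1", "X2")])
x  <- mym[ok, "X1"]
y  <- mym[ok, "X2"]
nv <- length(x)                      # number of valid paired rows

# A block size larger than nv collapses to nv, i.e. no blocking.
blksiz <- min(4, nv)

# Assign consecutive valid rows to blocks of length blksiz.
block_id <- ceiling(seq_len(nv) / blksiz)
blocks   <- split(seq_len(nv), block_id)
```

Here blocks is a list of row-index vectors, one per block, so a per-block fit can be written as a loop or lapply() over it.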
Next, the algorithm strings together the blocks of fitted values into vectors (xhat, yhat), each also of dimension nv by 1. Then, for each pair (Xi, Xj), with column Xj as the cause and row Xi as the response, treated as x and y, the algorithm computes R*ij as the simple Pearson correlation coefficient between (x, xhat), and R*ji as the correlation coefficient between (y, yhat). Finally, it assigns to |R*ij| and |R*ji| the observed sign of the Pearson correlation coefficient between x and y.
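A rough sketch of the last two steps, correlating each variable with its fitted values and then attaching the sign of the ordinary Pearson correlation, is given below. It is not the package's implementation: stats::loess is used as a stand-in for the ‘np’ kernel regression, no blocking is done, and the helper names fit_hat and rstar_pair are hypothetical. The numbers it produces will therefore differ from gmcmtxBlk output.

```r
# Stand-in smoother (assumption): loess replaces the np kernel regression.
fit_hat <- function(response, cause) {
  fitted(loess(response ~ cause))
}

# Signed generalized correlations for one pair (x, y).
rstar_pair <- function(x, y) {
  s    <- sign(cor(x, y))            # observed sign of Pearson r(x, y)
  xhat <- fit_hat(x, y)              # x regressed on y
  yhat <- fit_hat(y, x)              # y regressed on x
  c(x_on_y = s * abs(cor(x, xhat)),
    y_on_x = s * abs(cor(y, yhat)))
}

# y is an exact nonlinear function of x, but x is not a function of y.
x  <- seq(-1, 3, by = 0.05)
y  <- x^2
rs <- rstar_pair(x, y)
```

In this toy case the y_on_x entry is close to 1 while the x_on_y entry is smaller, which illustrates the asymmetry of the R* matrix.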
Its advantages, discussed in Vinod (2015, 2019), are: (i) It is asymmetric, yielding causal direction information, by relaxing the assumption of linearity implicit in the usual correlation coefficients. (ii) The R* correlation coefficients are generally larger upon admitting arbitrary nonlinearities. (iii) max(|R*ij|, |R*ji|) measures (nonlinear) dependence.
For example, let x=1:20 and y=sin(x). This y has a perfect (100 percent) nonlinear dependence on x, and yet the Pearson correlation coefficient r(x,y) = -0.0948372 is near zero, and its 95% confidence interval (-0.516, 0.363) includes zero, implying that the population r(x,y) is not significantly different from zero. This example highlights a serious failure of the traditional r(x,y) in measuring dependence between x and y when nonlinearities are present.
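The Pearson figures quoted in this example can be checked with base R alone. The sketch below uses only stats::cor.test; the object names ct, r and ci are chosen for illustration.

```r
x <- 1:20
y <- sin(x)

# Pearson correlation with its default 95% confidence interval.
ct <- cor.test(x, y)
r  <- unname(ct$estimate)   # point estimate of r(x, y)
ci <- ct$conf.int           # lower and upper confidence limits

print(r)
print(ci)
```

The interval straddles zero even though y is an exact function of x.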
gmcmtx0 without blocking does work if x=1:n and y=f(x)=sin(x) are used with n<20. But for larger n, the fixed bandwidth used by the kern function becomes a problem. The block version has its own bandwidths for each block, and hence it correctly quantifies the presence of high dependence even when x=1:n and y=f(x) are defined for large n and complicated nonlinear functional forms for f(x).
Value
A non-symmetric R* matrix of generalized correlation coefficients
Author(s)
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
References
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'Matrix Algebra Topics in Statistics and Economics Using R', Chapter 4 in 'Handbook of Statistics: Computational Statistics with R', Vol.32, co-editors: M. B. Rao and C.R. Rao. New York: North Holland, Elsevier Science Publishers, 2014, pp. 143-176.
Vinod, H. D. 'New exogeneity tests and causal paths,' Chapter 2 in 'Handbook of Statistics: Conceptual Econometrics Using R', Vol.32, co-editors: H. D. Vinod and C.R. Rao. New York: North Holland, Elsevier Science Publishers, 2019, pp. 33-64.
Zheng, S., Shi, N.-Z., and Zhang, Z. (2012). 'Generalized measures of correlation for asymmetry, nonlinearity, and beyond,' Journal of the American Statistical Association, vol. 107, pp. 1239-1252.
Examples
## Not run:
library(generalCorr)
x <- 1:20; y <- sin(x)   # y is an exact nonlinear function of x
gmcmtxBlk(cbind(x, y), blksiz = 10)
## End(Not run)