R: Conditional Independence Test using the GHCM

ghcm_test {ghcm}

R Documentation

Conditional Independence Test using the GHCM

Description

Tests whether X is conditionally independent of Y given Z using the Generalised Hilbertian Covariance Measure (GHCM). The test is computed from the residuals obtained by regressing X on Z and Y on Z. Its validity depends on how well these regression methods perform. For a more in-depth explanation, see the package vignette or the paper listed in the references.
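As a rough illustration of the residual-based workflow described above, the following sketch uses simulated scalar data and ordinary linear regressions. The simulated variables, the choice of lm() and the polynomial term are assumptions made for this illustration only; any suitable regression method can take their place.

```r
library(ghcm)

set.seed(1)
n <- 200
Z <- rnorm(n)
X <- Z + rnorm(n)      # X depends on Z only
Y <- Z^2 + rnorm(n)    # Y depends on Z only, so X and Y are
                       # conditionally independent given Z

# Regress each of X and Y on Z and keep the residuals.
resid_X <- resid(lm(X ~ Z))
resid_Y <- resid(lm(Y ~ poly(Z, 2)))

# Apply the test to the two residual vectors.
result <- ghcm_test(resid_X, resid_Y, alpha = 0.05)
print(result)
```

Both regressions are correctly specified for the simulated data here. With a poorly fitting regression, the residuals would still carry information about Z and the test result could be misleading.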

Usage

ghcm_test(
  resid_X_on_Z,
  resid_Y_on_Z,
  X_limits = NULL,
  Y_limits = NULL,
  alpha = 0.05
)

Arguments

resid_X_on_Z, resid_Y_on_Z

Residuals from regressing X (respectively Y) on Z with a suitable regression method. If X (Y) is univariate, multivariate, or functional and observed on a constant, fixed grid, supply the residuals as a vector or matrix with no missing values. If instead X (Y) is functional and observed on varying grids or with missing values, supply the residuals as a "melted" data frame with the following columns:

.obs: Integer indicating which curve the row corresponds to.
.index: Function argument that the curve is evaluated at.
.value: Value of the function.

Note that in the irregular case, a minimum of 4 observations per curve is required.
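The "melted" layout described above can be built in base R. The sketch below is only an illustration: the matrix resid_mat, the grid grid_points and the positions of the missing values are hypothetical and stand in for residuals from a real regression.

```r
# Hypothetical input: 5 curves on a common grid of 11 points, with some
# values missing (NA). Rows are curves, columns are grid points.
set.seed(1)
grid_points <- seq(0, 1, length.out = 11)
resid_mat <- matrix(rnorm(5 * 11), nrow = 5)
resid_mat[cbind(c(1, 2, 2, 4), c(3, 5, 9, 11))] <- NA

# One row per (curve, grid point) pair, then drop the missing values.
melted <- data.frame(
  .obs   = as.integer(row(resid_mat)),
  .index = grid_points[as.integer(col(resid_mat))],
  .value = as.numeric(resid_mat)
)
melted <- melted[!is.na(melted$.value), ]
melted <- melted[order(melted$.obs, melted$.index), ]
rownames(melted) <- NULL

# Check the documented minimum of 4 observations per curve.
obs_per_curve <- table(melted$.obs)
stopifnot(all(obs_per_curve >= 4))
```

A data frame of this shape can then be passed as resid_X_on_Z or resid_Y_on_Z, together with the corresponding X_limits or Y_limits.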

X_limits, Y_limits

The minimum and maximum values of the function argument of the X (Y) curves. Ignored if X (Y) is not functional.

alpha

Numeric in the unit interval. Significance level of the test.

Value

An object of class ghcm containing:

test_statistic: Numeric, value of the test statistic.
p: Numeric in the unit interval, estimated p-value of the test.
alpha: Numeric in the unit interval, significance level of the test.
reject: TRUE if p < alpha, FALSE otherwise.
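The components listed above can be inspected after running the test. This sketch assumes the returned object is list-like, so that its components can be read with `$`; the simulated data are hypothetical and chosen so that Y depends on X even after conditioning on Z.

```r
library(ghcm)

set.seed(2)
n <- 200
Z <- rnorm(n)
X <- Z + rnorm(n)
Y <- Z + X + rnorm(n)   # Y depends on X beyond what Z explains

res <- ghcm_test(resid(lm(X ~ Z)), resid(lm(Y ~ Z)), alpha = 0.01)

res$test_statistic   # value of the test statistic
res$p                # estimated p-value
res$alpha            # significance level passed in (0.01 here)
res$reject           # decision at level alpha

# The decision follows the documented rule: reject when p < alpha.
isTRUE(res$reject == (res$p < res$alpha))
```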

References

Please cite the following paper: Anton Rask Lundborg, Rajen D. Shah and Jonas Peters (2022). "Conditional Independence Testing in Hilbert Spaces with Applications to Functional Data Analysis". Journal of the Royal Statistical Society: Series B (Statistical Methodology). doi:10.1111/rssb.12544.

Examples

if (require(refund)) {
  set.seed(1)
  data(ghcm_sim_data)
  grid <- seq(0, 1, length.out = 101)

  # Test independence of two scalars given a functional variable
  m_1 <- pfr(Y_1 ~ lf(Z), data = ghcm_sim_data)
  m_2 <- pfr(Y_2 ~ lf(Z), data = ghcm_sim_data)
  ghcm_test(resid(m_1), resid(m_2))

  # Test independence of a regularly observed functional variable and a
  # scalar variable given a functional variable
  m_X <- pffr(X ~ ff(Z), data = ghcm_sim_data, chunk.size = 31000)
  ghcm_test(resid(m_X), resid(m_1))

  # Test independence of two regularly observed functional variables given
  # a functional variable
  m_W <- pffr(W ~ ff(Z), data = ghcm_sim_data, chunk.size = 31000)
  ghcm_test(resid(m_X), resid(m_W))

  data(ghcm_sim_data_irregular)
  n <- length(ghcm_sim_data_irregular$Y_1)
  Z_df <- data.frame(.obs = 1:n)
  Z_df$Z <- ghcm_sim_data_irregular$Z

  # Test independence of an irregularly observed functional variable and a
  # scalar variable given a functional variable
  m_1 <- pfr(Y_1 ~ lf(Z), data = ghcm_sim_data_irregular)
  m_X <- pffr(X ~ ff(Z), ydata = ghcm_sim_data_irregular$X,
              data = Z_df, chunk.size = 31000)
  ghcm_test(resid(m_X), resid(m_1), X_limits = c(0, 1))

  # Test independence of two irregularly observed functional variables given
  # a functional variable
  m_W <- pffr(W ~ ff(Z), ydata = ghcm_sim_data_irregular$W,
              data = Z_df, chunk.size = 31000)
  ghcm_test(resid(m_X), resid(m_W), X_limits = c(0, 1), Y_limits = c(0, 1))
}

[Package ghcm version 3.0.1 Index]