dcmatrix {dcortools}    R Documentation

Calculates distance covariance and distance correlation matrices

Description

Calculates distance covariance and distance correlation matrices

Usage

dcmatrix(
  X,
  Y = NULL,
  calc.dcov = TRUE,
  calc.dcor = TRUE,
  calc.cor = "none",
  calc.pvalue.cor = FALSE,
  return.data = TRUE,
  test = "none",
  adjustp = "none",
  b = 499,
  affine = FALSE,
  standardize = FALSE,
  bias.corr = FALSE,
  group.X = NULL,
  group.Y = NULL,
  metr.X = "euclidean",
  metr.Y = "euclidean",
  use = "all",
  algorithm = "auto",
  fc.discrete = FALSE,
  calc.dcor.pw = FALSE,
  calc.dcov.pw = FALSE,
  test.pw = "none",
  metr.pw.X = "euclidean",
  metr.pw.Y = "euclidean"
)

Arguments

X

A data.frame or matrix.

Y

Either NULL or a data.frame or a matrix with the same number of rows as X. If only X is provided, distance covariances/correlations are calculated between all groups in X. If X and Y are provided, distance covariances/correlations are calculated between all groups in X and all groups of Y.

calc.dcov

logical; specifies if the distance covariance matrix is calculated.

calc.dcor

logical; specifies if the distance correlation matrix is calculated.

calc.cor

If set as "pearson", "spearman" or "kendall", a corresponding correlation matrix is additionally calculated.

calc.pvalue.cor

logical; if TRUE, p-values for the entries of the Pearson or Spearman correlation matrix are calculated using Hmisc::rcorr (not implemented for calc.cor = "kendall").

return.data

logical; specifies if the dcmatrix object should contain the original data.

test

specifies the type of test that is performed. "permutation" performs a Monte Carlo permutation test. "gamma" performs a test based on a gamma approximation of the test statistic under the null hypothesis. "conservative" performs a conservative two-moment approximation. "bb3" performs a more precise three-moment approximation and is recommended when computation time is not an issue. With the default "none", no test is performed.

adjustp

If set to "holm", "hochberg", "hommel", "bonferroni", "BH", "BY" or "fdr", the correspondingly adjusted p-values are additionally returned for the distance covariance test.
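The method names listed for adjustp coincide with those accepted by stats::p.adjust in base R. As a minimal sketch of what such an adjustment does (the p-values below are made up, and whether dcmatrix calls p.adjust internally is an assumption here):

```r
# Hypothetical raw p-values, e.g. from five pairwise independence tests
p_raw <- c(0.001, 0.012, 0.030, 0.045, 0.200)

# Benjamini-Hochberg adjustment ("fdr" is an alias for "BH" in p.adjust)
p_bh <- p.adjust(p_raw, method = "BH")

# Bonferroni adjustment for comparison: multiply by the number of tests, cap at 1
p_bonf <- p.adjust(p_raw, method = "bonferroni")

print(rbind(raw = p_raw, BH = p_bh, bonferroni = p_bonf))
```

Adjusted p-values are never smaller than the raw ones, and Bonferroni is the more conservative of the two methods shown.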

b

specifies the number of random permutations used for the permutation test. Ignored for all other tests.
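To illustrate the role of b, the following base R sketch runs a Monte Carlo permutation test for the distance covariance of two numeric samples. The helper names double_center and dcov_perm_test are hypothetical, and the p-value formula (1 + number of exceedances) / (b + 1) is a common convention that is assumed here, not taken from the package source.

```r
# Double-center a distance matrix (subtract row and column means, add grand mean)
double_center <- function(D) {
  sweep(sweep(D, 1, rowMeans(D)), 2, colMeans(D)) + mean(D)
}

# Permutation test for the squared sample distance covariance (V-statistic)
dcov_perm_test <- function(x, y, b = 499) {
  A <- double_center(as.matrix(dist(x)))
  B <- double_center(as.matrix(dist(y)))
  n <- nrow(A)
  observed <- mean(A * B)
  # Permuting the observations of y is the same as permuting rows and columns of B
  permuted <- replicate(b, {
    idx <- sample.int(n)
    mean(A * B[idx, idx])
  })
  (1 + sum(permuted >= observed)) / (b + 1)
}

set.seed(1)
x <- rnorm(50)
p_dep <- dcov_perm_test(x, x^2 + rnorm(50, sd = 0.1))  # nonlinear dependence
p_ind <- dcov_perm_test(x, rnorm(50))                  # independent samples
print(c(dependent = p_dep, independent = p_ind))
```

With b = 499 the smallest attainable p-value is 1/500, so a larger b is needed if very small p-values must be resolved.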

affine

logical; indicates if the affinely transformed distance covariance should be calculated or not.

standardize

logical; specifies if the data should be standardized by dividing each component by its standard deviation. No effect when affine = TRUE.
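As a rough illustration of why standardization matters for distance-based statistics, the sketch below scales each column of a made-up matrix by its standard deviation in base R. This mirrors the description above, but the exact internal implementation in dcmatrix is an assumption.

```r
set.seed(1)
# Two hypothetical columns on very different scales
X <- cbind(a = rnorm(100, sd = 1), b = rnorm(100, sd = 50))

# Divide each column by its sample standard deviation
X_std <- sweep(X, 2, apply(X, 2, sd), "/")

# After scaling, every column has unit standard deviation
print(apply(X_std, 2, sd))
```

Without scaling, Euclidean distances between rows of X would be dominated by column b.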

bias.corr

logical; specifies if the bias corrected version of the sample distance covariance (Huo and Szekely 2016) should be calculated.
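For orientation, a bias-corrected estimator of the squared distance covariance can be written with U-centered distance matrices. The base R sketch below follows that construction for numeric samples with more than three observations; the function name dcov2_unbiased is hypothetical, and it is an assumption that dcmatrix computes the same quantity.

```r
dcov2_unbiased <- function(x, y) {
  # U-centering: like double centering, but with n - 2 and (n - 1)(n - 2)
  # as denominators and a zero diagonal
  u_center <- function(D) {
    n <- nrow(D)
    U <- D -
      outer(rowSums(D), rep(1, n)) / (n - 2) -
      outer(rep(1, n), colSums(D)) / (n - 2) +
      sum(D) / ((n - 1) * (n - 2))
    diag(U) <- 0
    U
  }
  A <- u_center(as.matrix(dist(x)))
  B <- u_center(as.matrix(dist(y)))
  n <- nrow(A)
  sum(A * B) / (n * (n - 3))
}

set.seed(1)
x <- rnorm(30)
y <- x + rnorm(30)
val <- dcov2_unbiased(x, y)
print(val)
```

Unlike the uncorrected version, this estimator can take negative values, typically when the samples are close to independent.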

group.X

A vector, each entry specifying the group membership of the respective column in X. Each group is handled as one sample for calculating the distance covariance/correlation matrices. If NULL, every column is handled as an individual group.

group.Y

A vector, each entry specifying the group membership of the respective column in Y. Each group is handled as one sample for calculating the distance covariance/correlation matrices. If NULL, every column is handled as an individual group.
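The effect of a grouping vector can be pictured as splitting the columns into blocks, each of which is then treated as one multivariate sample. The following base R sketch only shows this splitting step; the variable names are illustrative and the code does not call dcmatrix.

```r
X <- matrix(rnorm(1000), ncol = 10)

# Columns 1-5 belong to group 1, columns 6-10 to group 2
group.X <- c(rep(1, 5), rep(2, 5))

# One sub-matrix per group, keeping all 100 observations
blocks <- lapply(split(seq_len(ncol(X)), group.X),
                 function(cols) X[, cols, drop = FALSE])

print(sapply(blocks, dim))
```

With this grouping, one would expect a 2 x 2 matrix of distance covariances between groups instead of a 10 x 10 matrix between single columns.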

metr.X

Either a single metric or a list providing a metric for each group in X (see examples).

metr.Y

see metr.X.

use

"all" uses all observations, "complete.obs" excludes NAs, "pairwise.complete.obs" uses pairwise complete observations for each comparison.

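The difference between "complete.obs" and "pairwise.complete.obs" can be sketched with base R's complete.cases on a small made-up matrix. This only illustrates which rows each option would retain under the usual meaning of these terms; it does not call dcmatrix.

```r
X <- cbind(a = c(1, 2, NA, 4, 5),
           b = c(2, NA, 3, 5, 1),
           c = c(5, 4, 3, 2, 1))

# "complete.obs": keep only rows without any NA
rows_complete <- complete.cases(X)

# "pairwise.complete.obs": for the pair (a, c), keep rows observed in both a and c
rows_pair_ac <- complete.cases(X[, c("a", "c")])

print(c(complete = sum(rows_complete), pair_a_c = sum(rows_pair_ac)))
```

Pairwise deletion retains more data per comparison, at the price of different comparisons being based on different subsets of rows.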
algorithm

specifies the algorithm used for calculating the distance covariance.

"fast" uses an O(n log n) algorithm if the observations are one-dimensional and metr.X and metr.Y are either "euclidean" or "discrete", see also Huo and Szekely (2016).

"memsave" uses a memory saving version of the standard algorithm with computational complexity O(n^2) but requiring only O(n) memory.

"standard" uses the classical algorithm. User-specified metrics always use the classical algorithm.

"auto" chooses the best algorithm for the specific setting using a rule of thumb.

"memsave" is typically very inefficient for dcmatrix and should only be applied in exceptional cases.

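The trade-off between the "standard" and "memsave" options can be sketched in base R for univariate samples with Euclidean distances. Both functions below compute the same squared sample distance covariance; the first stores full n x n distance matrices, the second only keeps O(n) values at a time. The function names are hypothetical, and the package's actual implementation may differ.

```r
# Classical approach: build and double-center full distance matrices
dcov2_standard <- function(x, y) {
  dc <- function(D) sweep(sweep(D, 1, rowMeans(D)), 2, colMeans(D)) + mean(D)
  mean(dc(as.matrix(dist(x))) * dc(as.matrix(dist(y))))
}

# Memory-saving approach: process one row of distances at a time
dcov2_memsave <- function(x, y) {
  n <- length(x)
  s_ab <- 0             # running sum of a_ij * b_ij
  row_a <- numeric(n)   # row means of the distance matrix of x
  row_b <- numeric(n)   # row means of the distance matrix of y
  for (i in seq_len(n)) {
    a_i <- abs(x[i] - x)
    b_i <- abs(y[i] - y)
    s_ab <- s_ab + sum(a_i * b_i)
    row_a[i] <- mean(a_i)
    row_b[i] <- mean(b_i)
  }
  s_ab / n^2 + mean(row_a) * mean(row_b) - 2 * mean(row_a * row_b)
}

set.seed(1)
x <- rnorm(200)
y <- x^2 + rnorm(200)
print(c(standard = dcov2_standard(x, y), memsave = dcov2_memsave(x, y)))
```

Both variants need O(n^2) operations; the second recomputes distances row by row and never holds a full matrix in memory.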
fc.discrete

logical; If TRUE, "discrete" metric is applied automatically on samples of type "factor" or "character".
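The discrete metric assigns distance 0 to equal values and 1 to different values. A small base R sketch of the resulting distance matrix for a made-up factor (an illustration of the metric itself, not of the package internals):

```r
f <- factor(c("a", "b", "a", "c"))

# Discrete metric: 1 if the two labels differ, 0 otherwise
labels <- as.character(f)
D <- outer(labels, labels, FUN = "!=") * 1

print(D)
```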

calc.dcor.pw

logical; If TRUE, a distance correlation matrix between the univariate observations/columns is additionally calculated. Not meaningful if group.X and group.Y are not specified.

calc.dcov.pw

logical; If TRUE, a distance covariance matrix between the univariate observations/columns is additionally calculated. Not meaningful if group.X and group.Y are not specified.

test.pw

specifies a test (see argument "test") that is performed between all pairs of univariate observations/columns.

metr.pw.X

Either a single metric or a list providing a metric for each single observation/column in X (see metr.X).

metr.pw.Y

See metr.pw.X.

Value

S3 object of class "dcmatrix" with the following components:

X, Y

original data (if return.data = TRUE).

dcov, dcor

distance covariance/correlation matrices between the groups specified in group.X/group.Y (if calc.dcov/calc.dcor = TRUE).

corr

correlation matrix between the univariate observations/columns (if calc.cor is "pearson", "spearman" or "kendall").

pvalue

matrix of p-values from the distance covariance test for the entries in dcov (if argument test is not "none").

pvalue.adj

matrix of p-values adjusted for multiple comparisons using the method specified in argument adjustp.

pvalue.cor

matrix of p-values based on "pearson"/"spearman" correlation (if calc.cor is "pearson" or "spearman" and calc.pvalue.cor = TRUE).

dcov.pw, dcor.pw

distance covariance/correlation matrices between the univariate observations (if calc.dcov.pw/calc.dcor.pw = TRUE).

pvalue.pw

matrix of p-values from the distance covariance test for the entries in dcov.pw (if argument test.pw is not "none").
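The components can be accessed with the usual $ operator. The sketch below assumes that dcortools is installed and skips the call otherwise; the component names are those listed above, and the data are simulated.

```r
if (requireNamespace("dcortools", quietly = TRUE)) {
  set.seed(1)
  X <- matrix(rnorm(300), ncol = 3)

  # Distance correlation matrix with gamma-approximation p-values
  dcm <- dcortools::dcmatrix(X, test = "gamma")

  print(class(dcm))    # S3 class of the returned object
  print(dcm$dcor)      # distance correlations between the columns of X
  print(dcm$pvalue)    # p-values of the distance covariance tests
}
```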

References

Berschneider G, Bottcher B (2018). “On complex Gaussian random fields, Gaussian quadratic forms and sample distance multivariance.” arXiv preprint arXiv:1808.07280.

Bottcher B, Keller-Ressel M, Schilling RL (2018). “Detecting independence of random vectors: generalized distance covariance and Gaussian covariance.” Modern Stochastics: Theory and Applications, 3, 353–383.

Dueck J, Edelmann D, Gneiting T, Richards D (2014). “The affinely invariant distance correlation.” Bernoulli, 20, 2305–2330.

Huang C, Huo X (2017). “A statistically and numerically efficient independence test based on random projections and distance covariance.” arXiv preprint arXiv:1701.06054.

Huo X, Szekely GJ (2016). “Fast computing for distance covariance.” Technometrics, 58(4), 435–447.

Lyons R (2013). “Distance covariance in metric spaces.” The Annals of Probability, 41, 3284–3305.

Sejdinovic D, Sriperumbudur B, Gretton A, Fukumizu K (2013). “Equivalence of distance-based and RKHS-based statistics in hypothesis testing.” The Annals of Statistics, 41, 2263–2291.

Szekely GJ, Rizzo ML, Bakirov NK (2007). “Measuring and testing dependence by correlation of distances.” The Annals of Statistics, 35, 2769–2794.

Szekely GJ, Rizzo ML (2009). “Brownian distance covariance.” The Annals of Applied Statistics, 3, 1236–1265.

Examples

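# 100 observations of 10 independent standard normal variables
X <- matrix(rnorm(1000), ncol = 10)

# Distance covariance/correlation between all columns of X
dcm <- dcmatrix(X, test = "bb3", calc.cor = "pearson",
                calc.pvalue.cor = TRUE, adjustp = "BH")

# Two groups of five columns each, plus pairwise results for the single columns
dcm <- dcmatrix(X, test = "bb3", calc.cor = "pearson",
                calc.pvalue.cor = TRUE, adjustp = "BH",
                group.X = c(rep(1, 5), rep(2, 5)),
                calc.dcor.pw = TRUE, test.pw = "bb3")

# Second data set with 6 columns; the last column is discrete
Y <- matrix(rnorm(600), ncol = 6)
Y[, 6] <- rbinom(100, 4, 0.3)

# Distance covariance/correlation between all columns of X and all columns of Y
dcm <- dcmatrix(X, Y, test = "bb3", calc.cor = "pearson",
                calc.pvalue.cor = TRUE, adjustp = "BH")

# Grouped version with a separate metric for each group in Y
dcm <- dcmatrix(X, Y, test = "bb3", calc.cor = "pearson",
                calc.pvalue.cor = TRUE, adjustp = "BH",
                group.X = c(rep("group1", 5), rep("group2", 5)),
                group.Y = c(rep("group1", 5), "group2"),
                metr.X = "gaussauto",
                metr.Y = list("group1" = "gaussauto", "group2" = "discrete"))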

[Package dcortools version 0.1.6 Index]