dcsis {dcortools} | R Documentation |
Performs distance correlation sure independence screening (Li et al. 2012) with some additional options (such as calculating corresponding tests).
Description
Performs distance correlation sure independence screening (Li et al. 2012) with some additional options (such as calculating corresponding tests).
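The screening idea can be sketched in base R. This is not the dcortools implementation: `dcor_sketch` and `screen_sketch` are hypothetical helper names, and the sketch assumes univariate numeric predictors, Euclidean distances and the uncorrected (V-statistic) sample distance correlation.

```r
# Sample distance correlation (V-statistic version, no bias correction)
# between two numeric vectors, using Euclidean distances.
dcor_sketch <- function(x, y) {
  double_center <- function(D) {
    D - outer(rowMeans(D), colMeans(D), "+") + mean(D)
  }
  A <- double_center(as.matrix(dist(x)))
  B <- double_center(as.matrix(dist(y)))
  dcov2 <- max(mean(A * B), 0)              # squared sample distance covariance
  denom <- sqrt(mean(A * A) * mean(B * B))  # product of distance standard deviations
  if (denom == 0) return(0)
  sqrt(dcov2 / denom)
}

# Screening: rank the columns of X by their distance correlation with Y, then
# keep either the top k columns or all columns above a threshold.
screen_sketch <- function(X, Y, k = floor(nrow(X) / log(nrow(X))),
                          threshold = NULL) {
  dc <- apply(X, 2, dcor_sketch, y = Y)
  ord <- order(dc, decreasing = TRUE)
  selected <- if (is.null(threshold)) head(ord, k) else ord[dc[ord] > threshold]
  list(selected = selected, dcor.selected = dc[selected])
}

set.seed(1)
X <- matrix(rnorm(100 * 20), ncol = 20)
Y <- 2 * X[, 3] + rnorm(100, sd = 0.1)  # only column 3 carries signal
res <- screen_sketch(X, Y, k = 5)
res$selected
```

The returned list mimics the `selected` and `dcor.selected` entries described under Value, but only as plain list elements for illustration.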
Usage
dcsis(
X,
Y,
k = floor(nrow(X)/log(nrow(X))),
threshold = NULL,
calc.cor = "spearman",
calc.pvalue.cor = FALSE,
return.data = FALSE,
test = "none",
adjustp = "none",
b = 499,
bias.corr = FALSE,
use = "all",
algorithm = "auto"
)
Arguments
X |
A dataframe or matrix. |
Y |
A vector-valued response having the same length as the number of rows of X. |
k |
Number of variables that are selected (only used when threshold is not provided). |
threshold |
If provided, variables with a distance correlation larger than threshold are selected. |
calc.cor |
If set as "pearson", "spearman" or "kendall", a corresponding correlation matrix is additionally calculated. |
calc.pvalue.cor |
logical; if TRUE, a p-value based on the Pearson or Spearman correlation matrix is calculated (not implemented for calc.cor = "kendall") using Hmisc::rcorr. |
return.data |
logical; specifies if the dcmatrix object should contain the original data. |
test |
Allows for additionally calculating a test based on distance covariance. Specifies the type of test that is performed: "permutation" performs a Monte Carlo permutation test; "gamma" performs a test based on a gamma approximation of the test statistic under the null; "conservative" performs a conservative two-moment approximation; "bb3" performs a quite precise three-moment approximation and is recommended when computation time is not an issue. |
adjustp |
If set to "holm", "hochberg", "hommel", "bonferroni", "BH", "BY" or "fdr", the corresponding adjusted p-values are additionally returned for the distance covariance test. |
b |
specifies the number of random permutations used for the permutation test. Ignored for all other tests. |
bias.corr |
logical; specifies if the bias corrected version of the sample distance covariance (Huo and Szekely 2016) should be calculated. |
use |
"all" uses all observations, "complete.obs" excludes NAs, "pairwise.complete.obs" uses pairwise complete observations for each comparison. |
algorithm |
specifies the algorithm used for calculating the distance covariance. "fast" uses an O(n log n) algorithm if the observations are one-dimensional and metr.X and metr.Y are either "euclidean" or "discrete", see also Huo and Szekely (2016). "memsave" uses a memory saving version of the standard algorithm with computational complexity O(n^2) but requiring only O(n) memory. "standard" uses the classical algorithm. User-specified metrics always use the classical algorithm. "auto" chooses the best algorithm for the specific setting using a rule of thumb. "memsave" is typically very inefficient for dcsis and should only be applied in exceptional cases. |
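The values accepted by adjustp coincide with method names of base R's `stats::p.adjust`. The sketch below shows what these adjustments do to a set of made-up p-values. It is an assumption, not confirmed by this page, that dcsis applies them to the distance covariance p-values in the same way.

```r
# Hypothetical raw p-values, one per screened variable (made up for illustration).
p_raw <- c(0.001, 0.012, 0.030, 0.200)

# Method names listed for the adjustp argument; all are valid p.adjust methods.
methods <- c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr")
stopifnot(all(methods %in% p.adjust.methods))

# One column of adjusted p-values per method.
p_adj <- sapply(methods, function(m) p.adjust(p_raw, method = m))
round(p_adj, 3)
```

In `p.adjust`, "fdr" is an alias for "BH", so those two columns are identical.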
Value
dcmatrix object with the following two additional slots:
selected |
indices of selected variables. |
dcor.selected |
distance correlation of the selected variables and the response Y. |
References
Berschneider G, Bottcher B (2018). “On complex Gaussian random fields, Gaussian quadratic forms and sample distance multivariance.” arXiv preprint arXiv:1808.07280.
Dueck J, Edelmann D, Gneiting T, Richards D (2014). “The affinely invariant distance correlation.” Bernoulli, 20, 2305–2330.
Huang C, Huo X (2017). “A statistically and numerically efficient independence test based on random projections and distance covariance.” arXiv preprint arXiv:1701.06054.
Huo X, Szekely GJ (2016). “Fast computing for distance covariance.” Technometrics, 58(4), 435–447.
Li R, Zhong W, Zhu L (2012). “Feature screening via distance correlation learning.” Journal of the American Statistical Association, 107(499), 1129–1139.
Szekely GJ, Rizzo ML, Bakirov NK (2007). “Measuring and testing dependence by correlation of distances.” The Annals of Statistics, 35, 2769–2794.
Szekely GJ, Rizzo ML (2009). “Brownian distance covariance.” The Annals of Applied Statistics, 3, 1236–1265.
Examples
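library(dcortools)
# 100 observations of 1000 candidate predictors
X <- matrix(rnorm(1e5), ncol = 1000)
# The response depends only on the first 50 predictors, plus noise
Y <- rowSums(X[, 1:50]) + rnorm(100)
a <- dcsis(X, Y)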
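As a further sketch, the default value of k from the Usage section can be computed by hand for data of the same shape, and an explicit k can be combined with one of the documented tests. The seed and the choice of k = 10 are arbitrary illustration values, and the dcsis call is guarded so that the snippet also runs where dcortools is not installed.

```r
set.seed(1)
X <- matrix(rnorm(1e5), ncol = 1000)
Y <- rowSums(X[, 1:50]) + rnorm(100)

# Default number of selected variables: floor(n / log(n)) with n = nrow(X).
k_default <- floor(nrow(X) / log(nrow(X)))
k_default

# Only run the dcsis call when dcortools is installed.
if (requireNamespace("dcortools", quietly = TRUE)) {
  # Keep the 10 highest-ranked variables and add a gamma-approximation test.
  fit <- dcortools::dcsis(X, Y, k = 10, test = "gamma")
}
```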