bcorsis {Ball} | R Documentation |
Ball Correlation based Sure Independence Screening (BCor-SIS)
Description
Generic non-parametric sure independence screening (SIS) procedure based on Ball Correlation. Ball correlation is a generic measure of dependence in Banach spaces.
Usage
bcorsis(
x,
y,
d = "small",
weight = c("constant", "probability", "chisquare"),
method = "standard",
distance = FALSE,
category = FALSE,
parms = list(d1 = 5, d2 = 5, df = 3),
num.threads = 0
)
Arguments
x |
a numeric matrix or data.frame included |
y |
a numeric vector, matrix, or data.frame. |
d |
the hard cutoff rule suggests selecting |
weight |
a logical or character string used to choose the weight form of the Ball Covariance statistic.
If input is a character string, it must be one of |
method |
specific method for the BCor-SIS procedure. It must be one of |
distance |
if |
category |
a logical value or integer vector indicating columns to be selected as categorical variables.
If |
parms |
parameters list only available when |
num.threads |
number of threads. If |
Details
bcorsis performs a model-free generic sure independence screening procedure, BCor-SIS, to pick out variables from x that are potentially associated with y.
BCor-SIS relies on Ball correlation, a universal dependence measure in Banach spaces.
Ball correlation (BCor) ranges from 0 to 1. A larger BCor suggests that two variables are more likely to be associated, while a BCor equal to 0 implies that they are unassociated. (See bcor for details.)
Consequently, BCor-SIS picks out the variables with the largest BCor values with y.
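The rank-and-keep idea described above can be sketched in base R. This is only an illustration of marginal screening in general, not the implementation used by bcorsis: the helper marginal_screen and the cutoff d = 10 are hypothetical, and absolute Pearson correlation is swapped in as a stand-in score where BCor-SIS would use Ball correlation.

```r
# Generic marginal screening: score every column of x against y,
# then keep the indices of the d highest-scoring columns.
# `marginal_screen` and `score_fun` are hypothetical names for illustration.
marginal_screen <- function(x, y, d, score_fun) {
  scores <- apply(x, 2, function(column) score_fun(column, y))
  order(scores, decreasing = TRUE)[seq_len(d)]
}

# Stand-in score (NOT Ball correlation): absolute Pearson correlation.
abs_cor <- function(u, v) abs(cor(u, v))

set.seed(1)
n <- 100
p <- 200
x <- matrix(rnorm(n * p), nrow = n)
y <- 3 * x[, 1] + 5 * x[, 3] + rnorm(n)

# Indices of the 10 columns with the largest marginal scores.
ix <- marginal_screen(x, y, d = 10, score_fun = abs_cor)
ix
```

A linear score such as Pearson correlation can miss non-linear signals (for example, a purely quadratic effect), which is the motivation for using a general dependence measure as the score.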
Theory and numerical results indicate that BCor-SIS has the following advantages:
BCor-SIS can retain the important variables even when the dimensionality (i.e., ncol(x)) is of exponential order in the sample size (i.e., exp(nrow(x)));
It is distribution-free and model-free;
It is very robust;
It works well for complex data, such as shape and survival data.
If x is a matrix, the sample sizes of x and y must agree.
If x is a list object, every element of the list must have the same sample size.
x and y must not contain missing or infinite values.
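The requirements above can be checked before screening. The following is a minimal sketch for the matrix case only; check_screening_input is a hypothetical helper, not a function of the Ball package, and it assumes x and y hold numeric data.

```r
# Hypothetical pre-flight check (not part of the Ball package).
# Assumes x and y are numeric vectors, matrices, or data.frames.
check_screening_input <- function(x, y) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  if (nrow(x) != nrow(y)) {
    stop("x and y must have the same number of observations")
  }
  if (!all(is.finite(x))) {
    stop("x must not contain missing or infinite values")
  }
  if (!all(is.finite(y))) {
    stop("y must not contain missing or infinite values")
  }
  invisible(TRUE)
}

set.seed(1)
x <- matrix(rnorm(20), nrow = 5)
y <- rnorm(5)
check_screening_input(x, y)  # passes silently

# A single NA in x is reported as an error; capture the message here.
x_bad <- x
x_bad[2, 3] <- NA
bad <- tryCatch(check_screening_input(x_bad, y),
                error = function(e) conditionMessage(e))
bad
```

is.finite() returns FALSE for NA, NaN, Inf, and -Inf, so one call covers both missing and infinite values.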
When method = "survival", the matrix or data.frame passed to y must have exactly two columns: the first column is the event (failure) time and the second column is a dichotomous censoring status.
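As a rough illustration of the two-column layout, the sketch below builds a response matrix from simulated data. The simulated times, the column names, and the coding of the status column (1 = event observed, 0 = censored) are assumptions made for this example; check the data you use (for instance, genlung) for its actual coding.

```r
set.seed(1)
n <- 50
event_time  <- rexp(n, rate = 0.1)   # hypothetical true failure times
censor_time <- rexp(n, rate = 0.05)  # hypothetical censoring times

# Observed time is whichever comes first: the event or the censoring.
observed_time <- pmin(event_time, censor_time)
# Assumed coding: 1 = event observed, 0 = censored.
status <- as.integer(event_time <= censor_time)

# First column: time; second column: dichotomous status.
y_surv <- cbind(time = observed_time, status = status)
head(y_surv)

# y_surv has the shape expected for a call such as:
#   bcorsis(x = x, y = y_surv, method = "survival")
```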
Value
ix |
the indices vector corresponding to variables selected by BCor-SIS. |
method |
the method used. |
weight |
the weight used. |
complete.info |
a |
Note
bcorsis simultaneously computes Ball Correlation statistics with the "constant", "probability", and "chisquare" weights.
Users can get the Ball Correlation statistics for the other weights from the complete.info element of the output.
A quick example is given below to illustrate.
Author(s)
Wenliang Pan, Weinan Xiao, Xueqin Wang, Hongtu Zhu, Jin Zhu
References
Wenliang Pan, Xueqin Wang, Weinan Xiao & Hongtu Zhu (2018) A Generic Sure Independence Screening Procedure, Journal of the American Statistical Association, DOI: 10.1080/01621459.2018.1462709
See Also
Examples
## Not run:
############### Quick Start for bcorsis function ###############
set.seed(1)
n <- 150
p <- 3000
x <- matrix(rnorm(n * p), nrow = n)
eps <- rnorm(n)
y <- 3 * x[, 1] + 5 * (x[, 3])^2 + eps
res <- bcorsis(y = y, x = x)
head(res[["ix"]])
head(res[["complete.info"]][["statistic"]])
############### BCor-SIS: Censored Data Example ###############
data("genlung")
result <- bcorsis(x = genlung[["covariate"]], y = genlung[["survival"]],
                  method = "survival")
index <- result[["ix"]]
top_gene <- colnames(genlung[["covariate"]])[index]
head(top_gene, n = 1)
############### BCor-SIS: Interaction Pursuing ###############
set.seed(1)
n <- 150
p <- 3000
x <- matrix(rnorm(n * p), nrow = n)
eps <- rnorm(n)
y <- 3 * x[, 1] * x[, 5] * x[, 10] + eps
res <- bcorsis(y = y, x = x, method = "interaction")
head(res[["ix"]])
############### BCor-SIS: Iterative Method ###############
library(mvtnorm)
set.seed(1)
n <- 150
p <- 3000
sigma_mat <- matrix(0.5, nrow = p, ncol = p)
diag(sigma_mat) <- 1
x <- rmvnorm(n = n, sigma = sigma_mat)
eps <- rnorm(n)
rm(sigma_mat); gc(reset = TRUE)
y <- 3 * (x[, 1])^2 + 5 * (x[, 2])^2 + 5 * x[, 8] - 8 * x[, 16] + eps
res <- bcorsis(y = y, x = x, method = "lm", d = 15)
res <- bcorsis(y = y, x = x, method = "gam", d = 15)
res[["ix"]]
############### Weighted BCor-SIS: Probability weight ###############
set.seed(1)
n <- 150
p <- 3000
x <- matrix(rnorm(n * p), nrow = n)
eps <- rnorm(n)
y <- 3 * x[, 1] + 5 * (x[, 3])^2 + eps
res <- bcorsis(y = y, x = x, weight = "prob")
head(res[["ix"]])
# Alternatively, use the chi-square weight:
res <- bcorsis(y = y, x = x, weight = "chisq")
head(res[["ix"]])
############### BCor-SIS: GWAS data ###############
set.seed(1)
n <- 150
p <- 3000
x <- sapply(1:p, function(i) {
  sample(0:2, size = n, replace = TRUE)
})
eps <- rnorm(n)
y <- 6 * x[, 1] - 7 * x[, 2] + 5 * x[, 3] + eps
res <- bcorsis(x = x, y = y, category = TRUE)
head(res[["ix"]])
head(res[["complete.info"]][["statistic"]])
x <- cbind(matrix(rnorm(n * 2), ncol = 2), x)
# exclude the first two (continuous) columns from the categorical variables:
res <- bcorsis(x = x, y = y, category = c(-1, -2))
head(res[["ix"]])
x <- cbind(x[, 3:5], matrix(rnorm(n * p), ncol = p))
res <- bcorsis(x = x, y = y, category = 1:3)
head(res[["ix"]], n = 10)
## End(Not run)