| bcorsis {Ball} | R Documentation |
Ball Correlation based Sure Independence Screening (BCor-SIS)
Description
Generic non-parametric sure independence screening (SIS) procedure based on Ball Correlation. Ball correlation is a generic measure of dependence in Banach spaces.
Usage
bcorsis(
x,
y,
d = "small",
weight = c("constant", "probability", "chisquare"),
method = "standard",
distance = FALSE,
category = FALSE,
parms = list(d1 = 5, d2 = 5, df = 3),
num.threads = 0
)
Arguments
x |
a numeric matrix or data.frame included |
y |
a numeric vector, matrix, or data.frame. |
d |
the hard cutoff rule suggests selecting |
weight |
a logical or character string used to choose the weight form of the Ball Covariance statistic.
If input is a character string, it must be one of |
method |
specific method for the BCor-SIS procedure. It must be one of |
distance |
if |
category |
a logical value or integer vector indicating columns to be selected as categorical variables.
If |
parms |
parameters list only available when |
num.threads |
number of threads. If |
Details
bcorsis performs a model-free, generic sure independence screening procedure,
BCor-SIS, to pick out variables from x that are potentially associated with y.
BCor-SIS relies on Ball correlation, a universal dependence measure in Banach spaces.
Ball correlation (BCor) ranges from 0 to 1. A larger BCor between two variables suggests that they are more likely to be associated, while
a BCor equal to 0 implies that they are unassociated (see bcor for details).
Consequently, BCor-SIS picks out the variables with the largest BCor values with y.
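The ranking step described above can be sketched in base R. This is only an illustration of the idea, not the package's internal implementation: `scores` is a made-up vector standing in for per-variable BCor values, and `d` is a hypothetical cutoff chosen for the example.

```r
# Hypothetical per-variable dependence scores in [0, 1]
# (stand-ins for BCor values; not computed by the Ball package).
scores <- c(0.02, 0.41, 0.05, 0.33, 0.01, 0.27)

# Hypothetical hard cutoff: keep the d variables with the largest scores.
d <- 3

# order(..., decreasing = TRUE) ranks variables from strongest to weakest;
# the first d positions are the indices of the retained variables.
ix <- order(scores, decreasing = TRUE)[seq_len(d)]
ix
```

In this toy case the retained indices are the second, fourth, and sixth variables, which carry the three largest scores.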
Theory and numerical results indicate that BCor-SIS has the following advantages:
BCor-SIS can retain the relevant variables even when the dimensionality (i.e., ncol(x)) is of an exponential order of the sample size (i.e., exp(nrow(x)));
It is distribution-free and model-free;
It is very robust;
It works well for complex data, such as shape and survival data.
If x is a matrix, the sample sizes of x and y must agree.
If x is a list object, every element of the list must have the same sample size.
x and y must not contain missing or infinite values.
When method = "survival", the matrix or data.frame passed to y must have exactly two columns: the first column is
the event (failure) time and the second column is a dichotomous censoring status.
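The input requirements above can be checked before screening with a few lines of base R. The sketch below uses made-up toy data; the variable names and the 0/1 coding of the status column are assumptions for illustration, not something this page specifies.

```r
# Hypothetical toy data: 6 observations, 4 candidate variables.
set.seed(1)
n <- 6
x <- matrix(rnorm(n * 4), nrow = n)

# Two-column response: event (failure) time first, censoring status second.
# The 0/1 coding of the status is an assumption for this example.
time <- c(5.2, 3.1, 8.4, 2.7, 6.0, 4.4)
status <- c(1, 0, 1, 1, 0, 1)
y <- cbind(time = time, status = status)

# Checks mirroring the requirements stated above.
stopifnot(
  nrow(x) == nrow(y),          # sample sizes of x and y agree
  all(is.finite(x)),           # no missing or infinite values in x
  all(is.finite(y)),           # no missing or infinite values in y
  ncol(y) == 2,                # exactly two columns for survival data
  length(unique(y[, 2])) <= 2  # status column is dichotomous
)
```

Running such checks up front turns a malformed input into an immediate, readable error rather than a failure inside the screening call.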
Value
ix |
the vector of indices of the variables selected by BCor-SIS. |
method |
the method used. |
weight |
the weight used. |
complete.info |
a |
Note
bcorsis simultaneously computes Ball Correlation statistics with
"constant", "probability", and "chisquare" weights.
Users can obtain the Ball Correlation statistics for the other weights from the complete.info element of the output.
A quick example below illustrates this.
Author(s)
Wenliang Pan, Weinan Xiao, Xueqin Wang, Hongtu Zhu, Jin Zhu
References
Wenliang Pan, Xueqin Wang, Weinan Xiao & Hongtu Zhu (2018) A Generic Sure Independence Screening Procedure, Journal of the American Statistical Association, DOI: 10.1080/01621459.2018.1462709
See Also
Examples
## Not run:
############### Quick Start for bcorsis function ###############
set.seed(1)
n <- 150
p <- 3000
x <- matrix(rnorm(n * p), nrow = n)
eps <- rnorm(n)
y <- 3 * x[, 1] + 5 * (x[, 3])^2 + eps
res <- bcorsis(y = y, x = x)
head(res[["ix"]])
head(res[["complete.info"]][["statistic"]])
############### BCor-SIS: Censored Data Example ###############
data("genlung")
result <- bcorsis(x = genlung[["covariate"]], y = genlung[["survival"]],
                  method = "survival")
index <- result[["ix"]]
top_gene <- colnames(genlung[["covariate"]])[index]
head(top_gene, n = 1)
############### BCor-SIS: Interaction Pursuing ###############
set.seed(1)
n <- 150
p <- 3000
x <- matrix(rnorm(n * p), nrow = n)
eps <- rnorm(n)
y <- 3 * x[, 1] * x[, 5] * x[, 10] + eps
res <- bcorsis(y = y, x = x, method = "interaction")
head(res[["ix"]])
############### BCor-SIS: Iterative Method ###############
library(mvtnorm)
set.seed(1)
n <- 150
p <- 3000
sigma_mat <- matrix(0.5, nrow = p, ncol = p)
diag(sigma_mat) <- 1
x <- rmvnorm(n = n, sigma = sigma_mat)
eps <- rnorm(n)
rm(sigma_mat); gc(reset = TRUE)
y <- 3 * (x[, 1])^2 + 5 * (x[, 2])^2 + 5 * x[, 8] - 8 * x[, 16] + eps
res <- bcorsis(y = y, x = x, method = "lm", d = 15)
# the "gam" result below overwrites res from the "lm" call:
res <- bcorsis(y = y, x = x, method = "gam", d = 15)
res[["ix"]]
############### Weighted BCor-SIS: Probability weight ###############
set.seed(1)
n <- 150
p <- 3000
x <- matrix(rnorm(n * p), nrow = n)
eps <- rnorm(n)
y <- 3 * x[, 1] + 5 * (x[, 3])^2 + eps
res <- bcorsis(y = y, x = x, weight = "prob")
head(res[["ix"]])
# Alternative, chisq weight:
res <- bcorsis(y = y, x = x, weight = "chisq")
head(res[["ix"]])
############### BCor-SIS: GWAS data ###############
set.seed(1)
n <- 150
p <- 3000
x <- sapply(1:p, function(i) {
  sample(0:2, size = n, replace = TRUE)
})
eps <- rnorm(n)
y <- 6 * x[, 1] - 7 * x[, 2] + 5 * x[, 3] + eps
res <- bcorsis(x = x, y = y, category = TRUE)
head(res[["ix"]])
head(res[["complete.info"]][["statistic"]])
x <- cbind(matrix(rnorm(n * 2), ncol = 2), x)
# exclude the first two (continuous) columns from the categorical variables:
res <- bcorsis(x = x, y = y, category = c(-1, -2))
head(res[["ix"]])
x <- cbind(x[, 3:5], matrix(rnorm(n * p), ncol = p))
res <- bcorsis(x = x, y = y, category = 1:3)
head(res[["ix"]], n = 10)
## End(Not run)