sample.cont {plsgenomics} | R Documentation |
Generates a design matrix X with correlated blocks of covariates and a continuous random response Y depending on X through the Gaussian linear model Y=XB+E
Description
The function sample.cont
generates a random sample with p predictors X, a response Y,
and n observations, through a linear model Y=XB+E, where the noise E is Gaussian,
the coefficients B are sparse, and the design matrix X is composed of correlated blocks of
predictors.
Usage
sample.cont(n, p, kstar, lstar, beta.min, beta.max, mean.H=0, sigma.H,
sigma.F, sigma.E, seed=NULL)
Arguments
n |
the number of observations in the sample. |
p |
the number of covariates in the sample. |
kstar |
the number of underlying latent variables used to generate the design matrix (one per block of predictors, see details). |
lstar |
the number of blocks in the design matrix whose predictors have non null coefficients (see details). |
beta.min |
the lower bound for non null coefficients (see details). |
beta.max |
the upper bound for non null coefficients (see details). |
mean.H |
the mean of the latent variables used to generate X. |
sigma.H |
the standard deviation of the latent variables used to generate X. |
sigma.F |
the standard deviation of the noise added to the latent variables used to
generate X. |
sigma.E |
the standard deviation of the noise in the linear model. |
seed |
a positive integer; if non NULL, it fixes the seed (with the command
set.seed). |
Details
The set (1:p) of predictors is partitioned into kstar blocks. Each block k (k=1,...,kstar) depends on a latent variable H.k; the latent variables are independent and identically distributed, following a N(mean.H, sigma.H^2) distribution. Each column X.j of the matrix X is generated as H.k + F.j for j in block k, where the F.j are independent and identically distributed Gaussian noise terms N(0, sigma.F^2).
The coefficients B are drawn at random between beta.min and beta.max on lstar randomly chosen blocks, and are null elsewhere. The variables with non null coefficients are thus relevant for explaining the response, whereas those with null coefficients are not.
The response is generated as Y = X %*% B + E, where E is Gaussian noise N(0, sigma.E^2).
The details of the procedure are described by Durif et al. (2018).
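The generative steps above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the plsgenomics implementation: the function name simulate_blocks is hypothetical, the blocks are assumed to be contiguous and of near-equal size, and the non null coefficients are assumed to be drawn uniformly between beta.min and beta.max.

```r
# Illustrative base-R sketch of the model described in Details.
# NOT the plsgenomics code: block sizes and the uniform draw are assumptions.
simulate_blocks <- function(n, p, kstar, lstar, beta.min, beta.max,
                            mean.H = 0, sigma.H, sigma.F, sigma.E,
                            seed = NULL) {
  if (!is.null(seed)) set.seed(seed)

  # Assign each predictor to one of kstar blocks (assumed near-equal sizes)
  block.partition <- sort(rep_len(seq_len(kstar), p))

  # One latent variable per block: n x kstar draws from N(mean.H, sigma.H^2)
  H <- matrix(rnorm(n * kstar, mean = mean.H, sd = sigma.H), nrow = n)

  # Column j = latent variable of its block + independent N(0, sigma.F^2) noise
  X <- H[, block.partition, drop = FALSE] +
    matrix(rnorm(n * p, mean = 0, sd = sigma.F), nrow = n)

  # Choose lstar blocks at random; only their predictors get non null coefficients
  block.sel <- sort(sample.int(kstar, lstar))
  sel <- which(block.partition %in% block.sel)
  B <- numeric(p)
  B[sel] <- runif(length(sel), min = beta.min, max = beta.max)  # assumed uniform

  # Response from the linear model with N(0, sigma.E^2) noise
  E <- rnorm(n, mean = 0, sd = sigma.E)
  Y <- as.vector(X %*% B + E)

  list(X = X, Y = Y, residuals = E, B = B, sel = sel,
       nosel = setdiff(seq_len(p), sel),
       block.partition = block.partition, block.sel = block.sel)
}

sim <- simulate_blocks(n = 50, p = 200, kstar = 10, lstar = 2,
                       beta.min = 0.25, beta.max = 0.75, mean.H = 0.2,
                       sigma.H = 10, sigma.F = 5, sigma.E = 5, seed = 1)
dim(sim$X)
```

Because all columns of a block share the same latent variable, they are correlated with each other, while columns from different blocks are built from independent draws.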
Value
A list with the following components:
X |
the (n x p) design matrix, containing the n observations for each of the p predictors. |
Y |
the (n) vector of Y observations. |
residuals |
the (n) vector corresponding to the noise E in the linear model. |
sel |
the index in (1:p) of covariates with non null coefficients in B. |
nosel |
the index in (1:p) of covariates with null coefficients in B. |
B |
the (p) vector of coefficients. |
block.partition |
a (p) vector indicating the block, in (1:kstar), of each predictor. |
p |
the number of covariates in the sample. |
kstar |
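the number of underlying latent variables used to generate the design matrix (one per block of predictors, see details). |
lstar |
the number of blocks in the design matrix whose predictors have non null coefficients (see details). |
p0 |
the number of predictors with non null coefficients in B. |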
block.sel |
a (lstar) vector indicating the index in (1:kstar) of blocks with predictors
having non null coefficients in B. |
beta.min |
the lower bound for non null coefficients (see details). |
beta.max |
the upper bound for non null coefficients (see details). |
mean.H |
the mean of the latent variables used to generate X. |
sigma.H |
the standard deviation of the latent variables used to generate X. |
sigma.F |
the standard deviation of the noise added to the latent variables used to
generate X. |
sigma.E |
the standard deviation of the noise in the linear model. |
seed |
a positive integer; if non NULL, it fixes the seed (with the command
set.seed). |
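As a reading aid, the relations between the components B, sel, nosel, p0, block.partition and block.sel can be illustrated on a small hand-built list. The object toy below is hypothetical (it is not output of sample.cont), and the snippet assumes that "non null" means a coefficient different from zero.

```r
# Hypothetical toy object mimicking two of the returned components,
# built by hand so the snippet runs without plsgenomics
toy <- list(
  B = c(0.5, 0.3, 0, 0, 0.7, 0.4),        # p = 6 coefficients
  block.partition = c(1, 1, 2, 2, 3, 3)   # kstar = 3 blocks of 2 predictors
)

sel   <- which(toy$B != 0)   # covariates with non null coefficients: 1 2 5 6
nosel <- which(toy$B == 0)   # covariates with null coefficients: 3 4
p0    <- length(sel)         # number of non null coefficients: 4

# Blocks containing predictors with non null coefficients: 1 3 (so lstar = 2)
block.sel <- unique(toy$block.partition[sel])
```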
Author(s)
Ghislain Durif (https://gdurif.perso.math.cnrs.fr/).
References
Durif, G., Modolo, L., Michaelsson, J., Mold, J.E., Lambert-Lacroix, S., Picard, F., 2018. High dimensional classification with combined adaptive sparse PLS and logistic regression. Bioinformatics 34, 485–493. doi:10.1093/bioinformatics/btx571. Available at http://arxiv.org/abs/1502.05933.
See Also
Examples
### load plsgenomics library
library(plsgenomics)
### generating data
n <- 100
p <- 1000
sample1 <- sample.cont(n=n, p=p, kstar=20, lstar=2, beta.min=0.25, beta.max=0.75, mean.H=0.2,
sigma.H=10, sigma.F=5, sigma.E=5)
str(sample1)
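To give a sense of how strongly predictors within a block are correlated in the example above, the theoretical correlation can be worked out from the model in Details. Assuming that two columns X.j and X.j' of the same block share the latent variable H.k and have independent noise terms, their covariance is sigma.H^2 and each has variance sigma.H^2 + sigma.F^2. The variable name within_block_cor is introduced here for illustration only.

```r
# Parameter values taken from the example call to sample.cont
sigma.H <- 10
sigma.F <- 5

# Same block: Cov(X.j, X.j') = sigma.H^2 and Var(X.j) = sigma.H^2 + sigma.F^2
within_block_cor <- sigma.H^2 / (sigma.H^2 + sigma.F^2)
within_block_cor  # 0.8
```

Under the same assumptions, columns belonging to different blocks have a theoretical correlation of zero, since their latent variables are independent.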