jackstraw_subspace {jackstraw}    R Documentation
Jackstraw for the User-Defined Dimension Reduction Methods
Description
Tests the association between observed variables and their latent variables (LVs), as captured by a user-defined dimension reduction method.
Usage
jackstraw_subspace(
dat,
r,
FUN,
r1 = NULL,
s = NULL,
B = NULL,
covariate = NULL,
noise = NULL,
verbose = TRUE
)
Arguments
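dat | a data matrix with m rows as variables and n columns as observations. |
r | the number of significant latent variables. |
FUN | a user-supplied function that estimates the LVs from dat. It must return the r estimated LVs as an n × r matrix. |
r1 | a numeric vector of the latent variables of interest. |
s | the number of “synthetic” null variables. Out of the m variables, s variables are independently permuted. |
B | the number of resampling iterations. |
covariate | a model matrix of covariates with n observations. Must include an intercept in the first column. |
noise | a parametric distribution used to generate the noise term. If NULL, a non-parametric jackstraw test is performed. |
verbose | a logical specifying whether to print the computational progress. |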
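The FUN contract (take the m × n data matrix, return an n × r matrix of LVs) can be met by wrapping an existing dimension reduction routine. The sketch below wraps base R's prcomp(); the factory name make_pca_fun is hypothetical and only illustrates the expected input and output shapes, assuming PC scores are an acceptable choice of LVs.

```r
## Hypothetical FUN factory built on prcomp(); make_pca_fun is an illustrative name.
make_pca_fun <- function(r) {
  function(x) {
    # prcomp() expects observations in rows, so transpose the m x n matrix
    pc <- prcomp(t(x), center = TRUE, scale. = FALSE)
    pc$x[, seq_len(r), drop = FALSE]   # n x r matrix of PC scores
  }
}

set.seed(2)
x <- matrix(rnorm(200 * 15), nrow = 200)   # m = 200 variables, n = 15 observations
lv <- make_pca_fun(2)(x)
dim(lv)   # 15 2
```

Any such closure could then be passed as FUN together with a matching r; the drop = FALSE keeps the n × r shape even when r = 1.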
Details
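This function computes m p-values of linear association between the m variables (rows of dat) and their latent variables, as captured by a user-defined dimension reduction method. Because the LVs are estimated directly from the observed data, a naive association test over-fits and yields anti-conservative p-values. Following Chung and Storey (2015), the jackstraw resampling strategy accounts for this over-fitting: in each of B iterations, s variables are replaced by synthetic null variables, the LVs are re-estimated with FUN, and the association statistics of the synthetic nulls form the null distribution. This protects against the anti-conservative bias.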
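The resampling idea can be illustrated with a small base-R sketch. It is not the package source: the helper names f_stat_rows and jackstraw_sketch are hypothetical, it assumes all r LVs are of interest (r1 = 1:r), builds non-parametric nulls by permuting rows, and assumes any covariate matrix already contains an intercept column. Each row's F-statistic compares a null model (covariates or intercept only) against a full model that adds the LVs.

```r
## Minimal base-R sketch of the jackstraw test (illustrative, not the package source).

# F-statistic per row: null model = covariates (or intercept only),
# full model = covariates + LVs.
f_stat_rows <- function(dat, LV, covariate = NULL) {
  n <- ncol(dat)
  X0 <- if (is.null(covariate)) matrix(1, nrow = n, ncol = 1) else covariate
  X1 <- cbind(X0, LV)
  rss <- function(X) {
    # least-squares fit of every row of dat on the columns of X
    fitted <- X %*% qr.solve(X, t(dat))   # n x m fitted values
    rowSums((dat - t(fitted))^2)
  }
  rss0 <- rss(X0)
  rss1 <- rss(X1)
  df1 <- ncol(X1) - ncol(X0)
  df2 <- n - ncol(X1)
  ((rss0 - rss1) / df1) / (rss1 / df2)
}

jackstraw_sketch <- function(dat, FUN, s, B, covariate = NULL) {
  m <- nrow(dat)
  obs <- f_stat_rows(dat, FUN(dat), covariate)
  null <- numeric(0)
  for (b in seq_len(B)) {
    idx <- sample.int(m, s)   # choose s variables to become synthetic nulls
    jdat <- dat
    # permute each chosen row independently, breaking its link to the LVs
    jdat[idx, ] <- t(apply(dat[idx, , drop = FALSE], 1, sample))
    # re-estimate LVs on the perturbed data; keep only the s null statistics
    null <- c(null, f_stat_rows(jdat, FUN(jdat), covariate)[idx])
  }
  # empirical p-value: share of the s*B null statistics >= each observed one
  p <- vapply(obs, function(o) mean(null >= o), numeric(1))
  list(p.value = p, obs.stat = obs, null.stat = null)
}

## Usage on data simulated like the Examples section (100 signal rows, 900 null rows)
set.seed(1)
b_load <- c(rep(1, 50), rep(-1, 50), rep(0, 900))
dat <- b_load %*% t(rnorm(20)) + matrix(rnorm(1000 * 20), nrow = 1000)
dat <- t(scale(t(dat), center = TRUE, scale = TRUE))   # standardize each row
out <- jackstraw_sketch(dat, function(x) svd(x)$v[, 1, drop = FALSE], s = 100, B = 10)
```

Re-estimating the LVs inside the loop is the essential design choice: the synthetic nulls are exposed to the same over-fitting as the real variables, so their statistics form a fair reference distribution.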
Optionally, a parametric distribution can be specified for the noise term through noise; this is an experimental feature. In that case, a small number s of observed variables are replaced by synthetic null variables generated from the specified distribution rather than by permutation.
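The replacement step of the parametric variant can be sketched as follows. The helper replace_with_synthetic_nulls is hypothetical, and it assumes noise is a sampler that takes the number of draws (as rnorm does); the package's actual calling convention for noise may differ.

```r
## Hypothetical helper; assumes `noise` is a sampler taking the number of draws.
replace_with_synthetic_nulls <- function(dat, s, noise = rnorm) {
  m <- nrow(dat)
  n <- ncol(dat)
  idx <- sample.int(m, s)                                  # variables to replace
  dat[idx, ] <- matrix(noise(s * n), nrow = s, ncol = n)   # s x n synthetic nulls
  list(dat = dat, idx = idx)
}

set.seed(3)
dat <- matrix(rnorm(50 * 8), nrow = 50)
res <- replace_with_synthetic_nulls(dat, s = 5)
# a heavier-tailed alternative: noise = function(k) rt(k, df = 3)
```

All other rows are left untouched, so only the s synthetic nulls differ from the observed data in each iteration.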
Value
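jackstraw_subspace returns a list consisting of
p.value | m p-values of association tests between the variables and their LVs |
obs.stat | m observed F-test statistics |
null.stat | s*B null F-test statistics |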
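The three components are related: each p-value compares one observed statistic against the pooled null statistics. A minimal sketch of a pooled empirical p-value follows; emp_pvalues is a hypothetical name, and the package's own computation may differ in details such as tie handling.

```r
## Pooled empirical p-values from observed and null statistics (illustrative sketch).
emp_pvalues <- function(obs.stat, null.stat) {
  # fraction of pooled null statistics at least as large as each observed one
  vapply(obs.stat, function(o) mean(null.stat >= o), numeric(1))
}

p <- emp_pvalues(obs.stat = c(0.5, 3, 10), null.stat = c(0.1, 0.7, 1.2, 2.5, 4))
p   # 0.8 0.2 0.0
```

With s*B null statistics the smallest attainable non-zero p-value is 1/(s*B), so larger s or B gives finer resolution.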
Author(s)
Neo Christopher Chung nchchung@gmail.com
References
Chung and Storey (2015) Statistical significance of variables driving systematic variation in high-dimensional data. Bioinformatics, 31(4): 545–554 https://academic.oup.com/bioinformatics/article/31/4/545/2748186
Chung (2020) Statistical significance of cluster membership for unsupervised evaluation of cell identities. Bioinformatics, 36(10): 3107–3114 https://academic.oup.com/bioinformatics/article/36/10/3107/5788523
Examples
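library(jackstraw)
set.seed(1)
## simulate data from a latent variable model: Y = BL + E
B = c(rep(1,50), rep(-1,50), rep(0,900))          # loadings: 100 signal rows, 900 null rows
L = rnorm(20)                                      # one latent variable over n = 20 observations
E = matrix(rnorm(1000*20), nrow=1000)              # noise for m = 1000 variables
dat = B %*% t(L) + E
dat = t(scale(t(dat), center=TRUE, scale=TRUE))    # standardize each variable (row)
## apply the jackstraw with the svd as a function
out = jackstraw_subspace(dat, FUN = function(x) svd(x)$v[,1,drop=FALSE], r=1, s=100, B=50)
## p-values for the m = 1000 variables
head(out$p.value)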