nlps {GWASinlps} | R Documentation |
Non-local prior based single-step variable selection for high-dimensional data
Description
nlpsLM
, nlpsGLM
, nlpsAFTM
perform variable selection in a single iteration respectively for continuous, binary and survival outcomes, combining the computational efficiency of the 'structured screen-and-select' variable selection strategy based on some association learning and the parsimonious uncertainty quantification provided by the use of non-local priors (see the References).
Usage
nlpsLM(y, x, cor_xy, prior = c("mom", "imom", "emom", "zellner",
"horseshoe"), tau, priorDelta = modelbbprior(1,1),
k0, rxx, niter = 2000, verbose = F,
tau.hs.method = "halfCauchy", sigma.hs.method = "Jeffreys" )
nlpsGLM(y, x, mmle_xy, prior = c("mom", "imom", "zellner"),
tau, priorDelta = modelbbprior(1,1),
k0, rxx, niter = 2000, verbose = F )
nlpsAFTM(y, event, x, mu_xy, prior = c("mom", "imom", "emom",
"zellner"), tau, priorDelta = modelbbprior(1,1),
k0, rxx, niter = 2000, verbose = F )
Arguments
y |
The vector of continuous response (phenotype) for linear models (LM), or binary response (phenotype) for generalized linear models (GLM), or survival times for accelerated failure time models (AFTM). Binary response values must be 0 or 1. |
event |
Only for AFTM. The vector of status indicator for the survival data. |
x |
The design matrix with subjects in rows and independent variables (SNPs) in columns. Missing values are not accepted currently. |
cor_xy |
Only for LM. Vector of (Pearson) correlations of |
mmle_xy |
Only for GLM. Vector of maximum marginal likelihood estimates of the regression parameters for the variables (SNPs) in |
mu_xy |
Only for AFTM. Vector of marginal utility estimates of the variables (SNPs) in |
prior |
|
tau |
the value of the scale parameter tau of the non-local prior. |
priorDelta |
Prior for model space. Defaults to |
k0 |
GWASinlps tuning parameter denoting the number of leading SNPs (see Details). |
rxx |
GWASinlps tuning parameter denoting the correlation threshold to determine leading sets (see References). |
niter |
Number of MCMC iterations for non-local prior based Bayesian variable selection. Defaults to 2000. |
verbose |
If TRUE, prints result from the iterations progressively. FALSE by default. |
tau.hs.method |
Necessary only when |
sigma.hs.method |
Necessary only when |
Details
The nlpsLM
, nlpsGLM
and nlpsAFTM
functions perform SNP selection in one iteration for continuous data, binary data, and survival data, respectively. The GWASinlps
function repeatedly calls these functions. For details of the procedure, see the reference for the GWASinlps
function.
Value
A list with elements
hppm |
The names of variables (SNPs) appearing in the highest posterior probability model (HPPM) of at least one leading set. |
not.selected |
The names of variables (SNPs) appearing in at least one leading set but in none of the HPPMs. |
Author(s)
Nilotpal Sanyal <nilotpal.sanyal@gmail.com>
References
Sanyal et al. (2019), "GWASinlps: Non-local prior based iterative SNP selection tool for genome-wide association studies". Bioinformatics, 35(1), 1-11.
Sanyal, N. (2022). "Iterative variable selection for high-dimensional data with binary outcomes". arXiv preprint arXiv:2211.03190.
See Also
GWASinlps
, modelSelection
, horseshoe
Examples
n = 100
p = 1000
m = 10
# Generate design matrix (genotype matrix)
set.seed(1)
f = runif( p, .1, .2 ) # simulate minor allele frequency
x = matrix( nrow = n, ncol = p )
colnames(x) = 1:p
for(j in 1:p)
x[,j] = rbinom( n, 2, f[j] )
# Generate true effect sizes
causal_snps = sample( 1:p, m )
beta = rep( 0, p )
set.seed(1)
beta[causal_snps] = rnorm(m, mean = 0, sd = 2 )
# Generate continuous (phenotype) data
y.cont = x %*% beta + rnorm(n, 0, 1)
# Generate binary (phenotype) data
prob = exp(x %*% beta)/(1 + exp(x %*% beta))
y.bin = sapply(1:n, function(i)rbinom(1,1,prob[i]) )
# Fix scale parameter tau
tau = 0.022
# GWASinlps analysis
cor_xy = c(cor(x,y.cont))
names(cor_xy) = colnames(x)
nlps_cont = nlpsLM(y.cont, x, cor_xy=cor_xy, prior="mom",
tau=tau, k0=2, rxx=0.3, niter=10000, verbose=TRUE)
nlps_cont
library(fastglm)
mode(x) = "double" #needed for fastglm() function below
mmle_xy = apply( x, 2, function(z) coef( fastglm(y=y.bin,
x=cbind(1,matrix(z)), family = binomial(link = "logit")) )[2] )
nlps_bin = nlpsGLM(y.bin, x, mmle_xy=mmle_xy, prior="mom",
tau=tau, k0=2, rxx=0.3, niter=10000, verbose=TRUE)
nlps_bin