susie_rss {susieR} | R Documentation |
Sum of Single Effects (SuSiE) Regression using Summary Statistics
Description
susie_rss
performs variable selection under a
sparse Bayesian multiple linear regression of Y
on X
using the z-scores from standard univariate regression
of Y
on each column of X
, an estimate, R
, of
the correlation matrix for the columns of X
, and optionally,
but strongly recommended, the sample size n. See
“Details” for other ways to call susie_rss
Usage
susie_rss(
z,
R,
n,
bhat,
shat,
var_y,
z_ld_weight = 0,
estimate_residual_variance = FALSE,
prior_variance = 50,
check_prior = TRUE,
...
)
Arguments
z |
p-vector of z-scores. |
R |
p x p correlation matrix. |
n |
The sample size. |
bhat |
Alternative summary data giving the estimated effects
(a vector of length p). This, together with |
shat |
Alternative summary data giving the standard errors of
the estimated effects (a vector of length p). This, together with
|
var_y |
The sample variance of y, defined as |
z_ld_weight |
This parameter is included for backwards
compatibility with previous versions of the function, but it is no
longer recommended to set this to a non-zero value. When
|
estimate_residual_variance |
The default is FALSE, the
residual variance is fixed to 1 or variance of y. If the in-sample
LD matrix is provided, we recommend setting
|
prior_variance |
The prior variance(s) for the non-zero
noncentrality parameterss |
check_prior |
When |
... |
Other parameters to be passed to
|
Details
In some applications, particularly genetic applications,
it is desired to fit a regression model (Y = Xb + E
say,
which we refer to as "the original regression model" or ORM)
without access to the actual values of Y
and X
, but
given only some summary statistics. susie_rss
assumes
availability of z-scores from standard univariate regression of
Y
on each column of X
, and an estimate, R
, of the
correlation matrix for the columns of X
(in genetic
applications R
is sometimes called the “LD matrix”).
With the inputs z
, R
and sample size n
,
susie_rss
computes PVE-adjusted z-scores z_tilde
, and
calls susie_suff_stat
with XtX = (n-1)R
, Xty =
\sqrt{n-1} z_tilde
, yty = n-1
, n = n
. The
output effect estimates are on the scale of b
in the ORM with
standardized X
and y
. When the LD matrix
R
and the z-scores z
are computed using the same
matrix X
, the results from susie_rss
are same as, or
very similar to, susie
with standardized X
and
y
.
Alternatively, if the user provides n
, bhat
(the
univariate OLS estimates from regressing y
on each column of
X
), shat
(the standard errors from these OLS
regressions), the in-sample correlation matrix R =
cov2cor(crossprod(X))
, and the variance of y
, the results
from susie_rss
are same as susie
with X
and
y
. The effect estimates are on the same scale as the
coefficients b
in the ORM with X
and y
.
In rare cases in which the sample size, n
, is unknown,
susie_rss
calls susie_suff_stat
with XtX = R
and Xty = z
, and with residual_variance = 1
. The
underlying assumption of performing the analysis in this way is
that the sample size is large (i.e., infinity), and/or the
effects are small. More formally, this combines the log-likelihood
for the noncentrality parameters, \tilde{b} = \sqrt{n} b
,
L(\tilde{b}; z, R) = -(\tilde{b}'R\tilde{b} -
2z'\tilde{b})/2,
with the “susie prior” on
\tilde{b}
; see susie
and Wang et al
(2020) for details. In this case, the effect estimates returned by
susie_rss
are on the noncentrality parameter scale.
The estimate_residual_variance
setting is FALSE
by
default, which is recommended when the LD matrix is estimated from
a reference panel. When the LD matrix R
and the summary
statistics z
(or bhat
, shat
) are computed
using the same matrix X
, we recommend setting
estimate_residual_variance = TRUE
.
Value
A "susie"
object with the following
elements:
alpha |
An L by p matrix of posterior inclusion probabilites. |
mu |
An L by p matrix of posterior means, conditional on inclusion. |
mu2 |
An L by p matrix of posterior second moments, conditional on inclusion. |
lbf |
log-Bayes Factor for each single effect. |
lbf_variable |
log-Bayes Factor for each variable and single effect. |
V |
Prior variance of the non-zero elements of b. |
elbo |
The value of the variational lower bound, or “ELBO” (objective function to be maximized), achieved at each iteration of the IBSS fitting procedure. |
sets |
Credible sets estimated from model fit; see
|
pip |
A vector of length p giving the (marginal) posterior inclusion probabilities for all p covariates. |
niter |
Number of IBSS iterations that were performed. |
converged |
|
References
G. Wang, A. Sarkar, P. Carbonetto and M. Stephens (2020). A simple new approach to variable selection in regression, with application to genetic fine-mapping. Journal of the Royal Statistical Society, Series B 82, 1273-1300 doi: 10.1101/501114.
Y. Zou, P. Carbonetto, G. Wang, G and M. Stephens (2022). Fine-mapping from summary data with the “Sum of Single Effects” model. PLoS Genetics 18, e1010299. doi: 10.1371/journal.pgen.1010299.
Examples
set.seed(1)
n = 1000
p = 1000
beta = rep(0,p)
beta[1:4] = 1
X = matrix(rnorm(n*p),nrow = n,ncol = p)
X = scale(X,center = TRUE,scale = TRUE)
y = drop(X %*% beta + rnorm(n))
input_ss = compute_suff_stat(X,y,standardize = TRUE)
ss = univariate_regression(X,y)
R = with(input_ss,cov2cor(XtX))
zhat = with(ss,betahat/sebetahat)
res = susie_rss(zhat,R, n=n)
# Toy example illustrating behaviour susie_rss when the z-scores
# are mostly consistent with a non-invertible correlation matrix.
# Here the CS should contain both variables, and two PIPs should
# be nearly the same.
z = c(6,6.01)
R = matrix(1,2,2)
fit = susie_rss(z,R)
print(fit$sets$cs)
print(fit$pip)
# In this second toy example, the only difference is that one
# z-score is much larger than the other. Here we expect that the
# second PIP will be much larger than the first.
z = c(6,7)
R = matrix(1,2,2)
fit = susie_rss(z,R)
print(fit$sets$cs)
print(fit$pip)