svb.fit {sparsevb}    R Documentation
Fit Approximate Posteriors to Sparse Linear and Logistic Models
Description
Main function of the sparsevb package. Computes mean-field posterior
approximations for both linear and logistic regression models, including
variable selection via sparsity-inducing spike and slab priors.
Usage
svb.fit(
X,
Y,
family = c("linear", "logistic"),
slab = c("laplace", "gaussian"),
mu,
sigma = rep(1, ncol(X)),
gamma,
alpha,
beta,
prior_scale = 1,
update_order,
intercept = FALSE,
noise_sd,
max_iter = 1000,
tol = 1e-05
)
Arguments
X: A numeric design matrix, each row of which represents a vector of
   covariates/independent variables/features. Though not required, it is
   recommended to center and scale the columns to have norm sqrt(n).

Y: An nrow(X)-dimensional response vector, numeric if family = "linear"
   and with binary entries 0 or 1 if family = "logistic".

family: A character string selecting the regression model, either
   "linear" or "logistic".

slab: A character string specifying the prior slab density, either
   "laplace" or "gaussian".

mu: An ncol(X)-dimensional numeric vector, serving as initial guess for
   the variational means. If omitted, mu will be initialized based on
   ridge regression if family = "linear" and on the logistic LASSO if
   family = "logistic".

sigma: A positive ncol(X)-dimensional numeric vector, serving as initial
   guess for the variational standard deviations.

gamma: An ncol(X)-dimensional vector of probabilities, serving as initial
   guess for the variational inclusion probabilities. If omitted, gamma
   will be initialized based on the LASSO.

alpha: A positive numeric value, parametrizing the beta hyper-prior on
   the inclusion probabilities. If omitted, alpha is chosen empirically.

beta: A positive numeric value, parametrizing the beta hyper-prior on
   the inclusion probabilities. If omitted, beta is chosen empirically.

prior_scale: A numeric value, controlling the scale parameter of the
   prior slab density. Used as the scale parameter \lambda if
   slab = "laplace" and as the standard deviation \sigma if
   slab = "gaussian".

update_order: A permutation of 1:ncol(X), giving the update order of the
   coordinate-ascent algorithm. If omitted, a data-driven updating order
   is used.

intercept: A Boolean variable, controlling if an intercept should be
   included. NB: This feature is still experimental in logistic
   regression.

noise_sd: A positive numerical value, serving as estimate for the
   residual noise standard deviation in linear regression. If missing, it
   will be estimated from the data (see estimateSigma in the
   selectiveInference package). Has no effect if family = "logistic".

max_iter: A positive integer, controlling the maximum number of
   iterations for the variational update loop.

tol: A small, positive numerical value, controlling the termination
   criterion for maximum absolute differences between binary entropies of
   successive iterates.
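As a quick illustration of how these arguments fit together, here is a
hedged sketch of an explicit call; the data are simulated and every
argument value below is an arbitrary illustrative choice, not a
recommended default.

### Illustrative call with explicit arguments (all values are arbitrary choices) ###
set.seed(1)
n <- 100; p <- 20
X <- scale(matrix(rnorm(n * p), n, p)) #centered and scaled design, as recommended
Y <- rbinom(n, 1, plogis(X[, 1] - X[, 2])) #binary response for logistic mode
fit <- svb.fit(X, Y,
               family = "logistic",
               slab = "gaussian",
               prior_scale = 1, #slab standard deviation, since slab = "gaussian"
               intercept = TRUE, #experimental in logistic regression, see above
               max_iter = 500,
               tol = 1e-4)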
Details
Suppose \theta is the p-dimensional true parameter. The spike-and-slab
prior for \theta may be represented by the hierarchical scheme

    w \sim \mathrm{Beta}(\alpha, \beta),

    z_j \mid w \sim_{i.i.d.} \mathrm{Bernoulli}(w),

    \theta_j \mid z_j \sim_{ind.} (1 - z_j)\delta_0 + z_j g.

Here, \delta_0 represents the Dirac measure at 0. The slab g may be taken
either as a \mathrm{Laplace}(0, \lambda) or N(0, \sigma^2) density. The
former has centered density

    f_\lambda(x) = \frac{\lambda}{2} e^{-\lambda |x|}.

Given \alpha and \beta, the beta hyper-prior has density

    b(x \mid \alpha, \beta) = \frac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{\int_0^1 t^{\alpha - 1}(1 - t)^{\beta - 1}\ \mathrm{d}t}.

A straightforward integration shows that the prior inclusion probability
of a coefficient is \frac{\alpha}{\alpha + \beta}.
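That last identity is easy to confirm numerically; below is a small Monte
Carlo sketch of the hierarchical scheme above (the values of alpha and
beta are arbitrary).

### Monte Carlo check: P(z_j = 1) should be close to alpha/(alpha + beta) ###
set.seed(42)
a <- 2; b <- 8
w <- rbeta(1e5, a, b) #w ~ Beta(alpha, beta)
z <- rbinom(1e5, 1, w) #z_j | w ~ Bernoulli(w)
mean(z) #approximately a/(a + b) = 0.2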
Value
The approximate mean-field posterior, given as a named list containing
numeric vectors "mu", "sigma", "gamma", and a value "intercept". The
latter is set to NA if intercept = FALSE. In mathematical terms, the
conditional distribution of each \theta_j is given by

    \theta_j \mid \mu_j, \sigma_j, \gamma_j \sim_{ind.} \gamma_j N(\mu_j, \sigma_j^2) + (1 - \gamma_j)\delta_0.
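One practical consequence of this mixture form: draws from the
approximate posterior of any single coordinate can be generated directly
from the returned list. A minimal sketch, assuming fit holds the output
of svb.fit; the helper name posterior_draws is hypothetical, not part of
the package.

### Sample from the mean-field posterior of coordinate j (hypothetical helper) ###
posterior_draws <- function(fit, j, n_samples = 1000) {
  z <- rbinom(n_samples, 1, fit$gamma[j]) #inclusion indicator with probability gamma_j
  z * rnorm(n_samples, fit$mu[j], fit$sigma[j]) #Gaussian slab draw or exact zero
}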
Examples
### Simulate a linear regression problem of size n times p, with sparsity level s ###
n <- 250
p <- 500
s <- 5
### Generate toy data ###
X <- matrix(rnorm(n*p), n, p) #standard Gaussian design matrix
theta <- numeric(p)
theta[sample.int(p, s)] <- runif(s, -3, 3) #sample non-zero coefficients in random locations
pos_TR <- as.numeric(theta != 0) #true positives
Y <- X %*% theta + rnorm(n) #add standard Gaussian noise
### Run the algorithm in linear mode with Laplace prior and prioritized initialization ###
test <- svb.fit(X, Y, family = "linear")
posterior_mean <- test$mu * test$gamma #approximate posterior mean
pos <- as.numeric(test$gamma > 0.5) #significant coefficients
### Assess the quality of the posterior estimates ###
TPR <- sum(pos[which(pos_TR == 1)])/sum(pos_TR) #True positive rate
FDR <- sum(pos[which(pos_TR != 1)])/max(sum(pos), 1) #False discovery rate
L2 <- sqrt(sum((posterior_mean - theta)^2)) #L_2-error
MSPE <- sqrt(sum((X %*% posterior_mean - Y)^2)/n) #Root mean squared prediction error
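For completeness, an analogous run in logistic mode, sketched by reusing
the design and coefficients simulated above; the binary response Y_bin is
generated here purely for illustration.

### Run the algorithm in logistic mode on a binary response from the same design ###
Y_bin <- rbinom(n, 1, plogis(as.vector(X %*% theta)))
test_logit <- svb.fit(X, Y_bin, family = "logistic")
posterior_mean_logit <- test_logit$mu * test_logit$gamma #approximate posterior mean
pos_logit <- as.numeric(test_logit$gamma > 0.5) #selected coefficients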