svb.fit {sparsevb}    R Documentation
Fit Approximate Posteriors to Sparse Linear and Logistic Models
Description
Main function of the sparsevb package. Computes mean-field posterior
approximations for both linear and logistic regression models, including
variable selection via sparsity-inducing spike and slab priors.
Usage
svb.fit(
X,
Y,
family = c("linear", "logistic"),
slab = c("laplace", "gaussian"),
mu,
sigma = rep(1, ncol(X)),
gamma,
alpha,
beta,
prior_scale = 1,
update_order,
intercept = FALSE,
noise_sd,
max_iter = 1000,
tol = 1e-05
)
Arguments
X: A numeric design matrix, each row of which represents a vector of
   covariates/independent variables/features. Though not required, it is
   recommended to center and scale the columns to have norm sqrt(n).

Y: An nrow(X)-dimensional response vector, numeric if family = "linear"
   and with binary entries 0 or 1 if family = "logistic".

family: A character string selecting the regression model, either
   "linear" or "logistic".

slab: A character string specifying the prior slab density, either
   "laplace" or "gaussian".

mu: An ncol(X)-dimensional numeric vector, serving as initial guess for
   the variational means. If omitted, mu will be initialized based on
   ridge regression if family = "linear" and on the logistic LASSO if
   family = "logistic".

sigma: A positive ncol(X)-dimensional numeric vector, serving as initial
   guess for the variational standard deviations.

gamma: An ncol(X)-dimensional vector of probabilities, serving as initial
   guess for the variational inclusion probabilities. If omitted, gamma
   will be initialized based on the LASSO.

alpha: A positive numeric value, parametrizing the beta hyper-prior on
   the inclusion probabilities. If omitted, alpha is chosen empirically.

beta: A positive numeric value, parametrizing the beta hyper-prior on
   the inclusion probabilities. If omitted, beta is chosen empirically.

prior_scale: A numeric value, controlling the scale parameter of the
   prior slab density. Used as the scale parameter \lambda if
   slab = "laplace" and as the standard deviation \sigma if
   slab = "gaussian".

update_order: A permutation of 1:ncol(X), giving the update order of the
   coordinate-ascent algorithm. If omitted, a data-driven updating order
   is used.

intercept: A Boolean variable, controlling if an intercept should be
   included. NB: This feature is still experimental in logistic
   regression.

noise_sd: A positive numerical value, serving as estimate for the
   residual noise standard deviation in linear regression. If missing, it
   will be estimated from the data (see estimateSigma in the
   selectiveInference package). Has no effect if family = "logistic".

max_iter: A positive integer, controlling the maximum number of
   iterations for the variational update loop.

tol: A small, positive numerical value, controlling the termination
   criterion for maximum absolute differences between binary entropies of
   successive iterates.
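As a quick illustration of how these arguments fit together, here is a
hedged sketch of an explicit call; the data are simulated and every
argument value below is an arbitrary illustrative choice, not a
recommended default.

### Illustrative call with explicit arguments (all values are arbitrary choices) ###
set.seed(1)
n <- 100; p <- 20
X <- scale(matrix(rnorm(n * p), n, p)) #centered and scaled design, as recommended
Y <- rbinom(n, 1, plogis(X[, 1] - X[, 2])) #binary response for logistic mode
fit <- svb.fit(X, Y,
               family = "logistic",
               slab = "gaussian",
               prior_scale = 1, #slab standard deviation, since slab = "gaussian"
               intercept = TRUE, #experimental in logistic regression, see above
               max_iter = 500,
               tol = 1e-4)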
Details
Suppose \theta is the p-dimensional true parameter. The spike-and-slab
prior for \theta may be represented by the hierarchical scheme

    w \sim \mathrm{Beta}(\alpha, \beta),

    z_j \mid w \sim_{i.i.d.} \mathrm{Bernoulli}(w),

    \theta_j \mid z_j \sim_{ind.} (1 - z_j)\delta_0 + z_j g.

Here, \delta_0 represents the Dirac measure at 0. The slab g may be taken
either as a \mathrm{Laplace}(0, \lambda) or N(0, \sigma^2) density. The
former has centered density

    f_\lambda(x) = \frac{\lambda}{2} e^{-\lambda |x|}.

Given \alpha and \beta, the beta hyper-prior has density

    b(x \mid \alpha, \beta) = \frac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{\int_0^1 t^{\alpha - 1}(1 - t)^{\beta - 1}\ \mathrm{d}t}.

A straightforward integration shows that the prior inclusion probability
of a coefficient is \frac{\alpha}{\alpha + \beta}.
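That last identity is easy to confirm numerically; below is a small Monte
Carlo sketch of the hierarchical scheme above (the values of alpha and
beta are arbitrary).

### Monte Carlo check: P(z_j = 1) should be close to alpha/(alpha + beta) ###
set.seed(42)
a <- 2; b <- 8
w <- rbeta(1e5, a, b) #w ~ Beta(alpha, beta)
z <- rbinom(1e5, 1, w) #z_j | w ~ Bernoulli(w)
mean(z) #approximately a/(a + b) = 0.2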
Value
The approximate mean-field posterior, given as a named list containing
numeric vectors "mu", "sigma", "gamma", and a value "intercept". The
latter is set to NA if intercept = FALSE. In mathematical terms, the
conditional distribution of each \theta_j is given by

    \theta_j \mid \mu_j, \sigma_j, \gamma_j \sim_{ind.} \gamma_j N(\mu_j, \sigma_j^2) + (1 - \gamma_j)\delta_0.
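One practical consequence of this mixture form: draws from the
approximate posterior of any single coordinate can be generated directly
from the returned list. A minimal sketch, assuming fit holds the output
of svb.fit; the helper name posterior_draws is hypothetical, not part of
the package.

### Sample from the mean-field posterior of coordinate j (hypothetical helper) ###
posterior_draws <- function(fit, j, n_samples = 1000) {
  z <- rbinom(n_samples, 1, fit$gamma[j]) #inclusion indicator with probability gamma_j
  z * rnorm(n_samples, fit$mu[j], fit$sigma[j]) #Gaussian slab draw or exact zero
}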
Examples
### Simulate a linear regression problem of size n times p, with sparsity level s ###
n <- 250
p <- 500
s <- 5
### Generate toy data ###
X <- matrix(rnorm(n*p), n, p) #standard Gaussian design matrix
theta <- numeric(p)
theta[sample.int(p, s)] <- runif(s, -3, 3) #sample non-zero coefficients in random locations
pos_TR <- as.numeric(theta != 0) #true positives
Y <- X %*% theta + rnorm(n) #add standard Gaussian noise
### Run the algorithm in linear mode with Laplace prior and prioritized initialization ###
test <- svb.fit(X, Y, family = "linear")
posterior_mean <- test$mu * test$gamma #approximate posterior mean
pos <- as.numeric(test$gamma > 0.5) #significant coefficients
### Assess the quality of the posterior estimates ###
TPR <- sum(pos[which(pos_TR == 1)])/sum(pos_TR) #True positive rate
FDR <- sum(pos[which(pos_TR != 1)])/max(sum(pos), 1) #False discovery rate
L2 <- sqrt(sum((posterior_mean - theta)^2)) #L_2-error
MSPE <- sqrt(sum((X %*% posterior_mean - Y)^2)/n) #Root mean squared prediction error
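For completeness, an analogous run in logistic mode, sketched by reusing
the design and coefficients simulated above; the binary response Y_bin is
generated here purely for illustration.

### Run the algorithm in logistic mode on a binary response from the same design ###
Y_bin <- rbinom(n, 1, plogis(as.vector(X %*% theta)))
test_logit <- svb.fit(X, Y_bin, family = "logistic")
posterior_mean_logit <- test_logit$mu * test_logit$gamma #approximate posterior mean
pos_logit <- as.numeric(test_logit$gamma > 0.5) #selected coefficients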