reglogit {reglogit}    R Documentation
Gibbs sampling for regularized logistic regression
Description
Regularized (multinomial) logistic regression by Gibbs sampling implementing subtly different MCMC schemes with varying efficiency depending on the data type (binary v. binomial, say) and the desired estimator (regularized maximum likelihood, or Bayesian maximum a posteriori/posterior mean, etc.) through a unified interface.
Usage
reglogit(T, y, X, N = NULL, flatten = FALSE, sigma = 1, nu = 1,
    kappa = 1, icept = TRUE, normalize = TRUE, zzero = TRUE,
    powerprior = TRUE, kmax = 442, bstart = NULL, lt = NULL,
    nup = list(a = 2, b = 0.1), save.latents = FALSE, verb = 100)

regmlogit(T, y, X, flatten = FALSE, sigma = 1, nu = 1, kappa = 1,
    icept = TRUE, normalize = TRUE, zzero = TRUE, powerprior = TRUE,
    kmax = 442, bstart = NULL, lt = NULL, nup = list(a = 2, b = 0.1),
    save.latents = FALSE, verb = 100)
Arguments
T
a positive integer scalar specifying the number of MCMC rounds
y
a vector of responses: for reglogit, binary classification labels in {0,1} for Bernoulli data, or counts in 0:N for Binomial data (in which case N must also be supplied); for regmlogit, integer class labels in 1:C for C classes
X
a design matrix of predictors, with one row per entry of y; may be an ordinary (dense) matrix or a sparse Matrix object
N
an optional integer vector of total numbers of replicate trials for each X-y pair, i.e., for Binomial rather than Bernoulli data
flatten
a scalar logical, relevant only for Binomial data (i.e., when N is given), indicating whether pre-processing should flatten the Binomial likelihood into an equivalent Bernoulli one; the un-flattened (Binomial) version is faster
sigma
weights on the regression coefficients in the lasso penalty. The default of 1 is sensible when normalize = TRUE, since then the estimator for beta is equivariant under rescaling of the inputs
nu
a non-negative scalar indicating the initial value of the lasso penalty parameter; when nup = NULL this value remains fixed throughout the MCMC
kappa
a positive scalar specifying the multiplicity; kappa = 1 yields samples from the Bayesian posterior distribution, while larger values concentrate the posterior for a simulated-annealing approach to point estimation (see Details)
icept
a scalar logical indicating whether an (implicit) intercept should be included in the model
normalize
a scalar logical which, if TRUE, causes each column of the design matrix to be scaled to have unit L2-norm; otherwise the columns are left alone
zzero
a scalar logical indicating whether the latent variables z should be sampled (zzero = FALSE) or fixed at zero (zzero = TRUE), selecting between the two latent-variable representations described in Details
powerprior
a scalar logical indicating whether the prior should be raised to the power kappa along with the likelihood in the annealed posterior
kmax
a positive integer replacing infinity in the (truncated) sum defining the mixing density in the generative expression for the lambda latents; see Gramacy & Polson (2012) for details
bstart
an optional vector supplying initial values for the regression coefficients beta, of length ncol(X), plus one for the intercept when icept = TRUE
lt
an optional vector of length nrow(X) supplying initial values for the lambda latent variables
nup
prior parameters =list(a, b) for the inverse Gamma prior on the penalty parameter nu; specifying nup = NULL causes nu to be fixed at its initial value
save.latents
a scalar logical indicating whether samples of the latent variables (z, lambda, and omega) should be saved for each MCMC round
verb
a positive integer indicating the number of MCMC rounds after which a progress statement is printed; verb = 0 suppresses printing
Details
These are the main functions in the package. They support an omnibus framework for simulation-based regularized logistic regression. The default arguments invoke a Gibbs sampling algorithm to sample from the posterior distribution of a logistic regression model with lasso-type (double-exponential) priors. See the paper by Gramacy & Polson (2012) for details. Both cdf and pdf implementations are provided, which use slightly different latent variable representations, resulting in slightly different Gibbs samplers. These methods extend the un-regularized methods of Holmes & Held (2006).
The kappa parameter facilitates simulated annealing (SA) implementations to help find the MAP and other point estimators. The actual SA algorithm is not provided in the package; however, it is easy to string together calls to this function, using the outputs from one call as inputs to another, in order to establish an SA schedule of increasing kappa values, as sketched below.
The regmlogit function is a wrapper around the Gibbs sampler inside reglogit, invoking C-1 linked chains for C classes, extending the polychotomous regression scheme outlined by Holmes & Held (2006). For an example with regmlogit, see predict.regmlogit.
Value
The output is a list object of type "reglogit" or "regmlogit" containing a subset of the following fields; for "regmlogit" everything is expanded by one dimension into an array or matrix as appropriate.
X
the input design matrix, possibly adjusted by normalization and/or the addition of an intercept column
y
the input response variable
beta
a matrix with T rows containing samples of the regression coefficients, one column per coefficient (including the intercept when icept = TRUE)
z
if save.latents = TRUE, a matrix with T rows containing samples of the latent variables z
lambda
a matrix with T rows containing samples of the lambda latent variables (when save.latents = TRUE)
lpost
a vector of log posterior probabilities of the parameters, one per MCMC round
map
the list of maximum a posteriori parameters found across the MCMC rounds; map$beta is reported on the original scale of the data
kappa
the input multiplicity parameter
omega
a matrix with T rows containing samples of the omega latent variables (when save.latents = TRUE)
Author(s)
Robert B. Gramacy rbg@vt.edu
References
R.B. Gramacy, N.G. Polson. “Simulation-based regularized logistic regression”. (2012) Bayesian Analysis, 7(3), p567-590; arXiv:1005.3430; https://arxiv.org/abs/1005.3430
C. Holmes, L. Held (2006). “Bayesian Auxiliary Variable Models for Binary and Multinomial Regression”. Bayesian Analysis, 1(1), p145-168.
See Also
predict.reglogit, predict.regmlogit, blasso and regress
Examples
## load in the Pima Indian data
data(pima)
X <- as.matrix(pima[,-9])
y <- as.numeric(pima[,9])
## pre-normalize to match the comparison in the paper
one <- rep(1, nrow(X))
normx <- sqrt(drop(one %*% (X^2)))
X <- scale(X, FALSE, normx)
## compare to the GLM fit
fit.logit <- glm(y~X, family=binomial(link="logit"))
bstart <- fit.logit$coef
## do the Gibbs sampling
T <- 300 ## set low for CRAN checks; increase to >= 1000 for better results
out6 <- reglogit(T, y, X, nu=6, nup=NULL, bstart=bstart, normalize=FALSE)
## plot the posterior distribution of the coefficients
burnin <- (1:(T/10))
boxplot(out6$beta[-burnin,], main="nu=6, kappa=1", ylab="posterior",
xlab="coefficients", bty="n", names=c("mu", paste("b", 1:8, sep="")))
abline(h=0, lty=2)
## add in GLM fit and MAP with legend
points(bstart, col=2, pch=17)
points(out6$map$beta, pch=19, col=3)
legend("topright", c("MLE", "MAP"), col=2:3, pch=c(17,19))
## simple prediction
p6 <- predict(out6, XX=X)
## hit rate
mean(p6$c == y)
##
## for a polychotomous example, with prediction,
## see ? predict.regmlogit
##
## Not run:
## now with kappa=10
out10 <- reglogit(T, y, X, kappa=10, nu=6, nup=NULL, bstart=bstart,
normalize=FALSE)
## plot the posterior distribution of the coefficients
par(mfrow=c(1,2))
boxplot(out6$beta[-burnin,], main="nu=6, kappa=1", ylab="posterior",
xlab="coefficients", bty="n", names=c("mu", paste("b", 1:8, sep="")))
abline(h=0, lty=2)
points(bstart, col=2, pch=17)
points(out6$map$beta, pch=19, col=3)
legend("topright", c("MLE", "MAP"), col=2:3, pch=c(17,19))
boxplot(out10$beta[-burnin,], main="nu=6, kappa=10", ylab="posterior",
xlab="coefficients", bty="n", names=c("mu", paste("b", 1:8, sep="")))
abline(h=0, lty=2)
## add in GLM fit and MAP with legend
points(bstart, col=2, pch=17)
points(out10$map$beta, pch=19, col=3)
legend("topright", c("MLE", "MAP"), col=2:3, pch=c(17,19))
## End(Not run)
##
## now some binomial data
##
## Not run:
## synthetic data generation
library(boot)
N <- rep(20, 100)
beta <- c(2, -3, 2, -4, 0, 0, 0, 0, 0)
X <- matrix(runif(length(N)*length(beta)), ncol=length(beta))
eta <- drop(1 + X %*% beta)
p <- inv.logit(eta)
y <- rbinom(length(N), N, p)
## run the Gibbs sampler for the logit -- the default uses the fast
## Binomial version; for a comparison, try flatten=TRUE
out <- reglogit(T, y, X, N)
## plot the posterior distribution of the coefficients
boxplot(out$beta[-burnin,], main="binomial data", ylab="posterior",
xlab="coefficients", bty="n",
names=c("mu", paste("b", 1:ncol(X), sep="")))
abline(h=0, lty=2)
## add in GLM fit, the MAP fit, the truth, and a legend
fit.logit <- glm(y/N~X, family=binomial(link="logit"), weights=N)
points(fit.logit$coef, col=2, pch=17)
points(c(1, beta), col=4, pch=16)
points(out$map$beta, pch=19, col=3)
legend("topright", c("MLE", "MAP", "truth"), col=2:4, pch=c(17,19,16))
## also try specifying a larger kappa value to pin down the MAP
## End(Not run)