stan_biglm {rstanarm} | R Documentation |
Bayesian regularized linear but big models via Stan
Description
This is the same model as with stan_lm but it utilizes the output from
biglm in the biglm package in order to proceed when the data is too large
to fit in memory.
Usage
stan_biglm(
biglm,
xbar,
ybar,
s_y,
...,
prior = R2(stop("'location' must be specified")),
prior_intercept = NULL,
prior_PD = FALSE,
algorithm = c("sampling", "meanfield", "fullrank"),
adapt_delta = NULL
)
stan_biglm.fit(
b,
R,
SSR,
N,
xbar,
ybar,
s_y,
has_intercept = TRUE,
...,
prior = R2(stop("'location' must be specified")),
prior_intercept = NULL,
prior_PD = FALSE,
algorithm = c("sampling", "meanfield", "fullrank", "optimizing"),
adapt_delta = NULL,
importance_resampling = TRUE,
keep_every = 1
)
Arguments
biglm |
The list output by biglm in the biglm package. |
xbar |
A numeric vector of column means in the implicit design matrix excluding the intercept for the observations included in the model. |
ybar |
A numeric scalar indicating the mean of the outcome for the observations included in the model. |
s_y |
A numeric scalar indicating the unbiased sample standard deviation of the outcome for the observations included in the model. |
... |
Further arguments passed to the function in the rstan
package ( Another useful argument that can be passed to rstan via |
prior |
Must be a call to |
prior_intercept |
Either Note: If using a dense representation of the design matrix
—i.e., if the |
prior_PD |
A logical scalar (defaulting to |
algorithm |
A string (possibly abbreviated) indicating the
estimation approach to use. Can be |
adapt_delta |
Only relevant if |
b |
A numeric vector of OLS coefficients, excluding the intercept |
R |
A square upper-triangular matrix from the QR decomposition of the design matrix, excluding the intercept |
SSR |
A numeric scalar indicating the sum-of-squared residuals for OLS |
N |
An integer scalar indicating the number of included observations |
has_intercept |
A logical scalar indicating whether to add an intercept to the model when estimating it. |
importance_resampling |
Logical scalar indicating whether to use
importance resampling when approximating the posterior distribution with
a multivariate normal around the posterior mode, which only applies
when |
keep_every |
Positive integer, which defaults to 1, but can be higher
in order to thin the importance sampling realizations and also only
applies when |
Details
The stan_biglm function is intended to be used in the same circumstances
as the biglm function in the biglm package but with an informative prior
on the R^2 of the regression. Like biglm, the memory required to estimate
the model depends largely on the number of predictors rather than the
number of observations. However, stan_biglm and stan_biglm.fit have
additional required arguments that are not necessary in biglm, namely
xbar, ybar, and s_y.
If any observations have any missing values on any of the predictors or the
outcome, such observations do not contribute to these statistics.
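As a rough illustration of how xbar, ybar, and s_y could be accumulated without holding the full data in memory, the sketch below makes one pass over chunks and keeps only running sums. The helper name chunk_stats is hypothetical (it is not part of rstanarm or biglm), and split(mtcars, ...) merely stands in for chunks that would normally be read from disk.

```r
# Hypothetical helper: accumulate N, xbar, ybar and s_y over data chunks.
# `chunks` is a list of data frames standing in for pieces read from disk.
chunk_stats <- function(chunks, y_name, x_names) {
  n <- 0
  sum_x <- setNames(numeric(length(x_names)), x_names)
  sum_y <- 0
  sum_y2 <- 0
  for (chunk in chunks) {
    # keep only rows that are complete on the outcome and all predictors
    ok <- complete.cases(chunk[, c(y_name, x_names), drop = FALSE])
    chunk <- chunk[ok, , drop = FALSE]
    n <- n + nrow(chunk)
    sum_x <- sum_x + colSums(chunk[, x_names, drop = FALSE])
    sum_y <- sum_y + sum(chunk[[y_name]])
    sum_y2 <- sum_y2 + sum(chunk[[y_name]]^2)
  }
  ybar <- sum_y / n
  # unbiased sample standard deviation from running sums (n - 1 denominator)
  s_y <- sqrt((sum_y2 - n * ybar^2) / (n - 1))
  list(N = n, xbar = sum_x / n, ybar = ybar, s_y = s_y)
}

# Four artificial chunks of mtcars, used only for illustration
chunks <- split(mtcars, rep(1:4, length.out = nrow(mtcars)))
stats <- chunk_stats(chunks, "mpg", c("wt", "qsec", "am"))
```

The sum-of-squares formula is used here for brevity; for outcomes with a large mean relative to their spread, a numerically safer running-variance update may be preferable.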
Value
The output of both stan_biglm and stan_biglm.fit is an object of
stanfit-class rather than stanreg-objects, which is more limited and less
convenient but necessitated by the fact that stan_biglm does not bring the
full design matrix into memory. Without the full design matrix, some of
the elements of a stanreg-objects object, such as residuals, cannot be
calculated. Thus, the functions in the rstanarm package that input
stanreg-objects, such as posterior_predict, cannot be used.
Examples
if (.Platform$OS.type != "windows" || .Platform$r_arch != "i386") {
  # create inputs
  ols <- lm(mpg ~ wt + qsec + am, data = mtcars, # all rows are complete so ...
            na.action = na.exclude)              # not necessary in this case
  b <- coef(ols)[-1]
  R <- qr.R(ols$qr)[-1,-1]
  SSR <- crossprod(ols$residuals)[1]
  not_NA <- !is.na(fitted(ols))
  N <- sum(not_NA)
  xbar <- colMeans(mtcars[not_NA,c("wt", "qsec", "am")])
  y <- mtcars$mpg[not_NA]
  ybar <- mean(y)
  s_y <- sd(y)
  post <- stan_biglm.fit(b, R, SSR, N, xbar, ybar, s_y, prior = R2(.75),
                         # the next line is only to make the example go fast
                         chains = 1, iter = 500, seed = 12345)
  cbind(lm = b, stan_lm = rstan::get_posterior_mean(post)[13:15,]) # shrunk
}
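The example above calls stan_biglm.fit directly. A minimal sketch of the corresponding stan_biglm workflow, in which a biglm object is built up chunk by chunk, might look like the following. It assumes the biglm and rstanarm packages are installed (the model-fitting part is skipped otherwise), and the two-way split of mtcars is only a stand-in for data that would not fit in memory.

```r
# Sketch: fit via a biglm object accumulated over chunks.
vars <- c("wt", "qsec", "am")
chunks <- split(mtcars, rep(1:2, length.out = nrow(mtcars)))

# Summary statistics over all included observations
# (mtcars has no missing values, so every row is included)
xbar <- colMeans(mtcars[, vars])
ybar <- mean(mtcars$mpg)
s_y  <- sd(mtcars$mpg)

if (requireNamespace("biglm", quietly = TRUE) &&
    requireNamespace("rstanarm", quietly = TRUE)) {
  fit <- biglm::biglm(mpg ~ wt + qsec + am, data = chunks[[1]])
  fit <- update(fit, chunks[[2]])  # add the remaining observations
  post <- rstanarm::stan_biglm(fit, xbar, ybar, s_y,
                               prior = rstanarm::R2(0.75),
                               # small settings only to keep the sketch fast
                               chains = 1, iter = 500, seed = 12345)
  print(post)  # a stanfit object, as described under Value
}
```

With real data, xbar, ybar, and s_y would themselves need to be accumulated across chunks rather than computed from a single in-memory data frame.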