pred-projection {projpred} | R Documentation |
Predictions from a submodel (after projection)
Description
After the projection of the reference model onto a submodel, the linear
predictors (for the original or a new dataset) based on that submodel can be
calculated by proj_linpred(). These linear predictors can also be
transformed to response scale and averaged across the projected parameter
draws. Furthermore, proj_linpred() returns the corresponding log predictive
density values if the (original or new) dataset contains response values. The
proj_predict() function draws from the predictive distributions (there is
one such distribution for each observation from the original or new dataset)
of the submodel that the reference model has been projected onto. If the
projection has not been performed yet, both functions call project()
internally to perform the projection. Both functions can also handle multiple
submodels at once (for objects of class vsel or objects returned by a
project() call to an object of class vsel; see project()).
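The two optional post-processing steps mentioned above (transformation to response scale and averaging across draws) can be pictured with a small base R sketch. It does not call projpred and is not the package's internal code: the variable names, the simulated numbers, the logit link, the equal draw weights, and the order "transform first, then average" are all assumptions made for illustration.

```r
# Conceptual sketch only (base R, no projpred calls). All names are
# hypothetical and all numbers are simulated.
set.seed(1)
S <- 4  # number of (hypothetical) projected parameter draws
N <- 3  # number of observations

# S x N matrix of linear predictors, one row per projected draw:
eta <- matrix(rnorm(S * N), nrow = S, ncol = N)

# Transformation to response scale: apply the inverse link function.
# Assumption: a binomial family with logit link, so the inverse link is
# the logistic function.
mu <- plogis(eta)

# Averaging across the draws, assuming equal draw weights. The result has a
# single row instead of S rows:
mu_avg <- matrix(colMeans(mu), nrow = 1)
```

Here, `mu` plays the role of draw-wise predictions on response scale and `mu_avg` the role of their average, which is why a single row remains after averaging.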
Usage
proj_linpred(
  object,
  newdata = NULL,
  offsetnew = NULL,
  weightsnew = NULL,
  filter_nterms = NULL,
  transform = FALSE,
  integrated = FALSE,
  allow_nonconst_wdraws_prj = return_draws_matrix,
  return_draws_matrix = FALSE,
  .seed = NA,
  ...
)

proj_predict(
  object,
  newdata = NULL,
  offsetnew = NULL,
  weightsnew = NULL,
  filter_nterms = NULL,
  nresample_clusters = 1000,
  return_draws_matrix = FALSE,
  .seed = NA,
  resp_oscale = TRUE,
  ...
)
Arguments
object |
An object returned by |
newdata |
Passed to argument |
offsetnew |
Passed to argument |
weightsnew |
Passed to argument |
filter_nterms |
Only applies if |
transform |
For |
integrated |
For |
allow_nonconst_wdraws_prj |
Only relevant for |
return_draws_matrix |
A single logical value indicating whether to
return an object (in case of |
.seed |
Pseudorandom number generation (PRNG) seed by which the same
results can be obtained again if needed. Passed to argument |
... |
Arguments passed to |
nresample_clusters |
For |
resp_oscale |
Only relevant for the latent projection. A single logical
value indicating whether to draw from the posterior-projection predictive
distributions on the original response scale ( |
Details
Currently, proj_predict() ignores observation weights that are not equal to
1. A corresponding warning is thrown if this is the case.
In case of the latent projection and transform = FALSE:
- Output element pred contains the linear predictors without any
  modifications that may be due to the original response distribution (e.g.,
  for a brms::cumulative() model, the ordered thresholds are not taken into
  account).
- Output element lpd contains the latent log predictive density values, i.e.,
  those corresponding to the latent Gaussian distribution. If newdata is not
  NULL, this requires the latent response values to be supplied in a column
  called .<response_name> of newdata where <response_name> needs to be
  replaced by the name of the original response variable (if <response_name>
  contained parentheses, these have been stripped off by init_refmodel(); see
  the left-hand side of formula(<refmodel>)). For technical reasons, the
  existence of column <response_name> in newdata is another requirement (even
  though .<response_name> is actually used).
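The column requirement for the latent projection can be illustrated with a minimal base R sketch. Everything in it is hypothetical: the response name `y`, the predictor `X1`, and all values are made up, and how latent response values would be obtained in practice is not covered here.

```r
# Hypothetical new dataset for a reference model whose original response
# variable is called `y` (all names and values are made up).
newdata_lat <- data.frame(
  X1 = c(0.2, -1.1, 0.7),
  # Column <response_name> must exist, even though it is not the one that
  # is actually used for the latent log predictive density values:
  y = factor(c("low", "high", "mid"), levels = c("low", "mid", "high"))
)

# The latent response values go into a column whose name is a dot followed
# by the response name, here ".y":
newdata_lat[[".y"]] <- c(-0.8, 1.3, 0.1)

names(newdata_lat)
```

A data frame of this shape is the kind of object that would be passed as newdata to proj_linpred() with transform = FALSE when lpd is desired.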
Value
In the following, $S_{\mathrm{prj}}$, $N$, $C_{\mathrm{cat}}$, and
$C_{\mathrm{lat}}$ from help topic refmodel-init-get are used. (For
proj_linpred() with integrated = TRUE, we have $S_{\mathrm{prj}} = 1$.)
Furthermore, let $C$ denote either $C_{\mathrm{cat}}$ (if transform = TRUE)
or $C_{\mathrm{lat}}$ (if transform = FALSE). Then, if the prediction is done
for one submodel only (i.e.,
length(nterms) == 1 || !is.null(predictor_terms) in the explicit or implicit
call to project(), see argument object):
- proj_linpred() returns a list with the following elements:
  - Element pred contains the actual predictions, i.e., the linear
    predictors, possibly transformed to response scale (depending on argument
    transform).
  - Element lpd is non-NULL only if newdata is NULL or if newdata contains
    response values in the corresponding column. In that case, it contains
    the log predictive density values (conditional on each of the projected
    parameter draws if integrated = FALSE and averaged across the projected
    parameter draws if integrated = TRUE).

  In case of (i) the traditional projection, (ii) the latent projection with
  transform = FALSE, or (iii) the latent projection with transform = TRUE and
  <refmodel>$family$cats (where <refmodel> is an object resulting from
  init_refmodel(); see also extend_family()'s argument latent_y_unqs) being
  NULL, both elements are $S_{\mathrm{prj}} \times N$ matrices (converted to
  a possibly weighted draws_matrix if argument return_draws_matrix is TRUE,
  see the description of this argument).

  In case of (i) the augmented-data projection or (ii) the latent projection
  with transform = TRUE and <refmodel>$family$cats being not NULL, pred is an
  $S_{\mathrm{prj}} \times N \times C$ array (if argument return_draws_matrix
  is TRUE, this array is "compressed" to an
  $S_{\mathrm{prj}} \times (N \cdot C)$ matrix, with the columns consisting
  of $C$ blocks of $N$ columns each (one column per observation), and then
  converted to a possibly weighted draws_matrix) and lpd is an
  $S_{\mathrm{prj}} \times N$ matrix (converted to a possibly weighted
  draws_matrix if argument return_draws_matrix is TRUE).

  If return_draws_matrix is FALSE and allow_nonconst_wdraws_prj is TRUE and
  integrated is FALSE and the projected draws have nonconstant weights, then
  both list elements have the weights of these draws stored in an attribute
  wdraws_prj. (If return_draws_matrix, allow_nonconst_wdraws_prj, and
  integrated are all FALSE, then projected draws with nonconstant weights
  cause an error.)
- proj_predict() returns an $S_{\mathrm{prj}} \times N$ matrix of predictions
  where $S_{\mathrm{prj}}$ denotes nresample_clusters in case of clustered
  projection (or, more generally, in case of projected draws with nonconstant
  weights). If argument return_draws_matrix is TRUE, the returned matrix is
  converted to a draws_matrix (see posterior::draws_matrix()). In case of (i)
  the augmented-data projection or (ii) the latent projection with
  resp_oscale = TRUE and <refmodel>$family$cats being not NULL, the returned
  matrix (or draws_matrix) has an attribute called cats (the character vector
  of response categories) and the values of the matrix (or draws_matrix) are
  the predicted indices of the response categories (these indices refer to
  the order of the response categories from attribute cats).
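Two of the output layouts described above can be mimicked in base R without calling projpred. The sketch below uses mock objects with made-up dimensions and values. It assumes that the "compression" of the array follows R's usual column-major flattening, which matches the description of C blocks with one column per observation but is not confirmed by the text above. The names `arr`, `mat`, `pp`, and `pp_labels` are hypothetical.

```r
# Mock dimensions: S draws, N observations, C response categories.
S <- 2
N <- 3
C <- 4

# Part 1: flattening an S x N x C array to an S x (N * C) matrix.
arr <- array(seq_len(S * N * C), dim = c(S, N, C))
# matrix() reads the array's values in column-major order, so the N columns
# belonging to category 1 come first, then those of category 2, and so on:
mat <- matrix(arr, nrow = S, ncol = N * C)
# The columns of block k (i.e., category k) are (k - 1) * N + 1, ..., k * N:
k <- 2
block_k <- mat[, (k - 1) * N + seq_len(N), drop = FALSE]
stopifnot(identical(block_k, arr[, , k]))

# Part 2: turning predicted category indices into category labels.
# Mock of a proj_predict()-like matrix holding indices, with the response
# categories stored in attribute `cats`:
pp <- matrix(c(1L, 3L, 2L, 2L, 1L, 3L), nrow = S, ncol = N)
attr(pp, "cats") <- c("low", "mid", "high")
# Look up the label for each index, then restore the S x N shape:
pp_labels <- attr(pp, "cats")[as.vector(pp)]
dim(pp_labels) <- dim(pp)
pp_labels
```

The second part shows why attribute cats matters: the matrix values are positions in that character vector, not the category labels themselves.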
If the prediction is done for more than one submodel, the output from above
is returned for each submodel, giving a named list with one element for
each submodel (the names of this list being the numbers of predictor
terms of the submodels when counting the intercept, too).
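Because the list names are numbers stored as character strings, indexing by name and indexing by position can give different elements. The following base R sketch uses a mock list (not actual projpred output) for three hypothetical submodels with 0, 2, and 4 predictor terms besides the intercept. The name `mock_out` and all values are made up.

```r
# Mock of a multi-submodel result: names count the intercept, so the
# submodels with 0, 2, and 4 predictor terms are named "1", "3", and "5".
mock_out <- list(
  "1" = list(pred = matrix(1, nrow = 2, ncol = 3), lpd = NULL),
  "3" = list(pred = matrix(3, nrow = 2, ncol = 3), lpd = NULL),
  "5" = list(pred = matrix(5, nrow = 2, ncol = 3), lpd = NULL)
)

# Indexing by name (a character string) selects the submodel named "3":
by_name <- mock_out[["3"]]$pred

# Indexing by position (a number) selects the third list element instead,
# which is the submodel named "5":
by_position <- mock_out[[3]]$pred
```

Converting a number of terms with as.character() before indexing avoids mixing up the two.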
Examples
# Data:
dat_gauss <- data.frame(y = df_gaussian$y, df_gaussian$x)
# The `stanreg` fit which will be used as the reference model (with small
# values for `chains` and `iter`, but only for technical reasons in this
# example; this is not recommended in general):
fit <- rstanarm::stan_glm(
  y ~ X1 + X2 + X3 + X4 + X5, family = gaussian(), data = dat_gauss,
  QR = TRUE, chains = 2, iter = 500, refresh = 0, seed = 9876
)
# Projection onto an arbitrary combination of predictor terms (with a small
# value for `ndraws`, but only for the sake of speed in this example; this
# is not recommended in general):
prj <- project(fit, predictor_terms = c("X1", "X3", "X5"), ndraws = 21,
               seed = 9182)
# Predictions (at the training points) from the submodel onto which the
# reference model was projected:
prjl <- proj_linpred(prj)
prjp <- proj_predict(prj, .seed = 7364)