bootstrap_MRF {MRFcov} | R Documentation |
Bootstrap observations to estimate MRF parameter coefficients
Description
This function runs MRFcov models multiple times to capture uncertainty
in parameter estimates. The dataset is shuffled and missing
values (if found) are imputed in each bootstrap iteration.
Usage
bootstrap_MRF(
  data,
  n_bootstraps,
  sample_seed,
  symmetrise,
  n_nodes,
  n_cores,
  n_covariates,
  family,
  sample_prop,
  spatial = FALSE,
  coords = NULL
)
Arguments
data |
Dataframe. The input data where the |
n_bootstraps |
Positive integer. Represents the total number of bootstrap samples
to test. Default is |
sample_seed |
Numeric. Used as the seed value for generating bootstrap replicates, allowing users to generate replicated datasets on different systems. Default is a random seed |
symmetrise |
The method to use for symmetrising corresponding parameter estimates
(which are taken from separate regressions). Options are |
n_nodes |
Positive integer. The index of the last column in |
n_cores |
Integer. The number of cores to spread the job across using
|
n_covariates |
Positive integer. The number of covariates in |
family |
The response type. Responses can be quantitative continuous ( |
sample_prop |
Positive probability value indicating the proportion of rows to sample from
|
spatial |
Logical. If |
coords |
A two-column |
Details
MRFcov models are fit via cross-validation using cv.glmnet. For each model, the data
is bootstrapped by shuffling row observations and fitting models to a subset of observations
to account for uncertainty in parameter estimates.
Parameter estimates from the set of bootstrapped models are summarised
to present means and confidence intervals (as 5 and 95 percent quantiles).
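The resampling scheme described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the simulated data, the helper name boot_coefs, sampling without replacement, and the use of lm in place of a cross-validated cv.glmnet fit are all hypothetical stand-ins. Each iteration draws a shuffled subset of rows and refits the model, and the coefficients are then summarised by their mean and their 5 and 95 percent quantiles.

```r
# Hypothetical simulated data: one response and two predictors
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1.5 * dat$x1 - 0.5 * dat$x2 + rnorm(n)

# Hypothetical helper: fit one model per bootstrap iteration.
# lm() is a stand-in here; it is NOT the cv.glmnet fit used by MRFcov.
boot_coefs <- function(data, n_bootstraps = 50, sample_prop = 0.7) {
  n_keep <- floor(nrow(data) * sample_prop)
  fits <- lapply(seq_len(n_bootstraps), function(i) {
    # Shuffle the rows and keep a subset (assumed: without replacement)
    rows <- sample(nrow(data), n_keep, replace = FALSE)
    coef(lm(y ~ x1 + x2, data = data[rows, ]))
  })
  do.call(rbind, fits)  # one row per iteration, one column per coefficient
}

coefs <- boot_coefs(dat)

# Summarise each coefficient across iterations
coef_means <- colMeans(coefs)
coef_lower <- apply(coefs, 2, quantile, probs = 0.05)
coef_upper <- apply(coefs, 2, quantile, probs = 0.95)
```

A wide gap between coef_lower and coef_upper for a coefficient suggests that its estimate is sensitive to which observations are included.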
Value
A list containing:
- direct_coef_means: dataframe containing mean coefficient values taken from all bootstrapped models across the iterations
- direct_coef_upper90 and direct_coef_lower90: dataframes containing coefficient 95 percent and 5 percent quantiles taken from all bootstrapped models across the iterations
- indirect_coef_mean: list of symmetric matrices (one matrix for each covariate) containing mean effects of covariates on pairwise interactions
- mean_key_coefs: list of matrices of length n_nodes containing mean covariate coefficient values and their relative importances (using the formula x^2 / sum(x^2)) taken from all bootstrapped models across iterations. Only coefficients with mean relative importances > 0.01 are returned. Note, relative importances are only useful if all covariates are on a similar scale.
- mod_type: A character stating the type of model that was fit (used in other functions)
- mod_family: A character stating the family of model that was fit (used in other functions)
- poiss_sc_factors: A vector of the square-root mean scaling factors used to standardise poisson variables (only returned if family = "poisson")
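The relative importance formula x^2 / sum(x^2) and the 0.01 cutoff can be illustrated with a short sketch. The coefficient values, their names, and the column names of the resulting table are made up for illustration and are not taken from the package's output.

```r
# Hypothetical mean coefficients for one node (values are made up)
x <- c(cov1 = 0.80, cov2 = -0.40, cov1_sp2 = 0.05, cov2_sp3 = 0.02)

# Relative importance: squared coefficient over the sum of squared coefficients
rel_importance <- x^2 / sum(x^2)

# Keep only coefficients with relative importance above 0.01
# (column names here are hypothetical)
key_coefs <- data.frame(
  Variable = names(x),
  Mean_coef = unname(x),
  Rel_importance = unname(rel_importance)
)[rel_importance > 0.01, ]
```

Because the coefficients are squared, a covariate measured on a much larger or smaller scale than the others would distort these proportions, which is why the note about similar scales matters.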
See Also
MRFcov, MRFcov_spatial, cv.glmnet
Examples
data("Bird.parasites")

# Perform 2 quick bootstrap replicates using 70% of observations
bootedCRF <- bootstrap_MRF(data = Bird.parasites,
                           n_nodes = 4,
                           family = 'binomial',
                           sample_prop = 0.7,
                           n_bootstraps = 2)

# Small example of using spatial coordinates for a spatial CRF
# (coordinates are randomly simulated for illustration only)
Latitude <- sample(seq(120, 140, length.out = 100), nrow(Bird.parasites), replace = TRUE)
Longitude <- sample(seq(-19, -22, length.out = 100), nrow(Bird.parasites), replace = TRUE)
coords <- data.frame(Latitude = Latitude, Longitude = Longitude)
bootedSpatial <- bootstrap_MRF(data = Bird.parasites,
                               n_nodes = 4,
                               family = 'binomial',
                               spatial = TRUE,
                               coords = coords,
                               sample_prop = 0.5,
                               n_bootstraps = 2)