MRFcov_spatial {MRFcov} | R Documentation |
Spatially structured Markov Random Fields with covariates
Description
This function calls the MRFcov function to fit
separate penalized regressions for each node and approximate parameters of
Markov Random Fields (MRF) graphs. Supplied GPS coordinates are used to
account for spatial autocorrelation via Gaussian Process spatial regression
splines.
Usage
MRFcov_spatial(
data,
symmetrise,
prep_covariates,
n_nodes,
n_cores,
n_covariates,
family,
coords,
prep_splines = TRUE,
bootstrap = FALSE,
progress_bar = FALSE
)
Arguments
data |
A dataframe. The input data, where the n_nodes
left-most variables are those to be represented by nodes in the graph
|
symmetrise |
The method to use for symmetrising corresponding parameter estimates
(which are taken from separate regressions). Options are min (take the coefficient with the
smallest absolute value), max (take the coefficient with the largest absolute value)
or mean (take the mean of the two coefficients). Default is mean
|
prep_covariates |
Logical. If TRUE, covariate columns will be cross-multiplied
with nodes to prep the dataset for MRF models. Note this is only useful when additional
covariates are provided. Therefore, if n_nodes < NCOL(data), the
default is TRUE. Otherwise, the default is FALSE. See
prep_MRF_covariates for more information
|
n_nodes |
Positive integer. The index of the last column in data
which is represented by a node in the final graph. Columns with index
greater than n_nodes are taken as covariates. Default is the number of
columns in data, corresponding to no additional covariates
|
n_cores |
Positive integer. The number of cores to spread the job across using
makePSOCKcluster. Default is 1 (no parallelisation)
|
n_covariates |
Positive integer. The number of covariates in data, before cross-multiplication.
Default is NCOL(data) - n_nodes
|
family |
The response type. Responses can be quantitative continuous (family = "gaussian"),
non-negative counts (family = "poisson") or binomial 1s and 0s (family = "binomial").
If using family = "binomial", please note that nodes occurring in fewer than 5 percent
of observations generally make it difficult to
estimate occurrence probabilities (at the extreme, this can result in intercept-only
models being fitted for the nodes in question). The function will issue a warning in this case.
If nodes occur in more than 95 percent of observations, the function will return an error, as the cross-validation
step will generally be unable to proceed. For family = "poisson" models, all returned
coefficients are estimated on the identity scale AFTER using a nonparanormal transformation.
See vignette("Gaussian_Poisson_CRFs") for details of interpretation
|
coords |
A two-column dataframe (with nrow(coords) == nrow(data))
representing the spatial coordinates of each observation in data. Ideally, these
coordinates will represent Latitude and Longitude GPS points for each observation. The coordinates
are used to create smoothed Gaussian Process spatial regression splines via
smooth.construct2.
Here, the basis dimension of the smoothed term
is chosen based on the number of unique GPS coordinates in coords.
If this number is less than 100, then this number is used. If the number of
unique coordinates is more than 100, a value of 100 is used
(this parameter needs to be large in order to ensure enough degrees of freedom
for estimating 'wiggliness' of the smooth term; see
choose.k for details).
These splines will be included in each node-wise regression as additional penalized covariates.
This ensures that resulting node interaction parameters are estimated after accounting for
possible spatial autocorrelation. Note that interpretation of spatial autocorrelation is difficult,
and so it is recommended to compare the predictive capacities of spatial and non-spatial CRFs through
the predict_MRF function
|
prep_splines |
Logical. If spatial splines are already included in data, set to
FALSE. Default is TRUE
|
bootstrap |
Logical. Used by bootstrap_MRF to reduce memory usage
|
progress_bar |
Logical. A progress bar from pbapply is used if TRUE, but this slows estimation
|
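The three symmetrise options described above can be illustrated with a small base R sketch. The helper symmetrise_pair and the coefficient values are hypothetical and are not part of MRFcov. They only mirror the documented behaviour for a single pair of coefficients taken from two separate node-wise regressions.

```r
# Hypothetical coefficients for the same pair of nodes, taken from two
# separate node-wise regressions (values invented for illustration)
coef_ij <- 0.8   # effect of node j in the regression for node i
coef_ji <- -0.2  # effect of node i in the regression for node j

# Hypothetical helper (not part of MRFcov) mirroring the documented options
symmetrise_pair <- function(a, b, method = c("mean", "min", "max")) {
  method <- match.arg(method)
  pair <- c(a, b)
  switch(method,
         mean = mean(pair),                 # mean of the two coefficients
         min = pair[which.min(abs(pair))],  # smallest absolute value
         max = pair[which.max(abs(pair))])  # largest absolute value
}

sym_mean <- symmetrise_pair(coef_ij, coef_ji, "mean")
sym_min <- symmetrise_pair(coef_ij, coef_ji, "min")
sym_max <- symmetrise_pair(coef_ij, coef_ji, "max")
```

Note that min and max select one of the two coefficients as estimated, sign included, whereas mean can yield a value that neither regression produced.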
Value
A list of all elements contained in a returned MRFcov object, with
the inclusion of a dataframe called mrf_data. This contains all prepped covariates,
including the added spatial regression splines, and should be used as data
when generating predictions via predict_MRF or predict_MRFnetworks
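A minimal sketch of how mrf_data might be passed to predict_MRF is given below. It assumes the MRFcov package is installed, that predict_MRF accepts the arguments data, MRF_mod and prep_covariates, and that prep_covariates = FALSE is appropriate because mrf_data is already prepped. The simulated coordinates and the object names mod and preds are illustrative only.

```r
# Run only if MRFcov is available (assumption: package installed)
if (requireNamespace("MRFcov", quietly = TRUE)) {
  library(MRFcov)
  data("Bird.parasites")

  # Simulated coordinates, for illustration only
  set.seed(123)
  coords <- data.frame(
    Latitude = sample(seq(-22, -19, length.out = 100),
                      nrow(Bird.parasites), TRUE),
    Longitude = sample(seq(120, 140, length.out = 100),
                       nrow(Bird.parasites), TRUE))

  # Fit the spatial model; the first four columns are treated as nodes
  mod <- MRFcov_spatial(data = Bird.parasites, n_nodes = 4,
                        family = "binomial", coords = coords)

  # mrf_data already holds the cross-multiplied covariates and the splines,
  # so no further preparation is requested here (assumption)
  preds <- predict_MRF(data = mod$mrf_data, MRF_mod = mod,
                       prep_covariates = FALSE)
  str(preds, max.level = 1)
}
```

Passing the original data instead of mrf_data would omit the spline columns that the fitted node-wise regressions expect.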
References
Kammann, E. E. and M.P. Wand (2003) Geoadditive Models.
Applied Statistics 52(1):1-18.
See Also
See smooth.construct2 and smooth.construct.gp.smooth.spec
for details of Gaussian process spatial regression splines. Worked examples showcasing
this function can be found using vignette("Bird_Parasite_CRF")
Examples
library(MRFcov)
data("Bird.parasites")
# Simulate coordinates for each observation (valid latitudes lie in [-90, 90])
Latitude <- sample(seq(-22, -19, length.out = 100), nrow(Bird.parasites), TRUE)
Longitude <- sample(seq(120, 140, length.out = 100), nrow(Bird.parasites), TRUE)
coords <- data.frame(Latitude = Latitude, Longitude = Longitude)
# Fit a spatial CRF; the four left-most columns are treated as nodes
CRFmod_spatial <- MRFcov_spatial(data = Bird.parasites, n_nodes = 4,
family = 'binomial', coords = coords)
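The basis dimension rule described for the coords argument can be checked by hand before fitting. The sketch below uses a hypothetical helper, basis_dim, which is not part of MRFcov and simply restates the documented rule (the number of unique coordinates, capped at 100). The coordinate values are invented.

```r
# Hypothetical helper (not part of MRFcov): basis dimension implied by the
# rule documented for `coords`
basis_dim <- function(coords, cap = 100) {
  n_unique <- nrow(unique(coords))  # number of distinct coordinate pairs
  min(n_unique, cap)
}

# Invented coordinates: 30 distinct sites, each observed 5 times
sites <- data.frame(Latitude = seq(-22, -19, length.out = 30),
                    Longitude = seq(120, 140, length.out = 30))
few_sites <- sites[rep(seq_len(30), each = 5), ]

# Invented coordinates: 500 distinct sites
many_sites <- data.frame(Latitude = seq(-22, -19, length.out = 500),
                         Longitude = seq(120, 140, length.out = 500))

k_few <- basis_dim(few_sites)    # limited by the 30 unique sites
k_many <- basis_dim(many_sites)  # capped at 100
```

With repeated sampling at a small number of sites, the basis dimension is limited by the number of sites rather than the number of observations.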
[Package MRFcov version 1.0.39 Index]