MRFcov_spatial {MRFcov}    R Documentation

Spatially structured Markov Random Fields with covariates

Description

This function calls MRFcov to fit a separate penalized regression for each node and thereby approximate the parameters of a Markov Random Field (MRF) graph. The supplied GPS coordinates are used to account for spatial autocorrelation via Gaussian process spatial regression splines.

Usage

MRFcov_spatial(
  data,
  symmetrise,
  prep_covariates,
  n_nodes,
  n_cores,
  n_covariates,
  family,
  coords,
  prep_splines = TRUE,
  bootstrap = FALSE,
  progress_bar = FALSE
)

Arguments

data

A dataframe. The input data, in which the n_nodes left-most variables are those to be represented by nodes in the graph

symmetrise

The method to use for symmetrising corresponding parameter estimates (which are taken from separate regressions). Options are min (take the coefficient with the smallest absolute value), max (take the coefficient with the largest absolute value) or mean (take the mean of the two coefficients). Default is mean
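
The three symmetrise options can be illustrated with a minimal base-R sketch. The helpers symmetrise_pair and symmetrise_matrix are hypothetical and not part of MRFcov. The sketch assumes the raw coefficients sit in a square matrix whose [i, j] entry comes from node i's regression and whose [j, i] entry comes from node j's regression; it is not the package's internal code.

```r
# Hypothetical helpers (not MRFcov functions) illustrating the three
# symmetrise options on the two estimates of the same interaction.
symmetrise_pair <- function(a, b, method = c("mean", "min", "max")) {
  method <- match.arg(method)
  switch(method,
    # Average of the two estimates
    mean = (a + b) / 2,
    # Estimate with the smaller absolute value (ties keep a)
    min = ifelse(abs(a) <= abs(b), a, b),
    # Estimate with the larger absolute value (ties keep a)
    max = ifelse(abs(a) >= abs(b), a, b)
  )
}

# Assumed layout: m[i, j] is node j's coefficient in node i's regression,
# so m and t(m) hold the two estimates of each interaction elementwise.
symmetrise_matrix <- function(m, method = "mean") {
  symmetrise_pair(m, t(m), method)
}

raw <- matrix(c( 0.0, 0.40, -0.2,
                 0.6, 0.00,  0.1,
                -0.5, 0.05,  0.0), nrow = 3, byrow = TRUE)
symmetrise_matrix(raw, "max")
```

Because ties in absolute value keep the first argument, two estimates of equal size but opposite sign would need an explicit tie-breaking rule under min or max; this sketch does not address that case.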

prep_covariates

Logical. If TRUE, covariate columns will be cross-multiplied with nodes to prep the dataset for MRF models. Note this is only useful when additional covariates are provided. Therefore, if n_nodes < NCOL(data), default is TRUE. Otherwise, default is FALSE. See prep_MRF_covariates for more information
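
The cross-multiplication performed when prep_covariates = TRUE can be pictured with the base-R sketch below. The helper cross_multiply and its covariate_node column-naming scheme are hypothetical, chosen only for illustration; see prep_MRF_covariates for what the package actually produces. The sketch assumes the first n_nodes columns are nodes and every remaining column is a numeric covariate, so n_nodes * n_covariates product columns are appended.

```r
# Hypothetical sketch (not prep_MRF_covariates itself): append one
# covariate-by-node product column for every node-covariate pair.
cross_multiply <- function(data, n_nodes) {
  node_names <- names(data)[seq_len(n_nodes)]
  cov_names <- names(data)[-seq_len(n_nodes)]
  products <- list()
  for (cv in cov_names) {
    for (nd in node_names) {
      # The "covariate_node" naming is an illustrative assumption
      products[[paste(cv, nd, sep = "_")]] <- data[[cv]] * data[[nd]]
    }
  }
  if (length(products) == 0) return(data)  # no covariates: nothing to add
  cbind(data, as.data.frame(products))
}

toy <- data.frame(sp1 = c(1, 0, 1), sp2 = c(0, 1, 1), temp = c(2.5, -1, 0.5))
cross_multiply(toy, n_nodes = 2)
```

With two nodes and one covariate, two product columns are added; when n_nodes equals NCOL(data) the data are returned unchanged, matching the documented default of FALSE in that case.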

n_nodes

Positive integer. The index of the last column in data which is represented by a node in the final graph. Columns with index greater than n_nodes are taken as covariates. Default is the number of columns in data, corresponding to no additional covariates

n_cores

Positive integer. The number of cores to spread the job across using makePSOCKcluster. Default is 1 (no parallelisation)

n_covariates

Positive integer. The number of covariates in data, before cross-multiplication. Default is NCOL(data) - n_nodes

family

The response type. Responses can be quantitative continuous (family = "gaussian"), non-negative counts (family = "poisson") or binomial 1s and 0s (family = "binomial"). If using family = "binomial", note that nodes occurring in fewer than 5 percent of observations make it difficult to estimate occurrence probabilities (at the extreme, intercept-only models may be fitted for the nodes in question); the function issues a warning in this case. If nodes occur in more than 95 percent of observations, the function returns an error, as the cross-validation step will generally be unable to proceed. For family = "poisson" models, all returned coefficients are estimated on the identity scale AFTER a nonparanormal transformation is applied. See vignette("Gaussian_Poisson_CRFs") for details of interpretation
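
Before fitting a binomial model, node prevalences can be screened against the 5 and 95 percent thresholds described above. The base-R sketch below uses a hypothetical helper, check_binomial_prevalence, which is not part of MRFcov; it only mirrors the documented warning and error conditions so that problem nodes can be spotted in advance.

```r
# Hypothetical pre-flight check (not an MRFcov function) mirroring the
# documented prevalence thresholds for family = "binomial".
check_binomial_prevalence <- function(data, n_nodes) {
  # Proportion of observations in which each node occurs (0/1 data)
  prev <- colMeans(data[, seq_len(n_nodes), drop = FALSE])
  rare <- names(prev)[prev < 0.05]
  common <- names(prev)[prev > 0.95]
  if (length(rare) > 0) {
    warning("Nodes in < 5% of observations: ", paste(rare, collapse = ", "))
  }
  if (length(common) > 0) {
    stop("Nodes in > 95% of observations: ", paste(common, collapse = ", "))
  }
  prev
}

occ <- data.frame(a = rep(c(1, 0), c(10, 90)), b = rep(c(1, 0), c(50, 50)))
check_binomial_prevalence(occ, n_nodes = 2)
```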

coords

A two-column dataframe (with nrow(coords) == nrow(data)) giving the spatial coordinates of each observation in data. Ideally, these coordinates will be Latitude and Longitude GPS points for each observation. The coordinates are used to create smoothed Gaussian process spatial regression splines via smooth.construct2. The basis dimension of the smoothed term is chosen from the number of unique GPS coordinates in coords: if there are 100 or fewer unique coordinates, that number is used; otherwise a value of 100 is used (this parameter needs to be large to ensure enough degrees of freedom for estimating the 'wiggliness' of the smooth term; see choose.k for details). These splines are included in each node-wise regression as additional penalized covariates, so that the resulting node interaction parameters are estimated after accounting for possible spatial autocorrelation. Because spatial autocorrelation is difficult to interpret, it is recommended to compare the predictive capacities of spatial and non-spatial CRFs through the predict_MRF function
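
The basis-dimension rule amounts to k = min(number of unique coordinate pairs, 100). The base-R sketch below computes that value. choose_spatial_k is a hypothetical helper, not an MRFcov or mgcv function, and it does not build the Gaussian process basis itself (that step is done with smooth.construct2, as described above). It also checks the documented requirement nrow(coords) == nrow(data).

```r
# Hypothetical helper (not part of MRFcov or mgcv): basis dimension for
# the spatial smooth, following the rule described in this entry.
choose_spatial_k <- function(coords, n_obs, max_k = 100) {
  stopifnot(ncol(coords) == 2, nrow(coords) == n_obs)
  # Count distinct coordinate pairs (rows), not distinct single values
  n_unique <- nrow(unique(coords))
  min(n_unique, max_k)
}

few <- data.frame(Latitude = c(-19, -19, -20, -21),
                  Longitude = c(120, 120, 130, 140))
choose_spatial_k(few, n_obs = 4)  # 3: four rows, three distinct pairs
```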

prep_splines

Logical. If spatial splines are already included in data, set to FALSE. Default is TRUE

bootstrap

Logical. Used by bootstrap_MRF to reduce memory usage

progress_bar

Logical. If TRUE, a progress bar from pbapply is displayed, though this slows estimation.

Value

A list containing all elements of a returned MRFcov object, plus a dataframe called mrf_data. This contains all prepped covariates, including the added spatial regression splines, and should be supplied as data when generating predictions via predict_MRF or predict_MRFnetworks

References

Kammann, E. E. and Wand, M. P. (2003). Geoadditive models. Applied Statistics 52(1): 1-18.

See Also

See smooth.construct2 and smooth.construct.gp.smooth.spec for details of Gaussian process spatial regression splines. Worked examples to showcase this function can be found using vignette("Bird_Parasite_CRF")

Examples


library(MRFcov)
data("Bird.parasites")
# Simulated coordinates for illustration only; latitudes must lie within [-90, 90]
set.seed(1)
Latitude <- sample(seq(-19, -22, length.out = 100), nrow(Bird.parasites), replace = TRUE)
Longitude <- sample(seq(120, 140, length.out = 100), nrow(Bird.parasites), replace = TRUE)
coords <- data.frame(Latitude = Latitude, Longitude = Longitude)
CRFmod_spatial <- MRFcov_spatial(data = Bird.parasites, n_nodes = 4,
                                family = 'binomial', coords = coords)


[Package MRFcov version 1.0.39 Index]