MRFcov {MRFcov} | R Documentation |
Markov Random Fields with covariates
Description
This function is the workhorse of the MRFcov package, running
separate penalized regressions for each node to estimate parameters of
Markov Random Fields (MRF) graphs. Covariates can be included
(a class of models known as Conditional Random Fields; CRF) to estimate
how interactions between nodes vary across covariate magnitudes.
Usage
MRFcov(
data,
symmetrise,
prep_covariates,
n_nodes,
n_cores,
n_covariates,
family,
bootstrap = FALSE,
progress_bar = FALSE
)
Arguments
data |
A |
symmetrise |
The method to use for symmetrising corresponding parameter estimates
(which are taken from separate regressions). Options are |
prep_covariates |
Logical. If |
n_nodes |
Positive integer. The index of the last column in |
n_cores |
Positive integer. The number of cores to spread the job across using
|
n_covariates |
Positive integer. The number of covariates in |
family |
The response type. Responses can be quantitative continuous ( |
bootstrap |
Logical. Used by |
progress_bar |
Logical. Progress bar in pbapply is used if |
Details
Separate penalized regressions are used to approximate
MRF parameters, where the regression for node j includes an
intercept and coefficients for the abundance (families gaussian or poisson)
or presence-absence (family binomial) of all other
nodes (/j) in data. If covariates are included, coefficients
are also estimated for the effect of the covariate on j, and for the
effects of the covariate on interactions between j and all other nodes
(/j). Note that interaction coefficients must be estimated between variables that
are on roughly the same scale, as the resulting parameter estimates are
unified into a Markov Random Field using the specified symmetrise function.
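The structure of a single node-wise regression can be sketched in base R as follows. This is only an illustration of the predictor layout described above, not the package's internal code: the simulated data, the node names (sp1 to sp3), the single covariate and the cov_ prefix are all assumptions made for the example.

```r
set.seed(42)
n <- 50

# Simulated presence-absence data for three hypothetical nodes
nodes <- matrix(rbinom(n * 3, 1, 0.5), ncol = 3,
                dimnames = list(NULL, c("sp1", "sp2", "sp3")))
# One simulated continuous covariate
covariate <- rnorm(n)

j <- 1                                  # the node being modelled
y <- nodes[, j]                         # response for node j
others <- nodes[, -j, drop = FALSE]     # all other nodes (/j)

# Covariate-by-node terms: each column of `others` multiplied by the covariate
interactions <- others * covariate
colnames(interactions) <- paste0("cov_", colnames(others))

# Predictors: other nodes, the covariate itself, and the interaction terms
X <- cbind(others, cov = covariate, interactions)
colnames(X)
```

Each of the remaining nodes would be regressed on its own matrix of this form, so every pairwise interaction is estimated twice (once from each side) before symmetrising.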
Counts for poisson variables, which are often not on the same scale,
will therefore be normalised with a nonparanormal transformation
x = qnorm(rank(log2(x + 0.01)) / (length(x) + 1)). These transformed counts
will be used in a (family = "gaussian")
model and their respective raw distribution parameters returned so that coefficients
can be back-transformed for interpretation (this back-transformation is
performed automatically by other functions including predict_MRF
and cv_MRF_diag). Gaussian variables are not automatically transformed, so
if they cover quite different ranges and scales, it is recommended to scale them prior to fitting
models. For more information on this process, use
vignette("Gaussian_Poisson_CRFs").
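As a small illustration, the transformation given above can be applied by hand in base R. The helper name nonparanormal and the simulated counts are assumptions made for this sketch; only the formula itself comes from the text.

```r
# Hypothetical helper wrapping the formula quoted in the Details text
nonparanormal <- function(x) {
  stats::qnorm(rank(log2(x + 0.01)) / (length(x) + 1))
}

set.seed(1)
counts_small <- rpois(200, lambda = 5)     # simulated counts on a small scale
counts_large <- rpois(200, lambda = 500)   # simulated counts on a much larger scale

z_small <- nonparanormal(counts_small)
z_large <- nonparanormal(counts_large)

# Both variables now sit on a comparable, roughly standard-normal scale
range(z_small)
range(z_large)
```

Because the transformation depends only on ranks, it preserves the ordering of the raw counts while removing differences in scale between variables.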
Note that since the number of parameters to estimate in each node-wise regression
quickly increases with increasing numbers of nodes and covariates,
LASSO penalization is used to regularize
regressions. This is done by minimising the cross-validated
mean error for each node separately using cv.glmnet. In this way,
we maximise the log-likelihood of each node
separately before unifying the nodes into a graph.
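The node-wise approach described above might look roughly like the following when written directly against glmnet. This is a minimal sketch under stated assumptions, not the code MRFcov runs internally: the data are simulated, there are no covariates, and averaging the two estimates of each interaction is just one possible symmetrising rule chosen for illustration.

```r
library(glmnet)  # provides cv.glmnet

set.seed(123)
n <- 300

# Simulated presence-absence data for four hypothetical nodes
nodes <- matrix(rbinom(n * 4, 1, 0.5), ncol = 4,
                dimnames = list(NULL, paste0("sp", 1:4)))
# Make sp2 depend on sp1 so there is an association to detect
nodes[, "sp2"] <- rbinom(n, 1, plogis(-1 + 2 * nodes[, "sp1"]))

p <- ncol(nodes)
coef_mat <- matrix(0, p, p,
                   dimnames = list(colnames(nodes), colnames(nodes)))

for (j in seq_len(p)) {
  # LASSO-penalized logistic regression of node j on all other nodes,
  # with the penalty chosen by cross-validation
  fit <- cv.glmnet(x = nodes[, -j], y = nodes[, j], family = "binomial")
  b <- as.matrix(coef(fit, s = "lambda.min"))  # first row is the intercept
  coef_mat[j, -j] <- b[-1, 1]
}

# Unify the two estimates of each pairwise interaction (assumed rule: mean)
graph <- (coef_mat + t(coef_mat)) / 2
round(graph, 2)
```

Row j of coef_mat holds the coefficients from the regression for node j, so entries [j, k] and [k, j] are two separate estimates of the same interaction until they are symmetrised.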
Value
A list containing:
- graph: Estimated parameter matrix of pairwise interaction effects
- intercepts: Estimated parameter vector of node intercepts
- indirect_coefs: list containing matrices representing indirect effects of each covariate on pairwise node interactions
- direct_coefs: matrix of direct effects of each parameter on each outcome node. For family = 'binomial' models, all coefficients are estimated on the logit scale.
- param_names: Character string of covariate parameter names
- mod_type: A character stating the type of model that was fit (used in other functions)
- mod_family: A character stating the family of model that was fit (used in other functions)
- poiss_sc_factors: A matrix of the estimated negative binomial or poisson parameters for each raw node variable (only returned if family = "poisson"). These are needed for converting coefficients back to their original distribution, and are used for prediction purposes only
References
Ising, E. (1925). Beitrag zur Theorie des Ferromagnetismus.
Zeitschrift für Physik A Hadrons and Nuclei, 31, 253-258.
Cheng, J., Levina, E., Wang, P. & Zhu, J. (2014).
A sparse Ising model with covariates. Biometrics, 70, 943-953.
Clark, N.J., Wells, K. & Lindberg, O. (2018).
Unravelling changing interspecific interactions across environmental gradients
using Markov random fields. Ecology. doi: 10.1002/ecy.2221
Sutton, C. & McCallum, A. (2012). An introduction to conditional random fields.
Foundations and Trends in Machine Learning, 4, 267-373.
See Also
Cheng et al. (2014), Sutton & McCallum (2012) and Clark et al. (2018)
for overviews of Conditional Random Fields. See cv.glmnet for
details of cross-validated optimization using LASSO penalty. Worked examples to showcase
this function can be found using vignette("Bird_Parasite_CRF") and
vignette("Gaussian_Poisson_CRFs").
Examples
data("Bird.parasites")
CRFmod <- MRFcov(data = Bird.parasites, n_nodes = 4, family = 'binomial')