R: General Matrix Projection Model Creation

mpm_create {lefko3}

R Documentation

General Matrix Projection Model Creation

Description

Function mpm_create() is the core workhorse function that creates all flavors of MPM in lefko3. All other MPM creation functions act as wrappers for this function. As such, this function provides the most general and most detailed control over the MPM creation process.

Usage

mpm_create(
  historical = FALSE,
  stage = TRUE,
  age = FALSE,
  devries = FALSE,
  reduce = FALSE,
  simple = FALSE,
  err_check = FALSE,
  data = NULL,
  year = NULL,
  pop = NULL,
  patch = NULL,
  stageframe = NULL,
  supplement = NULL,
  overwrite = NULL,
  repmatrix = NULL,
  alive = NULL,
  obsst = NULL,
  size = NULL,
  sizeb = NULL,
  sizec = NULL,
  repst = NULL,
  matst = NULL,
  fec = NULL,
  stages = NULL,
  yearcol = NULL,
  popcol = NULL,
  patchcol = NULL,
  indivcol = NULL,
  agecol = NULL,
  censorcol = NULL,
  modelsuite = NULL,
  paramnames = NULL,
  inda = NULL,
  indb = NULL,
  indc = NULL,
  dev_terms = NULL,
  density = NA_real_,
  CDF = TRUE,
  random_inda = FALSE,
  random_indb = FALSE,
  random_indc = FALSE,
  negfec = FALSE,
  exp_tol = 700L,
  theta_tol = 100000000L,
  censor = FALSE,
  censorkeep = NULL,
  start_age = NA_integer_,
  last_age = NA_integer_,
  fecage_min = NA_integer_,
  fecage_max = NA_integer_,
  fectime = 2L,
  fecmod = 1,
  cont = TRUE,
  prebreeding = TRUE,
  stage_NRasRep = FALSE,
  sparse_output = FALSE
)

Arguments

`historical`	A logical value indicating whether to build a historical MPM. Defaults to `FALSE`.
`stage`	A logical value indicating whether to build a stage-based MPM. If both `stage = TRUE` and `age = TRUE`, then an age-by-stage MPM is built. Defaults to `TRUE`.
`age`	A logical value indicating whether to build an age-based MPM. If both `stage = TRUE` and `age = TRUE`, then an age-by-stage MPM is built. Defaults to `FALSE`.
`devries`	A logical value indicating whether to use deVries format for historical MPMs. Defaults to `FALSE`, in which case historical MPMs are created in Ehrlen format.
`reduce`	A logical value denoting whether to remove ages, ahistorical stages, or historical stages associated exclusively with zero transitions. These are removed only if the respective row and column sums in ALL matrices estimated equal 0. Defaults to `FALSE`.
`simple`	A logical value indicating whether to produce `A`, `U`, and `F` matrices, or only the latter two. Defaults to `FALSE`, in which case all three are output.
`err_check`	A logical value indicating whether to append extra information used in matrix calculation within the output list. Defaults to `FALSE`.
`data`	A data frame of class `hfvdata`. Required for all MPMs, except for function-based MPMs in which `modelsuite` is set to a `vrm_input` object.
`year`	A variable corresponding to observation occasion, or a set of such values, given in values associated with the `year` term used in vital rate model development. Can also equal `"all"`, in which case matrices will be estimated for all occasions. Defaults to `"all"`.
`pop`	A variable designating which populations will have matrices estimated. Should be set to specific population names, or to `"all"` if all populations should have matrices estimated. Only used in raw MPMs.
`patch`	A variable designating which patches or subpopulations will have matrices estimated. Should be set to specific patch names, or to `"all"` if matrices should be estimated for all patches. Defaults to `NULL`, in which case patch designations are ignored.
`stageframe`	An object of class `stageframe`. These objects are generated by function `sf_create()`, and include information on the size, observation status, propagule status, reproduction status, immaturity status, maturity status, stage group, size bin widths, and other key characteristics of each ahistorical stage. Not needed for purely age-based MPMs.
`supplement`	An optional data frame of class `lefkoSD` that provides supplemental data that should be incorporated into the MPM. Three kinds of data may be integrated this way: transitions to be estimated via the use of proxy transitions, transition overwrites from the literature or supplemental studies, and transition multipliers for survival and fecundity. This data frame should be produced using the `supplemental()` function. Can be used in place of or in addition to an overwrite table (see `overwrite` below) and a reproduction matrix (see `repmatrix` below).
`overwrite`	An optional data frame developed with the `overwrite()` function describing transitions to be overwritten either with given values or with other estimated transitions. Note that this function supplements overwrite data provided in `supplement`.
`repmatrix`	An optional reproduction matrix. This matrix is composed mostly of `0`s, with non-zero entries acting as element identifiers and multipliers for fecundity (with `1` equaling full fecundity). If left blank, and no `supplement` is provided, then all stages marked as reproductive produce offspring at 1x that of estimated fecundity, and that offspring production will yield the first stage noted as propagule or immature. May be the dimensions of either a historical or an ahistorical matrix. If the latter, then all stages will be used in occasion t-1 for each suggested ahistorical transition. Not used in purely age-based MPMs.
`alive`	A vector of names of binomial variables corresponding to status as alive (`1`) or dead (`0`) in occasions t+1, t, and t-1, respectively. Defaults to `c("alive3", "alive2", "alive1")` for historical MPMs, and `c("alive3", "alive2")` for ahistorical MPMs. Only needed for raw MPMs.
`obsst`	A vector of names of binomial variables corresponding to observation status in occasions t+1, t, and t-1, respectively. Defaults to `c("obsstatus3", "obsstatus2", "obsstatus1")` for historical MPMs, and `c("obsstatus3", "obsstatus2")` for ahistorical MPMs. Only needed for raw MPMs.
`size`	A vector of names of variables coding the primary size variable in occasions t+1, t, and t-1, respectively. Defaults to `c("sizea3", "sizea2", "sizea1")` for historical MPMs, and `c("sizea3", "sizea2")` for ahistorical MPMs. Only needed for raw, stage-based MPMs.
`sizeb`	A vector of names of variables coding the secondary size variable in occasions t+1, t, and t-1, respectively. Defaults to an empty set, assuming that secondary size is not used. Only needed for raw, stage-based MPMs.
`sizec`	A vector of names of variables coding the tertiary size variable in occasions t+1, t, and t-1, respectively. Defaults to an empty set, assuming that tertiary size is not used. Only needed for raw, stage-based MPMs.
`repst`	A vector of names of binomial variables corresponding to reproductive status in occasions t+1, t, and t-1, respectively. Defaults to `c("repstatus3", "repstatus2", "repstatus1")` for historical MPMs, and `c("repstatus3", "repstatus2")` for ahistorical MPMs. Only needed for raw MPMs.
`matst`	A vector of names of binomial variables corresponding to maturity status in occasions t+1, t, and t-1, respectively. Defaults to `c("matstatus3", "matstatus2", "matstatus1")` for historical MPMs, and `c("matstatus3", "matstatus2")` for ahistorical MPMs. Must be provided if building raw MPMs, and `stages` is not provided.
`fec`	A vector of names of variables coding for fecundity in occasions t+1, t, and t-1, respectively. Defaults to `c("feca3", "feca2", "feca1")` for historical MPMs, and `c("feca3", "feca2")` for ahistorical MPMs. Only needed for raw, stage-based MPMs.
`stages`	An optional vector denoting the names of the variables within the main vertical dataset coding for the stages of each individual in occasions t+1 and t, and also t-1 if historical. The names of stages in these variables should match those used in the `stageframe` exactly. If left blank, then `rlefko3()` will attempt to infer stages by matching values of `alive`, `obsst`, `size`, `sizeb`, `sizec`, `repst`, and `matst` to characteristics noted in the associated `stageframe`. Only used in raw, stage-based MPMs.
`yearcol`	The variable name or column number corresponding to occasion t in the dataset. Defaults to `"year2"`. Only needed for raw MPMs.
`popcol`	The variable name or column number corresponding to the identity of the population. Defaults to `"popid"` if a value is provided for `pop`; otherwise empty. Only needed for raw MPMs.
`patchcol`	The variable name or column number corresponding to patch in the dataset. Defaults to `"patchid"` if a value is provided for `patch`; otherwise empty. Only needed for raw MPMs.
`indivcol`	The variable name or column number coding individual identity. Only needed for raw MPMs.
`agecol`	The variable name or column number corresponding to age in time t. Defaults to `"obsage"`. Only used in raw age-based and age-by-stage MPMs.
`censorcol`	The variable name or column number denoting the censor status. Only needed in raw MPMs, and only if `censor = TRUE`.
`modelsuite`	One of three kinds of lists. The first is a `lefkoMod` object holding the vital rate models and associated metadata. Alternatively, an object of class `vrm_input` may be provided. Finally, this argument may simply be a list of models used to parameterize the MPM. In the final scenario, `data` and `paramnames` must also be given, and all variable names must match across all objects. If entered, then a function-based MPM will be developed. Otherwise, a raw MPM will be developed. Only used in function-based MPMs.
`paramnames`	A data frame with three columns, the first describing all terms used in linear modeling, the second (must be called `mainparams`) giving the general model terms that will be used in matrix creation, and the third showing the equivalent terms used in modeling (must be named `modelparams`). Function `create_pm()` can be used to create a skeleton `paramnames` object, which can then be edited. Only required to build function-based MPMs if `modelsuite` is neither a `lefkoMod` object nor a `vrm_input` object.
`inda`	Can be a single value to use for individual covariate `a` in all matrices, a pair of values to use for times t and t-1 in historical matrices, or a vector of such values corresponding to each occasion in the dataset. Defaults to `NULL`. Only used in function-based MPMs.
`indb`	Can be a single value to use for individual covariate `b` in all matrices, a pair of values to use for times t and t-1 in historical matrices, or a vector of such values corresponding to each occasion in the dataset. Defaults to `NULL`. Only used in function-based MPMs.
`indc`	Can be a single value to use for individual covariate `c` in all matrices, a pair of values to use for times t and t-1 in historical matrices, or a vector of such values corresponding to each occasion in the dataset. Defaults to `NULL`. Only used in function-based MPMs.
`dev_terms`	A numeric vector of 2 elements in the case of a Leslie MPM, and of 14 elements in all other cases. Consists of scalar additions to the y-intercepts of vital rate linear models used to estimate vital rates in function-based MPMs. Defaults to `0` values for all vital rates.
`density`	A numeric value indicating the density value to use to propagate matrices. Only needed if density is an explanatory term used in one or more vital rate models. Defaults to `NA`. Only used in function-based MPMs.
`CDF`	A logical value indicating whether to use the cumulative distribution function to estimate size transition probabilities in function-based MPMs. Defaults to `TRUE`, and should only be changed to `FALSE` if approximate probabilities calculated via the midpoint method are preferred.
`random_inda`	A logical value denoting whether to treat individual covariate `a` as a random, categorical variable. Otherwise is treated as a fixed, numeric variable. Defaults to `FALSE`. Only used in function-based MPMs.
`random_indb`	A logical value denoting whether to treat individual covariate `b` as a random, categorical variable. Otherwise is treated as a fixed, numeric variable. Defaults to `FALSE`. Only used in function-based MPMs.
`random_indc`	A logical value denoting whether to treat individual covariate `c` as a random, categorical variable. Otherwise is treated as a fixed, numeric variable. Defaults to `FALSE`. Only used in function-based MPMs.
`negfec`	A logical value denoting whether fecundity values estimated to be negative should be reset to `0`. Defaults to `FALSE`.
`exp_tol`	A numeric value used to indicate a maximum value to set exponents to in the core kernel to prevent numerical overflow. Defaults to `700`. Only used in function-based MPMs.
`theta_tol`	A numeric value used to indicate a maximum value for theta as used in the negative binomial probability density kernel. Defaults to `100000000`, but can be reset to other values during error checking. Only used in function-based MPMs.
`censor`	If `TRUE`, then data will be removed according to the variable set in `censorcol`, such that only data with censor values equal to `censorkeep` will remain. Defaults to `FALSE`. Only used in raw MPMs.
`censorkeep`	The value of the censor variable denoting data elements to keep. Defaults to `0`. Only used in raw MPMs.
`start_age`	The age from which to start the matrix. Defaults to `NULL`, in which case age `1` is used if `prebreeding = TRUE`, and age `0` is used if `prebreeding = FALSE`. Only used in age-based MPMs.
`last_age`	The final age to use in the matrix. Defaults to `NULL`, in which case the highest age in the dataset is used. Only used in age-based and age-by-stage MPMs.
`fecage_min`	The minimum age at which reproduction is possible. Defaults to `NULL`, which is interpreted to mean that fecundity should be assessed starting in the minimum age observed in the dataset. Only used in age-based MPMs.
`fecage_max`	The maximum age at which reproduction is possible. Defaults to `NULL`, which is interpreted to mean that fecundity should be assessed until the final observed age. Only used in age-based MPMs.
`fectime`	An integer indicating whether to estimate fecundity using the variable given for `fec` in time t (`2`) or time t+1 (`3`). Only used for purely age-based MPMs. Defaults to `2`.
`fecmod`	A scalar multiplier for fecundity. Only used for purely age-based MPMs. Defaults to `1.0`.
`cont`	A logical value designating whether to allow continued survival of individuals past the final age noted in age-based and age-by-stage MPMs, using the demographic characteristics of the final age. Defaults to `TRUE`.
`prebreeding`	A logical value indicating whether the life history model is a pre-breeding model. Defaults to `TRUE`.
`stage_NRasRep`	A logical value indicating whether to treat non-reproductive individuals as reproductive. Used only in raw, stage-based MPMs in cases where stage assignment must still be handled. Not used in function-based MPMs, or in stage-based MPMs in which a valid `hfvdata` class data frame with stages already assigned is provided.
`sparse_output`	A logical value indicating whether to output matrices in sparse format. Defaults to `FALSE`, in which case all matrices are output in standard matrix format.
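
The rule described for `reduce` can be sketched in base R. This is a minimal illustration only, not the lefko3 implementation: the matrices `A1` and `A2` and all helper names are hypothetical, and the sketch assumes that all matrix entries are non-negative.

```r
# Hypothetical 3 x 3 projection matrices (filled column by column).
# Stage 3 has no transitions in either matrix.
A1 <- matrix(c(0.0, 0.5, 0.0,
               1.2, 0.3, 0.0,
               0.0, 0.0, 0.0), nrow = 3)
A2 <- matrix(c(0.0, 0.4, 0.0,
               0.9, 0.2, 0.0,
               0.0, 0.0, 0.0), nrow = 3)
mats <- list(A1, A2)

# Total the row sums and column sums across ALL matrices.
# Assumption: entries are non-negative, so a zero total means all zeros.
row_tot <- Reduce(`+`, lapply(mats, rowSums))
col_tot <- Reduce(`+`, lapply(mats, colSums))

# Keep a stage unless both its row and its column are zero everywhere.
keep <- !(row_tot == 0 & col_tot == 0)
reduced <- lapply(mats, function(m) m[keep, keep, drop = FALSE])

dim(reduced[[1]])
```

A stage that is empty in one matrix but used in another is retained, which keeps all matrices in the list at the same dimensions.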

Value

An object of class lefkoMat. This is a list that holds the matrix projection model and all of its metadata. The structure has the following elements:

`A`	A list of full projection matrices in order of sorted patches and occasion times. All matrices output in R's `matrix` class, or in the `dgCMatrix` class from the `Matrix` package if sparse.
`U`	A list of survival transition matrices sorted as in `A`. All matrices output in R's `matrix` class, or in the `dgCMatrix` class from the `Matrix` package if sparse.
`F`	A list of fecundity matrices sorted as in `A`. All matrices output in R's `matrix` class, or in the `dgCMatrix` class from the `Matrix` package if sparse.
`hstages`	A data frame showing the pairing of ahistorical stages used to create historical stage pairs. Only used in historical MPMs.
`agestages`	A data frame showing age-stage pairs. Only used in age-by-stage MPMs.
`ahstages`	A data frame detailing the characteristics of associated ahistorical stages, in the form of a modified stageframe that includes status as an entry stage through reproduction. Used in all stage-based and age-by-stage MPMs.
`labels`	A data frame giving the population, patch, and year of each matrix in order.
`dataqc`	A vector showing the numbers of individuals and rows in the vertical dataset used as input.
`matrixqc`	A short vector describing the number of non-zero elements in `U` and `F` matrices, and the number of annual matrices.
`modelqc`	This is the `qc` portion of the `modelsuite` input.
`prob_out`	An optional element only added if `err_check = TRUE`. This is a list of vital rate probability matrices, with 7 columns in the order of survival, observation probability, reproduction probability, primary size transition probability, secondary size transition probability, tertiary size transition probability, and probability of juvenile transition to maturity.
`allstages`	An optional element only added if `err_check = TRUE`. This is a data frame giving the values used to determine each matrix element capable of being estimated.
`data`	An optional element only added if `err_check = TRUE` and a raw MPM is requested. This consists of the original dataset as edited by this function for indexing purposes.
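
As a rough orientation to this structure, the following base R sketch builds a mock list that reuses the `A`, `U`, and `F` element names and checks that the full matrix equals the sum of the survival transition and fecundity matrices. The object `mock_mpm` and all of its values are hypothetical; a real `lefkoMat` object carries the additional elements listed above as well as its class attribute.

```r
# Hypothetical survival transition matrix (filled column by column).
Umat <- matrix(c(0.0, 0.4, 0.0,
                 0.0, 0.5, 0.3,
                 0.0, 0.0, 0.8), nrow = 3)

# Hypothetical fecundity matrix: only stage 3 produces stage 1 offspring.
Fmat <- matrix(0, nrow = 3, ncol = 3)
Fmat[1, 3] <- 2.5

# Mock list using the same element names as the documented output.
mock_mpm <- list(
  A = list(Umat + Fmat),
  U = list(Umat),
  F = list(Fmat),
  labels = data.frame(pop = 1, patch = "A", year2 = 1988)
)

# The full projection matrix is the sum of its two components.
isTRUE(all.equal(mock_mpm$A[[1]], mock_mpm$U[[1]] + mock_mpm$F[[1]]))

# Counts of non-zero elements, similar in spirit to the `matrixqc` summary.
nonzero_U <- sum(mock_mpm$U[[1]] != 0)
nonzero_F <- sum(mock_mpm$F[[1]] != 0)
```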

General Notes

This function automatically determines whether to create a raw or function-based MPM given inputs supplied by the user.

If used, the reproduction matrix (argument repmatrix) may be supplied as either historical or ahistorical. If provided as historical, then a historical MPM must be estimated.

If neither a supplement nor a reproduction matrix is used, and the MPM to create is stage-based, then fecundity will be assumed to occur from all reproductive stages to all propagule and immature stages.
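
The default fecundity assumption in the preceding note can be pictured as an ahistorical reproduction matrix built from stage characteristics. The sketch below uses base R only. The four-stage life history and all object names are hypothetical, and the orientation (rows as stage in occasion t+1, columns as stage in occasion t) is an assumption made for illustration.

```r
# Hypothetical four-stage life history.
stagenames <- c("Sd", "Sdl", "Veg", "Flo")
propstatus <- c(1, 0, 0, 0)  # propagule stages
immstatus  <- c(1, 1, 0, 0)  # immature stages
repstatus  <- c(0, 0, 0, 1)  # reproductive stages

# Offspring may enter any propagule or immature stage.
entry <- as.numeric(propstatus == 1 | immstatus == 1)

# 1 marks full fecundity from a reproductive stage (column)
# to an entry stage (row); every other element is 0.
repmat <- outer(entry, as.numeric(repstatus == 1))
dimnames(repmat) <- list(stagenames, stagenames)

repmat
```

Replacing a `1` with another multiplier would scale fecundity for that transition, in line with the description of `repmatrix` in the Arguments section.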

Function-based MPM Notes

Users may at times wish to estimate MPMs using a dataset incorporating multiple patches or subpopulations, but without discriminating between those patches or subpopulations. Should the aim of analysis be a general MPM that does not distinguish these patches or subpopulations, the modelsearch() run should not include patch terms.

Input options including multiple variable names must be entered in the order of variables in occasion t+1, t, and t-1. Rearranging the order will lead to erroneous calculations, and may lead to fatal errors.

This function provides two different means of estimating the probability of size transition. The midpoint method (CDF = FALSE) estimates the probability by first evaluating the corresponding probability density function at the exact size at the midpoint of the size class, and then multiplying that value by the bin width of the size class. Doak et al. 2021 (Ecological Monographs) noted that this method can produce biased results, with total size transitions associated with a specific size not totaling to 1.0 and even specific size transition probabilities capable of being estimated at values greater than 1.0. The alternative and default method (CDF = TRUE) uses the cumulative distribution function to estimate the probability of size transition as the cumulative probability of size transition at the upper limit of the size class minus the cumulative probability of size transition at the lower limit of the size class. This latter method avoids this bias. Note, however, that both methods are exact and unbiased for negative binomial and Poisson distributions.
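
The difference between the two methods can be reproduced with base R's Gaussian functions. This is a minimal sketch under assumed values: the predicted mean size `mu`, the standard deviation `sigma`, and the size classes are all hypothetical, and the code is not taken from lefko3.

```r
# Hypothetical predicted size distribution in occasion t+1.
mu    <- 2.0
sigma <- 0.2

# Hypothetical size classes with midpoints 1 to 4 and half-width 0.5.
midpoints <- 1:4
halfwidth <- 0.5
lower <- midpoints - halfwidth
upper <- midpoints + halfwidth

# Midpoint method: density at the midpoint times the bin width.
p_mid <- dnorm(midpoints, mean = mu, sd = sigma) * (2 * halfwidth)

# CDF method: cumulative probability at the upper limit
# minus cumulative probability at the lower limit.
p_cdf <- pnorm(upper, mean = mu, sd = sigma) -
  pnorm(lower, mean = mu, sd = sigma)

rbind(midpoint = p_mid, cdf = p_cdf)
```

With a narrow distribution relative to the bin width, the midpoint estimate for the class centered on `mu` exceeds 1, while the CDF estimates stay within [0, 1] and sum to approximately 1 over the covered size range.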

Under the Gaussian and gamma size distributions, the number of estimated parameters may differ between the two CDF settings. Because the midpoint method tends to introduce upward bias in the estimation of size transition probabilities, it is more likely to yield non-zero values when the true probability is extremely close to 0. As a result, the summary.lefkoMat() function may report a higher number of estimated parameters under CDF = FALSE than under CDF = TRUE in some cases.
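
One way such a difference in counts can arise is sketched below in base R under hypothetical Gaussian parameters. Far in the tail, a density times a bin width can remain a tiny positive number, whereas the difference between two cumulative probabilities can round to exactly zero. This illustrates the general effect only and does not describe how lefko3 counts estimated parameters.

```r
# Hypothetical predicted size distribution and size classes.
mu        <- 2.0
sigma     <- 0.2
midpoints <- 1:9
halfwidth <- 0.5

# Midpoint method: density at the midpoint times the bin width.
p_mid <- dnorm(midpoints, mean = mu, sd = sigma) * (2 * halfwidth)

# CDF method: difference of cumulative probabilities at the bin limits.
p_cdf <- pnorm(midpoints + halfwidth, mean = mu, sd = sigma) -
  pnorm(midpoints - halfwidth, mean = mu, sd = sigma)

# Number of size classes receiving a non-zero probability under each method.
n_mid <- sum(p_mid > 0)
n_cdf <- sum(p_cdf > 0)
c(midpoint = n_mid, cdf = n_cdf)
```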

Examples


data(lathyrus)

sizevector <- c(0, 4.6, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8,
  9)
stagevector <- c("Sd", "Sdl", "Dorm", "Sz1nr", "Sz2nr", "Sz3nr", "Sz4nr",
  "Sz5nr", "Sz6nr", "Sz7nr", "Sz8nr", "Sz9nr", "Sz1r", "Sz2r", "Sz3r", 
  "Sz4r", "Sz5r", "Sz6r", "Sz7r", "Sz8r", "Sz9r")
repvector <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1)
obsvector <- c(0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
matvector <- c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
immvector <- c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
propvector <- c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
  0)
indataset <- c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
binvec <- c(0, 4.6, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 
  0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5)

lathframeln <- sf_create(sizes = sizevector, stagenames = stagevector, 
  repstatus = repvector, obsstatus = obsvector, matstatus = matvector, 
  immstatus = immvector, indataset = indataset, binhalfwidth = binvec, 
  propstatus = propvector)

lathvertln <- verticalize3(lathyrus, noyears = 4, firstyear = 1988,
  patchidcol = "SUBPLOT", individcol = "GENET", blocksize = 9, 
  juvcol = "Seedling1988", sizeacol = "lnVol88", repstracol = "Intactseed88",
  fecacol = "Intactseed88", deadacol = "Dead1988", 
  nonobsacol = "Dormant1988", stageassign = lathframeln, stagesize = "sizea",
  censorcol = "Missing1988", censorkeep = NA, NAas0 = TRUE, censor = TRUE)

lathvertln$feca2 <- round(lathvertln$feca2)
lathvertln$feca1 <- round(lathvertln$feca1)
lathvertln$feca3 <- round(lathvertln$feca3)

lathvertln_adults <- subset(lathvertln, stage2index > 2)
surv_model <- glm(alive3 ~ sizea2 + sizea1 + as.factor(patchid) +
  as.factor(year2), data = lathvertln_adults, family = "binomial")

obs_data <- subset(lathvertln_adults, alive3 == 1)
obs_model <- glm(obsstatus3 ~ as.factor(patchid), data = obs_data,
  family = "binomial")

size_data <- subset(obs_data, obsstatus3 == 1)
siz_model <- lm(sizea3 ~ sizea2 + sizea1 + repstatus1 + as.factor(patchid) +
  as.factor(year2), data = size_data)

reps_model <- glm(repstatus3 ~ sizea2 + sizea1 + as.factor(patchid) +
  as.factor(year2), data = size_data, family = "binomial")

fec_data <- subset(lathvertln_adults, repstatus2 == 1)
fec_model <- glm(feca2 ~ sizea2 + sizea1 + repstatus1 + as.factor(patchid),
  data = fec_data, family = "poisson")

lathvertln_juvs <- subset(lathvertln, stage2index < 3)
jsurv_model <- glm(alive3 ~ as.factor(patchid), data = lathvertln_juvs,
  family = "binomial")

jobs_data <- subset(lathvertln_juvs, alive3 == 1)
jobs_model <- glm(obsstatus3 ~ 1, family = "binomial", data = jobs_data)

jsize_data <- subset(jobs_data, obsstatus3 == 1)
jsiz_model <- lm(sizea3 ~ as.factor(year2), data = jsize_data)

jrepst_model <- 0
jmatst_model <- 1

mod_params <- create_pm(name_terms = TRUE)
mod_params$modelparams[3] <- "patchid"
mod_params$modelparams[4] <- "alive3"
mod_params$modelparams[5] <- "obsstatus3"
mod_params$modelparams[6] <- "sizea3"
mod_params$modelparams[9] <- "repstatus3"
mod_params$modelparams[11] <- "feca2"
mod_params$modelparams[12] <- "sizea2"
mod_params$modelparams[13] <- "sizea1"
mod_params$modelparams[18] <- "repstatus2"
mod_params$modelparams[19] <- "repstatus1"

used_models <- list(survival_model = surv_model, observation_model = obs_model,
  size_model = siz_model, sizeb_model = 1, sizec_model = 1,
  repstatus_model = reps_model, fecundity_model = fec_model,
  juv_survival_model = jsurv_model, juv_observation_model = jobs_model,
  juv_size_model = jsiz_model, juv_sizeb_model = 1, juv_sizec_model = 1,
  juv_reproduction_model = jrepst_model, juv_maturity_model = jmatst_model,
  paramnames = mod_params)

lathsupp3 <- supplemental(stage3 = c("Sd", "Sd", "Sdl", "Sdl", "mat", "Sd", "Sdl"), 
  stage2 = c("Sd", "Sd", "Sd", "Sd", "Sdl", "rep", "rep"),
  stage1 = c("Sd", "rep", "Sd", "rep", "Sd", "mat", "mat"),
  eststage3 = c(NA, NA, NA, NA, "mat", NA, NA),
  eststage2 = c(NA, NA, NA, NA, "Sdl", NA, NA),
  eststage1 = c(NA, NA, NA, NA, "Sdl", NA, NA),
  givenrate = c(0.345, 0.345, 0.054, 0.054, NA, NA, NA),
  multiplier = c(NA, NA, NA, NA, NA, 0.345, 0.054),
  type = c(1, 1, 1, 1, 1, 3, 3), type_t12 = c(1, 2, 1, 2, 1, 1, 1),
  stageframe = lathframeln, historical = TRUE)

# Create a historical, function-based MPM for all years and patches, using the
# vital rate models and the supplement table assembled above.
lathmat3ln <- mpm_create(historical = TRUE, year = "all", patch = "all",
  data = lathvertln, stageframe = lathframeln, supplement = lathsupp3,
  modelsuite = used_models, reduce = FALSE)

[Package lefko3 version 6.2.1 Index]