chao {singleRcapture} R Documentation

Family functions in singleRcapture package

Description

The singleRcapture package uses family-type functions to specify the model-dependent parts of population size estimation, regression, diagnostics, and other necessary information. These functions are passed as the model argument of the estimatePopsize function.

Usage

chao(lambdaLink = "loghalf", ...)

Hurdleztgeom(
  lambdaLink = c("log", "neglog"),
  piLink = c("logit", "cloglog", "probit"),
  ...
)

Hurdleztnegbin(
  nSim = 1000,
  epsSim = 1e-08,
  eimStep = 6,
  lambdaLink = c("log", "neglog"),
  alphaLink = c("log", "neglog"),
  piLink = c("logit", "cloglog", "probit"),
  ...
)

Hurdleztpoisson(
  lambdaLink = c("log", "neglog"),
  piLink = c("logit", "cloglog", "probit"),
  ...
)

oiztgeom(
  lambdaLink = c("log", "neglog"),
  omegaLink = c("logit", "cloglog", "probit"),
  ...
)

oiztnegbin(
  nSim = 1000,
  epsSim = 1e-08,
  eimStep = 6,
  lambdaLink = c("log", "neglog"),
  alphaLink = c("log", "neglog"),
  omegaLink = c("logit", "cloglog", "probit"),
  ...
)

oiztpoisson(
  lambdaLink = c("log", "neglog"),
  omegaLink = c("logit", "cloglog", "probit"),
  ...
)

zelterman(lambdaLink = "loghalf", ...)

zotgeom(lambdaLink = c("log", "neglog"), ...)

zotnegbin(
  nSim = 1000,
  epsSim = 1e-08,
  eimStep = 6,
  lambdaLink = c("log", "neglog"),
  alphaLink = c("log", "neglog"),
  ...
)

zotpoisson(lambdaLink = c("log", "neglog"), ...)

ztHurdlegeom(
  lambdaLink = c("log", "neglog"),
  piLink = c("logit", "cloglog", "probit"),
  ...
)

ztHurdlenegbin(
  nSim = 1000,
  epsSim = 1e-08,
  eimStep = 6,
  lambdaLink = c("log", "neglog"),
  alphaLink = c("log", "neglog"),
  piLink = c("logit", "cloglog", "probit"),
  ...
)

ztHurdlepoisson(
  lambdaLink = c("log", "neglog"),
  piLink = c("logit", "cloglog", "probit"),
  ...
)

ztgeom(lambdaLink = c("log", "neglog"), ...)

ztnegbin(
  nSim = 1000,
  epsSim = 1e-08,
  eimStep = 6,
  lambdaLink = c("log", "neglog"),
  alphaLink = c("log", "neglog"),
  ...
)

ztoigeom(
  lambdaLink = c("log", "neglog"),
  omegaLink = c("logit", "cloglog", "probit"),
  ...
)

ztoinegbin(
  nSim = 1000,
  epsSim = 1e-08,
  eimStep = 6,
  lambdaLink = c("log", "neglog"),
  alphaLink = c("log", "neglog"),
  omegaLink = c("logit", "cloglog", "probit"),
  ...
)

ztoipoisson(
  lambdaLink = c("log", "neglog"),
  omegaLink = c("logit", "cloglog", "probit"),
  ...
)

ztpoisson(lambdaLink = c("log", "neglog"), ...)

Arguments

lambdaLink

link for the Poisson parameter, "log" by default, except for zelterman's and chao's models, where only \(\ln\left(\frac{x}{2}\right)\) is possible.

...

Additional arguments, not used for now.

piLink

link for probability parameter, "logit" by default

nSim, epsSim

if working weights cannot be computed analytically, these arguments specify, respectively, the maximum number of simulations allowed and the precision level for finding the weights numerically.

eimStep

a non-negative integer describing how many values should be used at each step of the approximation of information matrices when no analytic solution is available (e.g. "ztnegbin"); the default varies by function. A higher value usually means faster convergence but may cause convergence issues.

alphaLink

link for dispersion parameter, "log" by default

omegaLink

link for inflation parameter, "logit" by default

Details

Most of these functions are based on some "base" distribution with support \(\mathbb{N}_{0}=\mathbb{N}\cup\lbrace 0\rbrace\) that describes the distribution of \(Y\) before truncation. Currently they include: \[\mathbb{P}(Y=y|\lambda,\alpha)=\left\lbrace \begin{array}{cc} \frac{\lambda^{y}e^{-\lambda}}{y!} & \text{Poisson distribution} \cr \frac{\Gamma(y+\alpha^{-1})}{\Gamma(\alpha^{-1})y!} \left(\frac{\alpha^{-1}}{\alpha^{-1}+\lambda}\right)^{\alpha^{-1}} \left(\frac{\lambda}{\alpha^{-1}+\lambda}\right)^{y} & \text{negative binomial distribution} \cr \frac{\lambda^{y}}{(1+\lambda)^{y+1}} & \text{geometric distribution} \end{array} \right.\] where \(\lambda\) is the Poisson parameter and \(\alpha\) is the dispersion parameter. The geometric distribution is a special case of the negative binomial distribution with \(\alpha=1\); it is included because the negative binomial distribution is numerically quite troublesome in regression fitting. Note that the PMF of the negative binomial distribution approaches the PMF of the Poisson distribution as \(\alpha\rightarrow 0^{+}\).
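The relations between the three base distributions can be checked numerically. The following is a minimal sketch in base R; the helper names basePoisson, baseNegbin and baseGeom are hypothetical and are not part of singleRcapture.

```r
# PMFs of the three "base" distributions, written directly from the
# formulas above (helper names are illustrative, not package functions).
basePoisson <- function(y, lambda) lambda^y * exp(-lambda) / factorial(y)

baseNegbin <- function(y, lambda, alpha) {
  size <- 1 / alpha
  exp(lgamma(y + size) - lgamma(size) - lgamma(y + 1)) *
    (size / (size + lambda))^size *
    (lambda / (size + lambda))^y
}

baseGeom <- function(y, lambda) lambda^y / (1 + lambda)^(y + 1)

y <- 0:10
lambda <- 1.7

# Geometric is negative binomial with alpha = 1
geomVsNegbin <- max(abs(baseGeom(y, lambda) - baseNegbin(y, lambda, alpha = 1)))

# Negative binomial approaches Poisson as alpha -> 0+
negbinVsPois <- max(abs(baseNegbin(y, lambda, alpha = 1e-6) -
                          basePoisson(y, lambda)))

# Cross-check against stats::dnbinom in its (size, mu) parametrisation
negbinVsStats <- max(abs(baseNegbin(y, lambda, alpha = 0.5) -
                           dnbinom(y, size = 2, mu = lambda)))
```

All three differences should be close to zero, the second one only up to a term of order \(\alpha\).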

Note that in the literature on single source capture-recapture models the dispersion parameter, which introduces greater variability in the negative binomial distribution compared to the Poisson distribution, is generally interpreted as explaining unobserved heterogeneity, i.e. the presence of important unobserved independent variables. All these methods for estimating population size are tied to Poisson processes, hence we use \(\lambda\) as the parameter symbol instead of \(\mu\) to emphasize this connection. It is also not hard to see that all estimators derived from modifying the "base" distribution are unbiased if the assumptions made by the respective models are not violated.

The zero truncated models corresponding to the "base" distributions are characterized by the relation: \[\mathbb{P}(Y=y|Y>0)=\left\lbrace \begin{array}{cc} \frac{\mathbb{P}(Y=y)}{1-\mathbb{P}(Y=0)} & \text{when }y\neq 0 \cr 0 & \text{when }y=0 \end{array}\right.\] which allows us to estimate parameter values using only the observed part of the population. These models lead to the following estimates, respectively: \[ \begin{aligned} \hat{N} &= \sum_{k=1}^{N_{obs}}\frac{1}{1-\exp(-\lambda_{k})} & \text{ For Poisson distribution} \cr \hat{N} &= \sum_{k=1}^{N_{obs}}\frac{1}{1-(1+\alpha_{k}\lambda_{k})^{-\alpha_{k}^{-1}}} & \text{ For negative binomial distribution} \cr \hat{N} &= \sum_{k=1}^{N_{obs}}\frac{1+\lambda_{k}}{\lambda_{k}} & \text{ For geometric distribution} \end{aligned} \]
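Each estimate above sums \(1/(1-\mathbb{P}(Y=0))\) over the observed units. A minimal sketch in base R, assuming the fitted values \(\lambda_{k}\) and \(\alpha_{k}\) are already available as numeric vectors (the function names and the numbers are hypothetical):

```r
# Population size estimates for zero truncated models, one term per
# observed unit (illustrative helpers, not package functions).
nHatZtpoisson <- function(lambda) sum(1 / (1 - exp(-lambda)))
nHatZtnegbin  <- function(lambda, alpha) {
  sum(1 / (1 - (1 + alpha * lambda)^(-1 / alpha)))
}
nHatZtgeom    <- function(lambda) sum((1 + lambda) / lambda)

# Hypothetical fitted values for five observed units
lambda <- c(0.4, 0.8, 1.2, 0.6, 2.0)
alpha  <- rep(1, length(lambda))

estPoisson <- nHatZtpoisson(lambda)
estGeom    <- nHatZtgeom(lambda)
estNegbin  <- nHatZtnegbin(lambda, alpha)  # alpha = 1 reproduces estGeom
```

Every term is larger than 1, so each estimate exceeds the number of observed units.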

One common way in which the assumptions of zero truncated models are violated is the presence of one inflation, which in single source capture-recapture (SSCR) models plays a role somewhat similar to zero inflation in usual count data analysis. There are two ways in which one inflation may be understood; they differ in whether \(\mathbb{P}(Y=0)\) is modified by inflation. The first approach is to inflate the zero truncated distribution (with parameter \(\omega\)) as: \[ \mathbb{P}_{new}(Y=y|Y>0) = \left\lbrace\begin{array}{cc} \omega + (1 - \omega)\mathbb{P}_{old}(Y=1|Y>0)& \text{when: } y = 1 \cr (1 - \omega) \mathbb{P}_{old}(Y=y|Y>0) & \text{when: } y \neq 1 \end{array}\right.\] which corresponds to: \[ \mathbb{P}_{new}(Y=y) = \left\lbrace\begin{array}{cc} \mathbb{P}_{old}(Y=0) & \text{when: } y = 0 \cr \omega(1 - \mathbb{P}_{old}(Y=0)) + (1 - \omega)\mathbb{P}_{old}(Y=1) & \text{when: } y = 1 \cr (1 - \omega) \mathbb{P}_{old}(Y=y) & \text{when: } y > 1 \end{array}\right. \] before zero truncation. Models that utilize this approach are commonly referred to as zero truncated one inflated models. Another way of accommodating one inflation in SSCR is to put the inflation parameter on the base distribution as: \[ \mathbb{P}_{new}(Y=y) = \left\lbrace\begin{array}{cc} \omega + (1 - \omega)\mathbb{P}_{old}(Y=1)& \text{when: } y = 1 \cr (1 - \omega) \mathbb{P}_{old}(Y=y) & \text{when: } y \neq 1 \end{array}\right. \] which then becomes: \[ \mathbb{P}_{new}(Y=y|Y>0) = \left\lbrace\begin{array}{cc} \frac{\omega}{1 - (1-\omega)\mathbb{P}_{old}(Y=0)} + \frac{(1 - \omega)}{1 - (1-\omega)\mathbb{P}_{old}(Y=0)}\mathbb{P}_{old}(Y=1)& \text{when: } y = 1 \cr \frac{(1 - \omega)}{1 - (1-\omega)\mathbb{P}_{old}(Y=0)}\mathbb{P}_{old}(Y=y) & \text{when: } y > 1 \end{array}\right. \] after truncation. It was shown by Böhning in a 2022 paper that these approaches are equivalent in terms of maximizing likelihoods if no regression formula is placed on \(\omega\). They can, however, lead to different population size estimates.
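The two constructions can be compared numerically for a Poisson base distribution. The sketch below uses base R only; pZtoi and pOizt are hypothetical names, and the reparametrisation omegaStar is derived here from the formulas above rather than taken from the package.

```r
# Base Poisson PMF restricted to y > 0 (zero truncated)
pOldZt <- function(y, lambda) {
  ifelse(y > 0, dpois(y, lambda) / (1 - dpois(0, lambda)), 0)
}

# Approach 1: inflate the zero truncated distribution
pZtoi <- function(y, lambda, omega) {
  omega * (y == 1) + (1 - omega) * pOldZt(y, lambda)
}

# Approach 2: inflate the base distribution, then truncate at zero
pOizt <- function(y, lambda, omega) {
  pNew  <- omega * (y == 1) + (1 - omega) * dpois(y, lambda)
  pNew0 <- (1 - omega) * dpois(0, lambda)
  ifelse(y > 0, pNew / (1 - pNew0), 0)
}

y <- 1:60
lambda <- 1.3
omega <- 0.2

# Both are proper distributions on the positive integers
sumZtoi <- sum(pZtoi(y, lambda, omega))
sumOizt <- sum(pOizt(y, lambda, omega))

# With a constant omega, approach 2 equals approach 1 evaluated at
# omegaStar = omega / (1 - (1 - omega) * P_old(Y = 0))
omegaStar <- omega / (1 - (1 - omega) * dpois(0, lambda))
maxDiff <- max(abs(pZtoi(y, lambda, omegaStar) - pOizt(y, lambda, omega)))
```

Because the observed-data distributions coincide after this reparametrisation, the maximized likelihoods agree when \(\omega\) is constant, while the implied \(\mathbb{P}(Y=0)\), and hence \(\hat{N}\), can differ.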

For zero truncated one inflated models the formula for the population size estimate \(\hat{N}\) does not change, since \(\mathbb{P}(y=0)\) remains the same, but the estimation of the parameters, and hence all calculations that depend on them, changes.

For one inflated zero truncated models, population size estimates are expressed, respectively, by: \[ \begin{aligned} \hat{N} &= \sum_{k=1}^{N_{obs}}\frac{1}{1-(1-\omega_{k})\exp(-\lambda_{k})} &\text{ For base Poisson distribution} \cr \hat{N} &= \sum_{k=1}^{N_{obs}}\frac{1}{1-(1-\omega_{k})(1+\alpha_{k}\lambda_{k})^{-\alpha_{k}^{-1}}} &\text{ For base negative binomial distribution} \cr \hat{N} &= \sum_{k=1}^{N_{obs}}\frac{1+\lambda_{k}}{\lambda_{k} + \omega_{k}} &\text{ For base geometric distribution} \end{aligned} \]
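These estimates replace \(\mathbb{P}(Y=0)\) by \((1-\omega_{k})\mathbb{P}_{old}(Y=0)\) in each term. A minimal sketch in base R with hypothetical function names and fitted values:

```r
# Population size estimates for one inflated zero truncated models
# (illustrative helpers, not package functions).
nHatOiztpoisson <- function(lambda, omega) {
  sum(1 / (1 - (1 - omega) * exp(-lambda)))
}
nHatOiztnegbin <- function(lambda, alpha, omega) {
  sum(1 / (1 - (1 - omega) * (1 + alpha * lambda)^(-1 / alpha)))
}
nHatOiztgeom <- function(lambda, omega) {
  sum((1 + lambda) / (lambda + omega))
}

# Hypothetical fitted values for four observed units
lambda <- c(0.5, 1.0, 1.5, 0.8)
omega  <- c(0.10, 0.20, 0.15, 0.05)

estOiPoisson <- nHatOiztpoisson(lambda, omega)
estOiGeom    <- nHatOiztgeom(lambda, omega)
estOiNegbin  <- nHatOiztnegbin(lambda, alpha = 1, omega)  # equals estOiGeom

# Setting omega = 0 recovers the plain zero truncated Poisson estimate
estNoInflation <- nHatOiztpoisson(lambda, 0)
```

Positive inflation lowers the estimated probability of a zero count, so estOiPoisson is smaller than estNoInflation.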

Zero one truncated models ignore one counts instead of accommodating one inflation, by utilizing the identity \[ \ell_{\text{ztoi}}=\boldsymbol{f}_{1}\ln{\frac{\boldsymbol{f}_{1}}{N_{obs}}} +(N_{obs}-\boldsymbol{f}_{1})\ln{\left(1-\frac{\boldsymbol{f}_{1}}{N_{obs}} \right)} + \ell_{\text{zot}} \] where \(\ell_{\text{zot}}\) is the log likelihood of the zero one truncated distribution characterized by the probability mass function: \[\mathbb{P}(Y=y|Y>1)=\left\lbrace \begin{array}{cc} \frac{\mathbb{P}(Y=y)}{1-\mathbb{P}(Y=0)-\mathbb{P}(Y=1)} & \text{when }y > 1 \cr 0 & \text{when }y\in\lbrace 0, 1\rbrace \end{array}\right.\] where \(\mathbb{P}(Y)\) is the probability mass function of the "base" distribution. The identity above justifies the use of zero one truncated models. Unfortunately, it has only been proven for intercept-only models; however, numerical simulations seem to indicate that, even if the theorem cannot be extended to (non-trivial) regression, population size estimation is still possible.

For zero one truncated models population size estimates are expressed by: \[ \begin{aligned} \hat{N} &= \boldsymbol{f}_{1} + \sum_{k=1}^{N_{obs}} \frac{1-\lambda_{k}\exp(-\lambda_{k})}{1-\exp(-\lambda_{k})-\lambda_{k}\exp(-\lambda_{k})} &\text{ For base Poisson distribution} \cr \hat{N} &= \boldsymbol{f}_{1} + \sum_{k=1}^{N_{obs}} \frac{1-\lambda_{k}(1+\alpha_{k}\lambda_{k})^{-1-\alpha_{k}^{-1}}}{ 1-(1+\alpha_{k}\lambda_{k})^{-\alpha_{k}^{-1}}-\lambda_{k}(1+\alpha_{k}\lambda_{k})^{-1-\alpha_{k}^{-1}}} &\text{ For base negative binomial distribution} \cr \hat{N} &= \boldsymbol{f}_{1} + \sum_{k=1}^{N_{obs}} \frac{\lambda_{k}^{2}+\lambda_{k}+1}{\lambda_{k}^{2}} &\text{ For base geometric distribution} \end{aligned} \]
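All three expressions share the form \(\boldsymbol{f}_{1}+\sum_{k}\frac{1-\mathbb{P}(Y=1)}{1-\mathbb{P}(Y=0)-\mathbb{P}(Y=1)}\) with the base probabilities evaluated at the fitted parameters. The sketch below, in base R with hypothetical names and values, checks the closed forms against that general expression.

```r
# General zero one truncated estimate from base probabilities p0 = P(Y = 0)
# and p1 = P(Y = 1) (illustrative helper, not a package function).
nHatZot <- function(p0, p1, f1) f1 + sum((1 - p1) / (1 - p0 - p1))

# Hypothetical fitted values and number of one counts
lambda <- c(0.9, 1.4, 2.1, 0.7)
alpha  <- 0.5
f1     <- 10

# Poisson base
zotPois <- nHatZot(dpois(0, lambda), dpois(1, lambda), f1)
zotPoisClosed <- f1 + sum((1 - lambda * exp(-lambda)) /
                            (1 - exp(-lambda) - lambda * exp(-lambda)))

# Geometric base: dgeom with prob = 1 / (1 + lambda) matches the base PMF
prob <- 1 / (1 + lambda)
zotGeom <- nHatZot(dgeom(0, prob), dgeom(1, prob), f1)
zotGeomClosed <- f1 + sum((lambda^2 + lambda + 1) / lambda^2)

# Negative binomial base: dnbinom with size = 1 / alpha and mu = lambda
zotNegbin <- nHatZot(dnbinom(0, size = 1 / alpha, mu = lambda),
                     dnbinom(1, size = 1 / alpha, mu = lambda), f1)
nbP1 <- lambda * (1 + alpha * lambda)^(-1 - 1 / alpha)
nbP0 <- (1 + alpha * lambda)^(-1 / alpha)
zotNegbinClosed <- f1 + sum((1 - nbP1) / (1 - nbP0 - nbP1))
```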

Pseudo hurdle models are experimental and not yet described in literature.

Lastly, there are the chao and zelterman models, which use logistic regression on the dummy variable \[ Z = \left\lbrace\begin{array}{cc} 0 & \text{if }Y = 1 \cr 1 & \text{if }Y = 2 \end{array}\right.\] through the equation: \[ \text{logit}(p_{k})= \ln\left(\frac{\lambda_{k}}{2}\right)= \boldsymbol{\beta}\mathbf{x}_{k}=\eta_{k}\] where \(\lambda_{k}\) is the Poisson parameter.

The zelterman estimator of population size is expressed as: \[\hat{N}=\sum_{k=1}^{N_{obs}}\frac{1}{1-\exp\left(-\lambda_{k}\right)}\] and the chao estimator has the form: \[ \hat{N}=N_{obs}+\sum_{k=1}^{\boldsymbol{f}_{1}+\boldsymbol{f}_{2}} \frac{1}{\lambda_{k}+ \frac{\lambda_{k}^{2}}{2}} \]
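In the intercept-only case the logistic regression on \(Z\) has the closed-form solution \(\hat{p}=\boldsymbol{f}_{2}/(\boldsymbol{f}_{1}+\boldsymbol{f}_{2})\), so the link \(\ln(\lambda/2)\) gives \(\hat{\lambda}=2\boldsymbol{f}_{2}/\boldsymbol{f}_{1}\). The sketch below, in base R with hypothetical frequencies, shows that the chao formula then reduces to the classical lower bound \(N_{obs}+\boldsymbol{f}_{1}^{2}/(2\boldsymbol{f}_{2})\); the zelterman line uses the classical form \(N_{obs}/(1-\exp(-\hat{\lambda}))\).

```r
# Hypothetical frequencies: f1 units seen once, f2 units seen twice,
# nObs units observed in total.
f1   <- 120
f2   <- 30
nObs <- 180

# Intercept-only logistic regression on Z (Z = 1 when Y = 2)
pHat   <- f2 / (f1 + f2)
eta    <- qlogis(pHat)   # logit(pHat)
lambda <- 2 * exp(eta)   # inverts ln(lambda / 2) = eta

# Chao estimator: the sum runs over the f1 + f2 units with Y in {1, 2},
# all sharing the same lambda here
chaoHat     <- nObs + (f1 + f2) / (lambda + lambda^2 / 2)
chaoClassic <- nObs + f1^2 / (2 * f2)

# Classical intercept-only Zelterman estimator
zeltermanHat <- nObs / (1 - exp(-lambda))
```

With covariates, \(\lambda_{k}\) varies across units and the sums no longer collapse to these closed forms.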

Value

An object of class family containing objects:

Author(s)

Piotr Chlebicki, Maciej Beręsewicz

See Also

estimatePopsize()


[Package singleRcapture version 0.2.1.2 Index]