
popsize {drpop}

R Documentation

Estimate total population size and capture probability using a user-provided set of models or user-provided nuisance estimates.

Description

Estimate total population size and capture probability using a user-provided set of models or user-provided nuisance estimates.

Usage

popsize(
  data,
  K = 2,
  j,
  k,
  margin = 0.005,
  filterrows = FALSE,
  nfolds = 5,
  funcname = c("rangerlogit"),
  sl.lib = c("SL.gam", "SL.glm", "SL.glm.interaction", "SL.ranger", "SL.glmnet"),
  getnuis,
  q1mat,
  q2mat,
  q12mat,
  idfold,
  TMLE = TRUE,
  PLUGIN = TRUE,
  Nmin = 100,
  ...
)

Arguments

`data`	The data frame in capture-recapture format with `K` lists for which total population is to be estimated. The first K columns are the capture history indicators for the `K` lists. The remaining columns are covariates in numeric format.
`K`	The number of lists that are present in the data.
`j`	The first list to be used for estimation.
`k`	The second list to be used for estimation.
`margin`	The minimum value the estimates can attain to bound them away from zero.
`filterrows`	A logical value denoting whether to remove all rows with only zeroes.
`nfolds`	The number of folds to be used for cross fitting.
`funcname`	The vector of estimation function names to obtain the population size.
`sl.lib`	Algorithm library for `qhat_sl()`. See `SuperLearner::listWrappers()`. Default library includes "gam", "glm", "glmnet", "glm.interaction", "ranger".
`getnuis`	A list object with the nuisance function estimates and the fold assignment of the rows for cross-fitting or a data.frame with the nuisance estimates.
`q1mat`	A dataframe with capture probabilities for the first list.
`q2mat`	A dataframe with capture probabilities for the second list.
`q12mat`	A dataframe with capture probabilities for both the lists simultaneously.
`idfold`	The fold assignment of each row during estimation.
`TMLE`	A logical value indicating whether the TMLE is computed.
`PLUGIN`	A logical value indicating whether the plug-in estimates are returned.
`Nmin`	The minimum sample size required to perform doubly robust estimation. For smaller samples, the Petersen estimator is returned.
`...`	Any extra arguments passed into the function. See `qhat_rangerlogit()`, `qhat_sl()`, `tmle()`.
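
The layout expected for `data`, the effect described for `filterrows`, and the Petersen fallback mentioned under `Nmin` can be illustrated with a small base R sketch. It does not call `popsize()`. The toy data frame, its column names and its values are made up for illustration, and the classical Lincoln-Petersen formula is shown as an assumption about the general form of the fallback, not as the package's exact implementation.

```r
# Hypothetical toy data in the layout described above: the first K = 2 columns
# are capture indicators, the remaining columns are numeric covariates.
# Column names and values are made up for illustration.
toy <- data.frame(
  y1 = c(1, 1, 0, 1, 0, 1, 0, 1),
  y2 = c(1, 0, 1, 1, 0, 0, 1, 1),
  x1 = c(0.2, 1.5, -0.3, 0.8, 0.1, -1.1, 0.4, 2.0)
)

K <- 2

# Drop rows in which every capture indicator is zero, which is what
# filterrows = TRUE is described as doing.
captured <- rowSums(toy[, 1:K]) > 0
toy_obs <- toy[captured, ]

n1  <- sum(toy_obs$y1)               # individuals captured by list 1
n2  <- sum(toy_obs$y2)               # individuals captured by list 2
n12 <- sum(toy_obs$y1 * toy_obs$y2)  # individuals captured by both lists

# Classical Lincoln-Petersen estimate of the total population size
# (assumed form; it ignores the covariate column entirely).
petersen <- n1 * n2 / n12
```

Because this estimate uses only the capture indicators, it is the kind of quantity that remains available when the sample is too small for the covariate-based estimators.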

Value

A list of estimates containing the following components for each list-pair, model and method (PI = plug-in, DR = doubly-robust, TMLE = targeted maximum likelihood estimate):

`result`	A dataframe of the estimated quantities listed below.
    `psi`	The estimated capture probability.
    `sigma`	The efficiency bound.
    `n`	The estimated population size.
    `sigman`	The estimated standard deviation of the population size.
    `cin.l`	The estimated lower bound of a 95% confidence interval of `n`.
    `cin.u`	The estimated upper bound of a 95% confidence interval of `n`.
`N`	The number of data points used in the estimation after removing rows with missing data.
`ifvals`	The estimated influence function values for the observed data.
`nuis`	The estimated nuisance functions (q12, q1, q2) for each element in funcname.
`nuistmle`	The estimated nuisance functions (q12, q1, q2) from tmle for each element in funcname.
`idfold`	The division of the rows into sets (folds) for cross-fitting.
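
The returned quantities `psi`, `n` and `N` are related. In the capture-recapture framing of Das, Kennedy and Jewell (2021), the population size estimate is the number of observed individuals divided by the estimated capture probability. That `popsize()` computes `n` in exactly this way is an assumption and is not stated on this page:

```latex
\hat{n} = \frac{N}{\hat{\psi}}
```

Under this reading, a smaller estimated capture probability implies a larger estimated population for the same number of observed rows.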

References

Bickel, P. J., Klaassen, C. A. J., Ritov, Y., and Wellner, J. A. (1993). Efficient and adaptive estimation for semiparametric models, volume 4. Johns Hopkins University Press, Baltimore.

van der Vaart, A. (2002a). Part III: Semiparametric statistics. Lectures on Probability Theory and Statistics, pages 331-457.

van der Laan, M. J. and Robins, J. M. (2003). Unified methods for censored longitudinal data and causality. Springer Science & Business Media.

Tsiatis, A. (2006). Semiparametric theory and missing data. Springer, New York.

Kennedy, E. H. (2016). Semiparametric theory and empirical processes in causal inference. Statistical causal inferences and their applications in public health research, pages 141-167. Springer.

Das, M., Kennedy, E. H., & Jewell, N.P. (2021). Doubly robust capture-recapture methods for estimating population size. arXiv preprint arXiv:2104.14091.

Examples


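# Simulate capture-recapture data.
data = simuldata(1000, l = 3)$data
# Estimate with the logit and gam models using 2-fold cross-fitting.
qhat = popsize(data = data, funcname = c("logit", "gam"), nfolds = 2, margin = 0.005)
# Reuse the nuisance estimates and fold assignment from the previous call.
psin_estimate = popsize(data = data, getnuis = qhat$nuis, idfold = qhat$idfold)

data = simuldata(n = 6000, l = 3)$data
# Only the two list columns are passed, so covariates are absent
# and this returns the basic plug-in estimate.
psin_estimate = popsize(data = data[,1:2])

# Estimate with the gam and rangerlogit models and the default settings.
psin_estimate = popsize(data = data, funcname = c("gam", "rangerlogit"))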

[Package drpop version 0.0.3 Index]