popsize_cond {drpop} R Documentation

## Estimate total population size and capture probability using a user-provided set of models, conditioned on an attribute.

### Description

Estimate total population size and capture probability using a user-provided set of models, conditioned on an attribute.

### Usage

popsize_cond(
data,
K = 2,
filterrows = FALSE,
funcname = c("rangerlogit"),
condvar,
nfolds = 2,
margin = 0.005,
sl.lib = c("SL.gam", "SL.glm", "SL.glm.interaction", "SL.ranger", "SL.glmnet"),
TMLE = TRUE,
PLUGIN = TRUE,
Nmin = 100,
...
)


### Arguments

- data: The data frame in capture-recapture format for which the total population is to be estimated. The first K columns are the capture history indicators for the K lists. The remaining columns are covariates in numeric format.
- K: The number of lists in the data. The lists are typically the first K columns of data.
- filterrows: A logical value denoting whether to remove all rows with only zeroes.
- funcname: The vector of estimation function names used to obtain the population size.
- condvar: The covariate for which conditional estimates are required.
- nfolds: The number of folds to be used for cross-fitting.
- margin: The minimum value the estimates can attain, used to bound them away from zero.
- sl.lib: Algorithm library for qhat_sl(). See SuperLearner::listWrappers(). Default library includes "gam", "glm", "glmnet", "glm.interaction", "ranger".
- TMLE: A logical value indicating whether TMLE is to be computed.
- PLUGIN: A logical value indicating whether the plug-in estimates are returned.
- Nmin: The cutoff for the minimum sample size required to perform doubly robust estimation. Otherwise, the Petersen estimator is returned.
- ...: Any extra arguments passed into the function. See qhat_rangerlogit(), qhat_sl(), tmle().
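The data layout and the Petersen fallback described above can be illustrated with a small base-R sketch. It does not call drpop. The toy data frame and the variable names (toy, observed, petersen) are made up for illustration, and the formula is the textbook Lincoln-Petersen estimator for two lists; the exact form drpop uses internally is an assumption here.

```r
# Toy capture-recapture data: K = 2 list indicators first, then one covariate.
# All values are invented for illustration.
toy <- data.frame(
  y1 = c(1, 1, 0, 1, 0, 1, 0, 0),
  y2 = c(1, 0, 1, 1, 1, 0, 0, 1),
  x  = c(0.2, 1.5, -0.3, 0.8, 1.1, -1.2, 0.4, 0.0)
)
K <- 2

# Drop rows that appear on none of the K lists
# (the behaviour that filterrows = TRUE describes).
observed <- toy[rowSums(toy[, 1:K]) > 0, ]

# Textbook Lincoln-Petersen estimate for the list pair (1, 2): n1 * n2 / m.
n1 <- sum(observed$y1)                # captured by list 1
n2 <- sum(observed$y2)                # captured by list 2
m  <- sum(observed$y1 * observed$y2)  # captured by both lists
petersen <- n1 * n2 / m
```

In this toy example one all-zero row is dropped, and the remaining seven rows give n1 = 4, n2 = 5 and m = 2.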

### Value

A list of estimates containing the following components for each list-pair, model and method (PI = plug-in, DR = doubly-robust, TMLE = targeted maximum likelihood estimate):

- result: A data frame of the estimated quantities below.
  - psi: The estimated capture probability.
  - sigma: The efficiency bound.
  - n: The estimated population size n.
  - sigman: The estimated standard deviation of the population size.
  - cin.l: The estimated lower bound of a 95% confidence interval of n.
  - cin.u: The estimated upper bound of a 95% confidence interval of n.
- N: The number of data points used in the estimation after removing rows with missing data.
- ifvals: The estimated influence function values for the observed data.
- nuis: The estimated nuisance functions (q12, q1, q2) for each element in funcname.
- nuistmle: The estimated nuisance functions (q12, q1, q2) from tmle for each element in funcname.
- idfold: The division of the rows into sets (folds) for cross-fitting.
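As a rough guide to how these quantities relate, the sketch below assumes that the population size is the observed count divided by the capture probability (n = N / psi) and that the 95% interval is a normal-approximation (Wald) interval around n. Both are assumptions made for illustration, not a statement of what drpop computes. The numbers are hypothetical and are not output from the package.

```r
# Hypothetical values, not produced by drpop.
N      <- 800    # rows used in the estimation
psi    <- 0.64   # estimated capture probability
sigman <- 35     # estimated standard deviation of the population size

# Assumed relation: population size = observed count / capture probability.
n_hat <- N / psi

# Assumed Wald-type 95% interval: estimate +/- z * standard deviation.
z     <- qnorm(0.975)
cin_l <- n_hat - z * sigman
cin_u <- n_hat + z * sigman
```

Under these assumptions a smaller psi or a larger sigman widens the gap between the observed count N and the interval for n.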

### References

Das, M., Kennedy, E. H., & Jewell, N.P. (2021). Doubly robust capture-recapture methods for estimating population size. arXiv preprint arXiv:2104.14091.

### See Also

popsize

### Examples


data = simuldata(n = 10000, l = 2, categorical = TRUE)$data

psin_estimate = popsize_cond(data = data, funcname = c("logit", "gam"),
                             condvar = 'catcov', PLUGIN = TRUE, TMLE = TRUE)
# This returns the plug-in, the bias-corrected and the TMLE estimates for the
# two models, conditioned on the column catcov.



[Package drpop version 0.0.3 Index]