popsize {drpop}  R Documentation 
Estimate the total population size and capture probability using a user-provided set of models or user-provided nuisance estimates.
popsize(
data,
K = 2,
j,
k,
margin = 0.005,
filterrows = FALSE,
nfolds = 5,
funcname = c("rangerlogit"),
sl.lib = c("SL.gam", "SL.glm", "SL.glm.interaction", "SL.ranger", "SL.glmnet"),
getnuis,
q1mat,
q2mat,
q12mat,
idfold,
TMLE = TRUE,
PLUGIN = TRUE,
Nmin = 100,
...
)
data 
The data frame in capture-recapture format with 
K 
The number of lists that are present in the data. 
j 
The first list to be used for estimation. 
k 
The second list to be used for estimation. 
margin 
The minimum value the estimates can attain to bound them away from zero. 
filterrows 
A logical value denoting whether to remove all rows with only zeroes. 
nfolds 
The number of folds to be used for cross-fitting. 
funcname 
The vector of estimation function names to obtain the population size. 
sl.lib 
Algorithm library for 
getnuis 
A list object with the nuisance function estimates and the fold assignment of the rows for cross-fitting, or a data frame with the nuisance estimates. 
q1mat 
A data frame with capture probabilities for the first list. 
q2mat 
A data frame with capture probabilities for the second list. 
q12mat 
A data frame with probabilities of capture by both lists simultaneously. 
idfold 
The fold assignment of each row during estimation. 
TMLE 
A logical value indicating whether the TMLE is computed. 
PLUGIN 
A logical value indicating whether the plug-in estimates are returned. 
Nmin 
The minimum sample size required to perform doubly robust estimation. Below this cutoff, the Petersen estimator is returned instead. 
... 
Any extra arguments passed into the function. See 
A list of estimates containing the following components for each list pair, model and method (PI = plug-in, DR = doubly robust, TMLE = targeted maximum likelihood estimate):
result 
A data frame of the estimated quantities listed below.

N 
The number of data points used in the estimation after removing rows with missing data. 
ifvals 
The estimated influence function values for the observed data. 
nuis 
The estimated nuisance functions (q12, q1, q2) for each element in funcname. 
nuistmle 
The estimated nuisance functions (q12, q1, q2) from TMLE for each element in funcname. 
idfold 
The division of the rows into sets (folds) for cross-fitting. 
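The nfolds, idfold and margin arguments describe cross-fitting: each row's nuisance estimate comes from a model fitted on the other folds, and the estimates are then bounded away from zero. The base R sketch below illustrates that idea for a single capture probability. The data, the logistic model and the variable names are hypothetical, and this is not necessarily how popsize builds idfold or fits its models internally.

```r
set.seed(1)
n <- 200
nfolds <- 5

# Hypothetical data: one covariate x and one capture indicator y1.
x <- rnorm(n)
y1 <- rbinom(n, 1, plogis(0.5 * x))
dat <- data.frame(x = x, y1 = y1)

# Balanced random fold labels, one per row (plays the role of idfold).
idfold <- sample(rep(seq_len(nfolds), length.out = n))

# Cross-fitting: predict each fold from a model fitted on the remaining folds.
q1_hat <- numeric(n)
for (f in seq_len(nfolds)) {
  held_out <- idfold == f
  fit <- glm(y1 ~ x, family = binomial(), data = dat[!held_out, ])
  q1_hat[held_out] <- predict(fit, newdata = dat[held_out, ], type = "response")
}

# Bound the estimates away from zero (assumed reading of the margin argument).
margin <- 0.005
q1_hat <- pmax(q1_hat, margin)
```

Because no row is predicted by a model that saw it during fitting, flexible learners can be used for the nuisance functions without the estimates overfitting to their own evaluation rows.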
Bickel, P. J., Klaassen, C. A. J., Ritov, Y., and Wellner, J. A. (1993). Efficient and adaptive estimation for semiparametric models, volume 4. Johns Hopkins University Press, Baltimore.
van der Vaart, A. (2002a). Part III: Semiparametric statistics. Lectures on Probability Theory and Statistics, pages 331-457.
van der Laan, M. J. and Robins, J. M. (2003). Unified methods for censored longitudinal data and causality. Springer Science & Business Media.
Tsiatis, A. (2006). Semiparametric theory and missing data. Springer, New York.
Kennedy, E. H. (2016). Semiparametric theory and empirical processes in causal inference. Statistical causal inferences and their applications in public health research, pages 141-167. Springer.
Das, M., Kennedy, E. H., and Jewell, N. P. (2021). Doubly robust capture-recapture methods for estimating population size. arXiv preprint arXiv:2104.14091.
# Estimate the nuisance functions once, then reuse them together with the fold assignment.
data = simuldata(1000, l = 3)$data
qhat = popsize(data = data, funcname = c("logit", "gam"), nfolds = 2, margin = 0.005)
psin_estimate = popsize(data = data, getnuis = qhat$nuis, idfold = qhat$idfold)

# Pass only the first two columns: this returns the basic plug-in estimate since covariates are absent.
data = simuldata(n = 6000, l = 3)$data
psin_estimate = popsize(data = data[,1:2])

# Estimate with a chosen set of models on the full data.
psin_estimate = popsize(data = data, funcname = c("gam", "rangerlogit"))
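The Nmin argument refers to the Petersen estimator as the fallback for small samples. As a point of reference, here is a minimal base R sketch of the classical two-list Petersen (Lincoln-Petersen) estimator. The toy data and the helper name petersen are hypothetical and are not part of drpop; the value popsize returns in this case may differ in form.

```r
# Hypothetical two-list capture data: one row per observed individual,
# 1 = captured by that list, 0 = not captured.
lists <- data.frame(
  y1 = c(1, 1, 1, 0, 0, 1, 1, 0, 1, 1),
  y2 = c(1, 0, 1, 1, 1, 0, 1, 1, 0, 1)
)

# Illustrative helper (not a drpop function).
petersen <- function(y1, y2) {
  n1 <- sum(y1 == 1)             # captured by list 1
  n2 <- sum(y2 == 1)             # captured by list 2
  n12 <- sum(y1 == 1 & y2 == 1)  # captured by both lists
  if (n12 == 0) stop("no overlap between the lists; the estimator is undefined")
  n1 * n2 / n12
}

n_hat <- petersen(lists$y1, lists$y2)  # estimated total population size
psi_hat <- nrow(lists) / n_hat         # implied capture probability
```

Here n1 = 7, n2 = 7 and n12 = 4, so the estimated population size is 49 / 4 = 12.25 and the implied capture probability is 10 / 12.25, about 0.82. The estimator assumes the two lists capture individuals independently, which is the assumption that covariate-adjusted methods relax to conditional independence given covariates.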