propensities {NonProbEst} | R Documentation |
Calculates sample propensities
Description
Given a convenience sample and a reference sample, estimates the propensity of each unit to participate in the convenience sample, using a classification model selected by the user.
Usage
propensities(
convenience_sample,
reference_sample,
covariates,
algorithm = "glm",
smooth = FALSE,
proc = NULL,
trControl = trainControl(classProbs = TRUE),
...
)
Arguments
convenience_sample |
Data frame containing the non-probabilistic sample. |
reference_sample |
Data frame containing the probabilistic sample. |
covariates |
String vector specifying the common variables to use for training. |
algorithm |
A string specifying which classification or regression model to use (same as caret's method). |
smooth |
A logical value; if TRUE, the propensity estimates pi_i are smoothed using the formula (1000*pi_i + 0.5)/1001. |
proc |
A string or vector of strings naming the data preprocessing techniques, among those available in the train function of the 'caret' package, to apply to the data before propensity estimation. Defaults to NULL, in which case no preprocessing is applied. |
trControl |
A trainControl object specifying the computational nuances of the train function. |
... |
Further parameters to be passed to the train function. |
Details
The propensity estimation models are trained with the 'caret' package. The value of algorithm must match one of the names in the list of algorithms supported by 'caret'. Case weights are used to balance classes (for models that accept them).
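As a rough illustration of class balancing with case weights, the sketch below fits a weighted logistic regression in base R. It is not the package's code: the toy data, the variable names, and the weighting scheme (reference units up-weighted so that both classes carry the same total weight) are all assumptions made for the example.

```r
set.seed(1)

# Toy data, made up for illustration: 200 convenience units, 50 reference units.
n_conv <- 200
n_ref <- 50
x <- c(rnorm(n_conv, mean = 0.5), rnorm(n_ref, mean = 0))
in_convenience <- c(rep(1, n_conv), rep(0, n_ref))

# Assumed weighting scheme: each reference unit gets weight n_conv / n_ref,
# so both classes have the same total weight.
w <- c(rep(1, n_conv), rep(n_conv / n_ref, n_ref))

# Weighted logistic regression; fitted values are the estimated propensities.
fit <- glm(in_convenience ~ x, family = binomial, weights = w)
propensity <- predict(fit, type = "response")
```

With balanced weights, the weighted mean of the fitted propensities is 0.5, so the size imbalance between the two samples does not by itself push the estimates towards either class.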
The smoothing formula for propensities avoids mathematical irregularities in the calculation of sample weights when an estimated propensity is exactly 0 or 1. Further details can be found in Buskirk and Kolenikov (2015).
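The effect of the smoothing formula can be checked by hand. The sketch below is only an illustration: smooth_propensities is a hypothetical helper, not a function exported by the package, and the two weight formulas shown (1/pi and (1 - pi)/pi) are common choices in propensity adjustment that are assumed here rather than taken from this page.

```r
# Hypothetical helper (not part of NonProbEst): applies the smoothing
# formula (1000 * pi + 0.5) / 1001 to a vector of estimated propensities.
smooth_propensities <- function(pi) {
  (1000 * pi + 0.5) / 1001
}

raw <- c(0, 0.25, 1)
smoothed <- smooth_propensities(raw)

# Without smoothing, a propensity of 0 gives an infinite weight.
inverse_raw <- 1 / raw

# After smoothing, every propensity lies strictly between 0 and 1,
# so both weight formulas give finite, positive values.
inverse_smoothed <- 1 / smoothed
odds_smoothed <- (1 - smoothed) / smoothed
```

The formula maps 0 to 0.5/1001 and 1 to 1000.5/1001, so the extreme estimates move slightly inside the unit interval while intermediate values barely change.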
Value
A list containing 'convenience' propensities and 'reference' propensities.
References
Buskirk, T. D., & Kolenikov, S. (2015). Finding respondents in the forest: A comparison of logistic regression and random forest models for response propensity weighting and stratification. Survey Methods: Insights from the Field, 17.
Examples
# Simple example with default parameters
covariates <- c("education_primaria", "education_secundaria")
propensities(sampleNP, sampleP, covariates)