selectMatch {sitepickR} | R Documentation |
Two-level sample selection
Description
Carries out a two-level sample selection that anticipates that an initially selected site may decline to participate, in which case the site is replaced by its best-matching alternative. The procedure aims to reduce bias (and/or loss of generalizability) with respect to the target population.
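To make the replacement idea concrete, the following is a minimal base-R sketch of picking the nearest candidate replacements for one selected unit by Mahalanobis distance. It is an illustration only, not the package's implementation (selectMatch() delegates matching to MatchIt); the toy data and the helper name `nearestReplacements` are hypothetical.

```r
# Toy unit-level covariates (hypothetical values, not taken from rawCCD)
units <- data.frame(
  id   = paste0("U", 1:6),
  pctA = c(0.10, 0.12, 0.50, 0.52, 0.90, 0.11),
  pctB = c(0.30, 0.31, 0.60, 0.58, 0.20, 0.29),
  stringsAsFactors = FALSE
)

# Hypothetical helper: IDs of the k units closest to `target`
# in Mahalanobis distance over the columns named in `vars`
nearestReplacements <- function(units, target, vars, k) {
  X <- as.matrix(units[, vars])
  S <- cov(X)                                    # covariance across all units
  center <- X[units$id == target, ]              # covariates of the selected unit
  d <- mahalanobis(X, center = center, cov = S)  # squared distances
  d[units$id == target] <- Inf                   # a unit cannot replace itself
  units$id[order(d)][seq_len(k)]
}

reps <- nearestReplacements(units, target = "U1",
                            vars = c("pctA", "pctB"), k = 2)
print(reps)
```

Here `k` plays the role that nRepUnits plays in selectMatch(): the number of replacement candidates kept for each selected unit.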
Usage
selectMatch(
df,
unitID,
subunitID,
subunitSampVars,
unitVars,
nUnitSamp,
nRepUnits,
nsubUnits,
exactMatchVars = NULL,
calipMatchVars = NULL,
calipValue = 0.2,
seedN = NA,
matchDistance = "mahalanobis",
sizeFlag = TRUE,
repFlag = TRUE,
writeOut = FALSE,
replacementUnitsFilename = "replacementUnits.csv",
subUnitTableFilename = "subUnitTable.csv"
)
Arguments
df |
dataframe; sub-unit level dataframe with both sub-unit and unit level variables |
unitID |
character; name of unit ID column |
subunitID |
character; name of sub-unit ID column |
subunitSampVars |
vector; column names of unit level variables to sample units on |
unitVars |
vector; column names of unit level variables to match units on |
nUnitSamp |
numeric; number of units to be initially randomly selected |
nRepUnits |
numeric; number of replacement units to find for each selected unit |
nsubUnits |
numeric; number of sub-units to be randomly selected for each unit |
exactMatchVars |
vector; column names of categorical variables on which units must be matched exactly. Must be present in 'unitVars'; default = NULL |
calipMatchVars |
vector; column names of continuous variables on which units must be matched within a specified caliper. Must be present in 'unitVars'; default = NULL |
calipValue |
numeric; number of standard deviations to be used as caliper for matching units on calipMatchVars; default = 0.2 |
seedN |
numeric; seed number to be used for sampling. If NA, calls set.seed(); default = NA |
matchDistance |
character; MatchIt distance parameter used to obtain optimal matches (nearest neighbors); default = "mahalanobis" |
sizeFlag |
logical; if TRUE, sampling is made proportional to unit size; default = TRUE |
repFlag |
logical; whether unit matches are picked with repetition (TRUE) or without repetition (FALSE); default = TRUE |
writeOut |
logical; if TRUE, writes a .csv file for each output table; default = FALSE |
replacementUnitsFilename |
character; csv filename for saving unit:replacement directory when writeOut == TRUE; default = "replacementUnits.csv" |
subUnitTableFilename |
character; csv filename for saving the unit:sub-unit table when writeOut == TRUE; default = "subUnitTable.csv" |
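As a rough illustration of two of the arguments above, the base-R sketch below shows a size-weighted draw of units (the idea behind sizeFlag) and a caliper check expressed in standard deviations (the idea behind calipValue). It assumes the caliper is measured in standard deviations of the raw covariate, which may differ from how MatchIt applies it internally; the data frame `toy` and the helper `withinCaliper` are hypothetical.

```r
set.seed(1)  # fixed seed so the draw is reproducible (cf. seedN)

# Hypothetical toy units: size = number of sub-units, x = a continuous covariate
toy <- data.frame(
  unit = c("A", "B", "C", "D", "E"),
  size = c(10, 40, 5, 25, 20),
  x    = c(0.20, 0.25, 0.80, 0.22, 0.55),
  stringsAsFactors = FALSE
)

# sizeFlag-style draw: larger units are more likely to be selected
picked <- sample(toy$unit, size = 2, prob = toy$size / sum(toy$size))

# calipValue-style check: is each unit within `caliper` SDs of the target on x?
withinCaliper <- function(x, target, caliper = 0.2) {
  abs(x - x[target]) <= caliper * sd(x)
}

ok <- withinCaliper(toy$x, target = 1)  # target = unit "A"
print(picked)
print(toy$unit[ok])
```

The caliper check includes the target unit itself; in an actual matching step the target would be excluded from its own candidate set.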
Value
A list with: 1) a table of the form selected unit i: (unit i replacements); 2) a table of the form potential unit i: (unit i sub-units); 3) balance diagnostics.
Examples
################################################################################
############## Prepare dataframe [sitepickR Package] ###########################
######### Robert Olsen, Elizabeth A. Stuart & Elena Badillo-Goicoechea (2022) ##
# Basic usage of selectMatch()
rawCCD <- sitepickR::rawCCD
uSampVarsCCD <- c("w.pct.frlunch", "w.pct.black", "w.pct.hisp", "w.pct.female")
suSampVarsCCD <- c("sch.pct.frlunch", "sch.pct.black", "sch.pct.hisp", "sch.pct.female")
dfCCD <- prepDF(rawCCD,
                unitID = "LEAID", subunitID = "NCESSCH")
# Keep only the first 80 units so the example runs quickly
dfCCD <- dplyr::filter(dfCCD, unitID %in% unique(dfCCD$unitID)[1:80])
smOut <- selectMatch(df = dfCCD,                      # user dataset
                     unitID = "LEAID",                # column name of unit ID in user dataset
                     subunitID = "NCESSCH",           # column name of sub-unit ID in user dataset
                     unitVars = uSampVarsCCD,         # name of unit level covariate columns
                     subunitSampVars = suSampVarsCCD, # name of sub-unit level covariate columns
                     nUnitSamp = 30,                  # number of units to select initially
                     nRepUnits = 5,                   # number of replacement units per selected unit
                     nsubUnits = 2                    # number of sub-units to select per unit
)