selectMatch {sitepickR}    R Documentation

Two-level sample selection

Description

Carries out a two-level sample selection (units, then sub-units within units) that anticipates the possibility that an initially selected site (unit) will decline to participate, and identifies optimal replacement units for each selected unit in advance. The procedure aims to reduce bias (and/or loss of generalizability) with respect to the target population.
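The two-level design can be illustrated with a minimal base-R sketch. It is not the sitepickR implementation: the toy population, its variable names and the use of sample() with prob weights (sequential weighted sampling without replacement, which only approximates strict probability-proportional-to-size sampling) are assumptions made for illustration. Units are drawn first, weighted by their number of sub-units, and sub-units are then drawn within each selected unit.

```r
# Hypothetical toy population: 10 units, each containing a different
# number of sub-units. Names and sizes are illustrative only.
set.seed(123)
unitSizes <- c(3, 8, 5, 12, 4, 6, 9, 2, 7, 10)
pop <- data.frame(
  unitID    = rep(seq_along(unitSizes), times = unitSizes),
  subunitID = seq_len(sum(unitSizes))
)

# Level 1: draw nUnitSamp distinct units, weighting each unit by its size
# (number of sub-units), in the spirit of sizeFlag = TRUE.
nUnitSamp <- 4
sizes <- table(pop$unitID)
sampledUnits <- as.integer(sample(names(sizes), nUnitSamp,
                                  prob = as.numeric(sizes)))

# Level 2: within each sampled unit, draw nsubUnits sub-units at random.
nsubUnits <- 2
sampledSubunits <- do.call(rbind, lapply(sampledUnits, function(u) {
  rows <- pop[pop$unitID == u, ]
  rows[sample.int(nrow(rows), min(nsubUnits, nrow(rows))), ]
}))
print(sampledSubunits)
```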

Usage

selectMatch(
  df,
  unitID,
  subunitID,
  subunitSampVars,
  unitVars,
  nUnitSamp,
  nRepUnits,
  nsubUnits,
  exactMatchVars = NULL,
  calipMatchVars = NULL,
  calipValue = 0.2,
  seedN = NA,
  matchDistance = "mahalanobis",
  sizeFlag = TRUE,
  repFlag = TRUE,
  writeOut = FALSE,
  replacementUnitsFilename = "replacementUnits.csv",
  subUnitTableFilename = "subUnitTable.csv"
)

Arguments

df

dataframe; sub-unit level dataframe with both sub-unit and unit level variables

unitID

character; name of unit ID column

subunitID

character; name of sub-unit ID column

subunitSampVars

vector; column names of sub-unit level variables used for sampling

unitVars

vector; column names of unit level variables to match units on

nUnitSamp

numeric; number of units to be initially randomly selected

nRepUnits

numeric; number of replacement units to find for each selected unit

nsubUnits

numeric; number of sub-units to be randomly selected for each unit

exactMatchVars

vector; column names of categorical variables on which units must be matched exactly. Must be present in 'unitVars'; default = NULL

calipMatchVars

vector; column names of continuous variables on which units must be matched within a specified caliper. Must be present in 'unitVars'; default = NULL

calipValue

numeric; number of standard deviations to be used as the caliper when matching units on calipMatchVars; default = 0.2

seedN

numeric; seed number to be used for sampling. If NA, calls set.seed(); default = NA

matchDistance

character; MatchIt distance parameter used to obtain optimal matches (nearest neighbors); default = "mahalanobis"

sizeFlag

logical; if TRUE, sampling is made proportional to unit size; default = TRUE

repFlag

logical; if TRUE, pick unit matches with/without repetition; default = TRUE

writeOut

logical; if TRUE, writes a .csv file for each output table; default = FALSE

replacementUnitsFilename

character; csv filename for saving unit:replacement directory when writeOut == TRUE; default = "replacementUnits.csv"

subUnitTableFilename

character; csv filename for saving the unit:sub-unit table when writeOut == TRUE; default = "subUnitTable.csv"
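To make exactMatchVars, calipMatchVars, calipValue and matchDistance = "mahalanobis" more concrete, the sketch below finds replacement candidates for one selected unit in base R. It is a hypothetical, greedy nearest-neighbor illustration, not the MatchIt-based procedure selectMatch uses; the data, column names and the helper findReplacements() are invented for this example, and the caliper SD is assumed to be computed over all units.

```r
# Hypothetical unit-level data: one row per unit (illustrative names only).
set.seed(42)
units <- data.frame(
  unitID = 1:20,
  urban  = rep(c("yes", "no"), each = 10),  # categorical: exact-match variable
  pctFRL = runif(20, 0.2, 0.8),             # continuous: caliper variable
  pctBlk = runif(20, 0.0, 0.5)              # continuous: distance variable
)

# Hypothetical helper: greedy nearest-neighbor replacements for one unit.
findReplacements <- function(target, allUnits, unitVars,
                             calipValue = 0.2, nRepUnits = 3) {
  pool <- allUnits[allUnits$unitID != target$unitID, ]
  # Exact match: keep candidates in the same category only.
  pool <- pool[pool$urban == target$urban, ]
  # Caliper: within calipValue standard deviations (SD over all units).
  calip <- calipValue * sd(allUnits$pctFRL)
  pool <- pool[abs(pool$pctFRL - target$pctFRL) <= calip, ]
  if (nrow(pool) == 0) return(pool)
  # Mahalanobis distance to the target, using the full-data covariance.
  d <- mahalanobis(as.matrix(pool[, unitVars, drop = FALSE]),
                   center = unlist(target[, unitVars, drop = FALSE]),
                   cov = cov(allUnits[, unitVars, drop = FALSE]))
  # Closest candidates first; keep at most nRepUnits of them.
  head(pool[order(d), ], nRepUnits)
}

selected <- units[units$unitID == 1, ]
reps <- findReplacements(selected, units, unitVars = c("pctFRL", "pctBlk"))
print(reps)
```

A narrow caliper can leave fewer than nRepUnits candidates (or none), which is why the sketch returns whatever survives the exact and caliper filters.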

Value

A list with three elements: 1) a table of the form selected unit i: (replacements for unit i); 2) a table of the form potential unit i: (sub-units of unit i); 3) balance diagnostics.
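Balance diagnostics typically compare covariate distributions between a sample and its target population. As one hedged illustration (the exact statistics selectMatch reports may differ), the standardized mean difference for a single covariate can be computed as follows; the helper smd() and the data are hypothetical.

```r
# Hypothetical helper: standardized mean difference between a sample and
# its population, scaled by the population SD.
smd <- function(sampleX, popX) (mean(sampleX) - mean(popX)) / sd(popX)

set.seed(1)
popFRL  <- runif(200, 0.1, 0.9)  # hypothetical population covariate
sampFRL <- sample(popFRL, 30)    # hypothetical sample drawn from it
smd(sampFRL, popFRL)             # values near 0 suggest good balance
```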

Examples

################################################################################
############## Prepare dataframe [sitepickR Package] ###########################
######### Robert Olsen, Elizabeth A. Stuart & Elena Badillo-Goicoechea (2022) ##

# Basic usage of selectMatch()

rawCCD <- sitepickR::rawCCD

uSampVarsCCD <- c("w.pct.frlunch", "w.pct.black", "w.pct.hisp", "w.pct.female") 
suSampVarsCCD <- c("sch.pct.frlunch", "sch.pct.black", "sch.pct.hisp", "sch.pct.female")

dfCCD <- prepDF(rawCCD,
                unitID="LEAID", subunitID="NCESSCH")
dfCCD <- dplyr::filter(dfCCD, unitID %in% unique(dfCCD$unitID)[1:80])

smOut <- selectMatch(df = dfCCD, # user dataset
                     unitID = "LEAID", # column name of unit ID in user dataset
                     subunitID = "NCESSCH", # column name of sub-unit ID in user dataset
                     unitVars = uSampVarsCCD, # name of unit level covariate columns
                     subunitSampVars = suSampVarsCCD, # name of sub-unit level covariate columns
                     nUnitSamp = 30,
                     nRepUnits = 5,
                     nsubUnits = 2
)

[Package sitepickR version 0.0.1 Index]