sampleVADIR {sampleVADIR}    R Documentation

Draw stratified samples from VADIR database

Description

Core function used to pull a stratified sample from VADIR based on a variety of parameters.

Usage

sampleVADIR(
  data,
  n = 4500,
  vars = "all",
  rankDat = "rankDat",
  payRanks = 4,
  post911 = TRUE,
  dischargedAfter = FALSE,
  until = NULL,
  ageDischarge = TRUE,
  ageEnlist = FALSE,
  ageNow = FALSE,
  yearsServed = FALSE,
  dateformat = "%m/%d/%Y",
  params = NULL,
  formats = "default",
  typos = list(),
  rmDeviates = FALSE,
  timeCats = FALSE,
  saveData = TRUE,
  onlyIDs = FALSE,
  oversample = FALSE,
  exclude = FALSE,
  seed = NULL
)

Arguments

data

VADIR dataset

n

Total desired sample size

vars

Character vector indicating which variables to use in stratification

rankDat

Dataset linking ranks to pay grade, or character string indicating where to pull that dataset from. Recommended to leave as "rankDat" in order to use package-supplied dataset.

payRanks

Number of pay grades to use when converting the rank variable. Must be either 4 or 7.

post911

Logical. Determines whether to consider only individuals deployed after 9/11/2001.

dischargedAfter

Character string giving the discharge date used to restrict sampling. Set to FALSE to ignore discharge date. Set to 'past-year' to sample only people who were discharged within the past year, measured from the current date.

until

Upper limit on the service start date. NULL means there is no upper limit.

ageDischarge

Logical. Determines whether to use age at discharge as a stratum.

ageEnlist

Logical. Determines whether to use age at enlistment as a stratum.

ageNow

Logical. Determines whether to use current age as a stratum.

yearsServed

Logical. Determines whether to use total years served as a stratum.

dateformat

Character string indicating the expected date format. Should be automatically detected.

params

Optional list of parameters that override the function's defaults. This is a convenient way to call the function when performing the stratification multiple times, because the same arguments do not have to be written out for each call.

formats

Should be "default"

typos

List containing typos to be fixed, as well as what they should be changed to. Leave at list() to ignore. Typos can also be fixed prior to stratification by using the fixTypos function.

rmDeviates

Logical. Determines whether rows with unexpected response values are removed. If FALSE and unexpected response values are detected, the function will stop.

timeCats

Logical or numeric. Determines whether the time-related variables should be treated as categorical variables. If TRUE, this defaults to 4.

saveData

Logical. Determines whether to save the full dataset of candidates in the output. This dataset may be smaller than the input, because some people may be removed from consideration due to errors or unexpected responses.

onlyIDs

Logical. Determines whether to only return ID values for selected individuals rather than a full dataset.

oversample

Logical. Determines whether to oversample or undersample based on limitations due to available proportions of strata in subsample.

exclude

Logical. Determines whether to exclude people missing a zip code, as well as people with "NTC" as their zip code value.

seed

Numeric value indicating the seed to set for the stratification procedure. Allows for reproducible results.
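The override pattern described for the params argument can be sketched in base R. This is only an illustration of how a named list can replace default values; mergeParams is a hypothetical helper and is not part of sampleVADIR, whose internal handling of params may differ.

```r
# Hypothetical helper (not part of sampleVADIR): merge user-supplied
# overrides into a list of default argument values.
mergeParams <- function(defaults, params = NULL) {
  if (is.null(params)) return(defaults)
  utils::modifyList(defaults, params)
}

# A few defaults copied from the Usage section above
defaults <- list(n = 4500, payRanks = 4, post911 = TRUE, oversample = FALSE)

# Values the user wants to change
overrides <- list(n = 7000, post911 = FALSE)

merged <- mergeParams(defaults, overrides)
str(merged)
```

Note that modifyList() drops an entry when the override value is NULL, so a helper like this would need extra care for arguments such as until, where NULL is a meaningful value.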

Details

Performs stratification separately for males and females, where males and females are sampled at a 1:1 ratio, regardless of population ratio.
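The 1:1 design above can be illustrated with a minimal base R sketch. The toy data, the column names (id, sex, svc), the helper sampleWithinSex, and the proportional allocation across strata are all assumptions made for illustration; the documentation does not state how sampleVADIR allocates within each sex.

```r
# Toy data standing in for VADIR; column names are hypothetical.
set.seed(19)
toy <- data.frame(
  id  = 1:1000,
  sex = sample(c("M", "F"), 1000, replace = TRUE, prob = c(0.85, 0.15)),
  svc = sample(c("A", "N", "F", "M"), 1000, replace = TRUE)
)

# Hypothetical helper: draw roughly `size` rows from one sex,
# allocating proportionally across the svc strata (assumed rule).
sampleWithinSex <- function(dat, size) {
  strata <- split(dat, dat$svc)
  alloc <- round(size * vapply(strata, nrow, numeric(1)) / nrow(dat))
  picked <- Map(function(s, k) s[sample(nrow(s), min(k, nrow(s))), ],
                strata, alloc)
  do.call(rbind, picked)
}

# Same target size for each sex, regardless of the population ratio
perSex <- 50
out <- lapply(split(toy, toy$sex), sampleWithinSex, size = perSex)
sapply(out, nrow)
```

Because each stratum's allocation is rounded, the size per sex can differ from the target by a row or two in this sketch.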

With a large dataset (which is typical for VADIR), setting any of the date-related arguments to TRUE can drastically increase computation time. The relevant arguments are ageDischarge, ageEnlist, ageNow, and yearsServed.
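The date-related strata are derived from date columns. The sketch below shows one way such a variable could be computed from dates in the default dateformat and then reduced to four categories, as timeCats = TRUE suggests. The toy dates and the quartile binning rule are assumptions; the package may compute ages and categories differently.

```r
# Toy dates in the default dateformat; values are made up for illustration.
birth <- c("03/14/1980", "07/01/1975", "11/23/1990", "01/05/1985",
           "09/30/1969", "05/18/1988", "12/12/1979", "02/27/1993")
term  <- c("06/01/2010", "08/15/2005", "04/20/2015", "10/10/2012",
           "03/03/2001", "07/07/2018", "01/01/2009", "09/09/2020")

dateformat <- "%m/%d/%Y"
birthDate <- as.Date(birth, format = dateformat)
termDate  <- as.Date(term, format = dateformat)

# Approximate age at discharge in years (difference of Dates is in days)
ageDischarge <- as.numeric(termDate - birthDate) / 365.25

# Bin into 4 groups at the sample quartiles (an assumed binning rule)
breaks <- quantile(ageDischarge, probs = seq(0, 1, length.out = 5))
ageCat <- cut(ageDischarge, breaks = breaks, include.lowest = TRUE)
table(ageCat)
```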

Value

A list containing the males and females who were sampled from VADIR.

Examples


# Collect the arguments in one list so they can be reused across calls
params <- list(
  n = 7000,
  vars = c('PN_Sex_CD', 'PN_BRTH_DT', 'SVC_CD', 'PNL_CAT_CD', 'RANK_CD',
           'PNL_TERM_DT', 'PNL_BGN_DT', 'OMB_RACE_CD',
           'OMB_ETHNC_NAT_ORIG_CD', 'POST_911_DPLY_IND_CD'),
  rankDat = 'rankDat',
  payRanks = 4,
  post911 = FALSE,
  until = NULL,
  dischargedAfter = FALSE,
  ageDischarge = TRUE,
  ageEnlist = FALSE,
  ageNow = FALSE,
  yearsServed = FALSE,
  dateformat = '%m/%d/%Y',
  formats = 'default',
  rmDeviates = FALSE,
  timeCats = TRUE,
  saveData = TRUE,
  onlyIDs = FALSE,
  oversample = TRUE,
  exclude = FALSE,
  typos = list()
)

# Draw the sample, passing the shared arguments via params;
# the seed makes the draw reproducible
out <- sampleVADIR(VADIR_fake, params = params, seed = 19)


[Package sampleVADIR version 1.0.0 Index]