auto_stratify {stratamatch}    R Documentation

Auto Stratify

Description

Automatically creates strata for matching based on a prognostic score formula or a vector of prognostic scores already estimated by the user. Creates an auto_strata object, which can be passed to strata_match for stratified matching or unpacked by the user to be matched by some other means.

Usage

auto_stratify(
  data,
  treat,
  prognosis,
  outcome = NULL,
  size = 2500,
  pilot_fraction = 0.1,
  pilot_size = NULL,
  pilot_sample = NULL,
  group_by_covariates = NULL
)

Arguments

data

data.frame with observations as rows, features as columns

treat

string giving the name of column designating treatment assignment

prognosis

information on how to build prognostic scores. Three different input types are allowed:

  1. vector of prognostic scores for all individuals in the data set. Should be in the same order as the rows of data.

  2. a formula for fitting a prognostic model

  3. an already-fit prognostic score model

outcome

string giving the name of the column with outcome information. Required if prognosis is a vector of prognostic scores or an already-fit model. If prognosis is a formula, the outcome is inferred from that formula.

size

numeric, desired size of strata (default = 2500)

pilot_fraction

numeric between 0 and 1 giving the proportion of controls to be allotted for building the prognostic score (default = 0.1)

pilot_size

alternative to pilot_fraction. Approximate number of observations to be used in pilot set. Note that the actual pilot set size returned may not be exactly pilot_size if group_by_covariates is specified because balancing by covariates may result in deviations from desired size. If pilot_size is specified, pilot_fraction is ignored.

pilot_sample

a data.frame of held aside samples for building prognostic score model. If pilot_sample is specified, pilot_size and pilot_fraction are both ignored.

group_by_covariates

character vector giving the names of covariates to be grouped by (optional). If specified, the pilot set will be sampled in a stratified manner, so that the composition of the pilot set reflects the composition of the whole data set in terms of these covariates. The specified covariates must be categorical.
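The stratified pilot sampling described for group_by_covariates can be pictured with a small base R sketch. This is only an illustration of the idea, not the stratamatch implementation: the data frame, the column name sex, and the per-group rounding are all assumptions made for the example.

```r
# Hypothetical illustration in base R; not the stratamatch internals.
set.seed(1)
dat <- data.frame(
  treat = rbinom(200, 1, 0.3),
  sex   = sample(c("F", "M"), 200, replace = TRUE)
)

# The pilot set is drawn from the controls only
controls <- dat[dat$treat == 0, ]
pilot_fraction <- 0.1

# Sample the same fraction of controls within each level of the covariate,
# so the pilot set mirrors the composition of the controls
groups <- split(seq_len(nrow(controls)), controls$sex)
pilot_idx <- unlist(lapply(groups, function(idx) {
  idx[sample.int(length(idx), size = round(pilot_fraction * length(idx)))]
}))

pilot_set <- controls[pilot_idx, ]

# Everything that was not sampled into the pilot set forms the analysis set
analysis_set <- dat[!(rownames(dat) %in% rownames(pilot_set)), ]

table(pilot_set$sex)
```

Because each group is rounded separately, the pilot set size can differ slightly from the requested size, which is consistent with the note under pilot_size.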

Details

Stratifying by prognostic score quantiles can be more effective than manually stratifying a data set because the prognostic score is continuous, so the resulting strata tend to be of equal size and to group individuals with similar prognosis.
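As a rough base R sketch of quantile-based stratification, the snippet below cuts a vector of scores into bins of roughly equal size. The simulated scores and the rule used to pick the number of strata (number of observations divided by size, rounded up) are assumptions for illustration and may not match what auto_stratify does internally.

```r
# Hypothetical illustration in base R; not the stratamatch internals.
set.seed(2)
scores <- runif(100)   # stand-in for estimated prognostic scores
size <- 25             # desired stratum size

# Assumption: the number of strata is chosen from the desired size
n_strata <- ceiling(length(scores) / size)

# Quantile break points give bins holding about the same number of scores
breaks <- quantile(scores, probs = seq(0, 1, length.out = n_strata + 1))
stratum <- cut(scores, breaks = breaks, include.lowest = TRUE, labels = FALSE)

table(stratum)
```

Since the scores are continuous, ties at the break points are unlikely, which is why the bins come out nearly equal in size.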

Automatic stratification requires information on how the prognostic scores should be derived. This is primarily determined by the specification of the prognosis argument. Three main forms of input for prognosis are allowed:

  1. A vector of prognostic scores. This vector should have one entry per row of the data set, in the same order as the rows. If this method is used, the outcome argument must also be specified; this is simply a string giving the name of the column which contains outcome information.

  2. A formula for prognosis (e.g. outcome ~ X1 + X2). If this method is used, auto_stratify will automatically split the data set into a pilot_set and an analysis_set. The pilot set will be used to fit a logistic regression model for outcome in the absence of treatment, and this model will be used to estimate prognostic scores on the analysis set. The analysis set will then be stratified based on the estimated prognostic scores. In this case the outcome argument need not be specified since it can be inferred from the input formula.

  3. A model for prognosis (e.g. a glm object). If this method is used, the outcome argument must also be specified.
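The steps described for the formula input (split off a pilot set, fit a logistic regression on it, score the analysis set, stratify by score) can be sketched by hand in base R. The simulated data, the 20% pilot fraction, the stratum size of 50, and the rule for the number of strata are assumptions chosen for the example; this is not a reproduction of the package's code.

```r
# Hypothetical illustration in base R; not the stratamatch internals.
set.seed(3)
n <- 500
dat <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
dat$treat   <- rbinom(n, 1, plogis(-1 + 0.5 * dat$X1))
dat$outcome <- rbinom(n, 1, plogis(0.8 * dat$X2))

# 1. Set aside a pilot set made up of a fraction of the controls
control_rows <- which(dat$treat == 0)
pilot_rows   <- sample(control_rows, size = round(0.2 * length(control_rows)))
pilot_set    <- dat[pilot_rows, ]
analysis_set <- dat[-pilot_rows, ]

# 2. Fit a logistic regression for the outcome in the absence of treatment
prognostic_model <- glm(outcome ~ X2, data = pilot_set, family = binomial)

# 3. Estimate prognostic scores on the analysis set
analysis_set$score <- predict(prognostic_model,
  newdata = analysis_set,
  type = "response"
)

# 4. Stratify the analysis set by quantiles of the estimated score
n_strata <- ceiling(nrow(analysis_set) / 50)
breaks <- quantile(analysis_set$score,
  probs = seq(0, 1, length.out = n_strata + 1)
)
analysis_set$stratum <- cut(analysis_set$score,
  breaks = breaks, include.lowest = TRUE, labels = FALSE
)

table(analysis_set$stratum)
```

Keeping the pilot observations out of the analysis set means the model is never evaluated on the rows used to fit it.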

Value

Returns an auto_strata object. This contains:

Troubleshooting

This section suggests fixes for common errors that appear while fitting the prognostic score model or using it to estimate prognostic scores on the analysis set.

Other errors or warnings can occur if the pilot set is too small and the prognostic formula is too complicated. Always make sure that the number of observations in the pilot set is large enough that you can confidently fit a prognostic model with the number of covariates you want.

See Also

manual_stratify, new_auto_strata

Examples

# make sample data set
set.seed(111)
dat <- make_sample_data(n = 75)

# construct a pilot set, build a prognostic score for `outcome` based on X2
# and stratify the data set based on the scores into sets of about 25
# observations
a.strat_formula <- auto_stratify(dat, "treat", outcome ~ X2, size = 25)

# stratify the data set based on a model for prognosis
pilot_data <- make_sample_data(n = 30)
prognostic_model <- glm(outcome ~ X2, data = pilot_data, family = "binomial")
a.strat_model <- auto_stratify(dat, "treat", prognostic_model,
  outcome = "outcome", size = 25
)

# stratify the data set based on a vector of prognostic scores
prognostic_scores <- predict(prognostic_model,
  newdata = dat,
  type = "response"
)
a.strat_scores <- auto_stratify(dat, "treat", prognostic_scores,
  outcome = "outcome", size = 25
)

# diagnostic plots
plot(a.strat_formula)
plot(a.strat_formula, type = "AC", propensity = treat ~ X1, stratum = 1)
plot(a.strat_formula, type = "hist", propensity = treat ~ X1, stratum = 1)
plot(a.strat_formula, type = "residual")

[Package stratamatch version 0.1.9 Index]