best_split {optrefine}    R Documentation

Find the best split for a stratum

Description

Runs split_stratum() many times and selects the best result.

Usage

best_split(
  z,
  X,
  strata,
  ist,
  nc_list,
  nt_list,
  wMax = 5,
  wEach = 1,
  solver = "Rglpk",
  integer = FALSE,
  min_split = 10,
  threads = threads
)

Arguments

z

Vector of treatment assignment

X

Covariate matrix or data.frame

strata

Vector of initial strata assignments; only used if object is not supplied. Can be NULL, in which case an initial stratification based on the quintiles of the propensity score is generated with prop_strat(), and the generated propensity score is added to the X matrix as an extra covariate

ist

the stratum to be split

nc_list

a list of choices for the nc parameter in split_stratum(). Each element is a vector with entries corresponding to the number of control units that should be placed in each new stratum

nt_list

a list of choices for the nt parameter in split_stratum(). Each element is a vector with entries corresponding to the number of treated units that should be placed in each new stratum
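
As a rough sketch of how these two lists might be assembled, the base-R snippet below builds candidate size vectors from a stratum's control and treated counts. The counts n_control and n_treated, the proportions, and the helper split_sizes() are hypothetical and not part of optrefine; the only thing carried over from the descriptions above is that each element has one entry per new stratum.

```r
# Hypothetical counts for the stratum to be split (not taken from real data)
n_control <- 120
n_treated <- 40

# Hypothetical helper: split n units into two groups, with a proportion p
# of the units (rounded down) going to the first new stratum
split_sizes <- function(n, p) {
  first <- floor(n * p)
  c(first, n - first)
}

# Candidate control and treated sizes, one vector per option
nc_list <- lapply(c(0.25, 0.40), function(p) split_sizes(n_control, p))
nt_list <- lapply(c(0.30), function(p) split_sizes(n_treated, p))

# In this sketch, each option accounts for every unit in the stratum
stopifnot(all(vapply(nc_list, sum, numeric(1)) == n_control))
stopifnot(all(vapply(nt_list, sum, numeric(1)) == n_treated))
```

Lists built this way could then be passed as nc_list and nt_list, in the same spirit as the Examples section.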

wMax

the weight the objective places on the maximum epsilon

wEach

the weight the objective places on each epsilon
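
One plausible reading of how wMax and wEach combine is a weighted sum of the largest epsilon and of all the epsilons. The snippet below illustrates that reading only; the exact form of the objective is an assumption here, and the eps values are hypothetical.

```r
# Hypothetical imbalance values (epsilons), e.g. one per covariate
eps <- c(0.02, 0.10, 0.05)

# Weights matching the defaults shown in Usage
wMax <- 5
wEach <- 1

# Assumed form of the objective: the maximum epsilon is weighted by wMax,
# and every epsilon (including the maximum) is weighted by wEach
objective <- wMax * max(eps) + wEach * sum(eps)
print(objective)
```

Under this reading, raising wMax relative to wEach puts more emphasis on the worst-balanced quantity than on the overall total.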

solver

character specifying the optimization software to use. Options are "Rglpk" or "gurobi". The default is "Rglpk"

integer

logical indicating whether to use integer programming instead of randomized rounding. Default is FALSE. Setting this to TRUE is not recommended, as the problem may never finish

min_split

a numeric specifying the minimum number of control units and of treated units to be tolerated in a stratum. Any combination of elements from nc_list and nt_list that violates this minimum is skipped
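
The filtering rule can be illustrated with a short base-R sketch. This is an assumption about how the check might look, not the package's implementation: the helper violates_min_split() and the candidate lists are hypothetical, and the rule is read here as "any new stratum with fewer than min_split control units or fewer than min_split treated units".

```r
# Hypothetical candidate sizes for the two new strata
nc_list <- list(c(30, 90), c(4, 116))
nt_list <- list(c(12, 28), c(3, 37))
min_split <- 5

# Hypothetical helper: TRUE if any new stratum would have too few units
violates_min_split <- function(nc, nt, min_split) {
  any(nc < min_split) || any(nt < min_split)
}

# Enumerate every (nc, nt) pairing and flag the ones that would be skipped
combos <- expand.grid(i = seq_along(nc_list), j = seq_along(nt_list))
combos$skipped <- mapply(
  function(i, j) violates_min_split(nc_list[[i]], nt_list[[j]], min_split),
  combos$i, combos$j
)
print(combos)
```

In this sketch only the pairing of the first nc option with the first nt option survives, since every other pairing leaves some new stratum with fewer than five control or treated units.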

threads

the number of threads to use in the optimization when "gurobi" is the solver. By default, all available threads are used

Value

A list containing the following elements:

Examples


# Generate a small data set
set.seed(25)
samp <- sample(1:nrow(rhc_X), 1000)
cov_samp <- sample(1:26, 10)

# Create some strata
ps <- prop_strat(z = rhc_X[samp, "z"],
                 X = rhc_X[samp, cov_samp], nstrata = 5)

# Save the sample sizes
tab <- table(ps$z, ps$base_strata)

# Choose the best sample sizes among the options provided
best_split(z = ps$z, X = ps$X, strata = ps$base_strata, ist = 1,
           nc_list = list(c(floor(tab[1, 1] * 0.25), ceiling(tab[1, 1] * 0.75)),
                          c(floor(tab[1, 1] * 0.4), ceiling(tab[1, 1] * 0.6))),
           nt_list = list(c(floor(tab[2, 1] * 0.3), ceiling(tab[2, 1] * 0.7))),
           min_split = 5)

[Package optrefine version 1.1.0 Index]