
cvrall {cvcrand}

R Documentation

Covariate-constrained randomization for cluster randomized trials

Description

cvrall performs constrained randomization for cluster randomized trials (CRTs) and is especially suited to CRTs with a small number of clusters. In constrained randomization, a randomization scheme is randomly sampled from a subset of all possible randomization schemes, where the subset is defined by the value of a balancing criterion called the balance score. The cvrall function offers two metrics for the balance score: "l1" and "l2".
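To make the two metrics concrete, here is a minimal base-R sketch of an "l1" and "l2" balance score for a single allocation, in the spirit of Raab and Butcher (2001): the weighted sum of absolute ("l1") or squared ("l2") differences between treatment-arm and control-arm covariate means. The helper `balance_score()` is hypothetical, and standardizing covariates with `scale()` is an assumption; cvrall's exact standardization and weighting may differ.

```r
# Hypothetical sketch of l1 and l2 balance scores for one allocation.
# Standardization via scale() is an assumption; cvrall's details may differ.
balance_score <- function(x, arm, weights = rep(1, ncol(x)),
                          metric = c("l2", "l1")) {
  metric <- match.arg(metric)
  z <- scale(as.matrix(x))                       # standardize each covariate
  d <- colMeans(z[arm == 1, , drop = FALSE]) -   # treatment-arm means
       colMeans(z[arm == 0, , drop = FALSE])     # minus control-arm means
  if (metric == "l2") sum(weights * d^2) else sum(weights * abs(d))
}

# Toy data: 4 clusters, 2 covariates; arm is a 0/1 allocation vector
x <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 10, 20))
balance_score(x, arm = c(1, 0, 1, 0))                 # 3.6: arms differ on both covariates
balance_score(x, arm = c(1, 0, 1, 0), metric = "l1")  # about 2.51
balance_score(x, arm = c(1, 0, 0, 1))                 # 0: identical arm means
```

Smaller scores indicate better covariate balance; schemes with the smallest scores are the candidates for the constrained space.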

The cvrall function either enumerates all randomization schemes or, when their number exceeds `size`, simulates randomization schemes and keeps the unique ones. The user-specified cluster-level continuous or categorical covariates are then used to calculate a balance score for each unique scheme. A subset of the randomization schemes is chosen either by a user-specified cutoff at a given quantile of the balance score distribution or as a fixed number of schemes with the smallest balance scores. The cvrall function treats this subset as the constrained space of randomization schemes and samples one scheme from it as the final chosen scheme.
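The steps above can be sketched in base R as follows. This is a hypothetical illustration, not cvrall's code: `constrained_randomize()` is a made-up name, it uses an unweighted "l2" score on `scale()`-standardized covariates, and it assumes at least two clusters in the treatment arm.

```r
# Hypothetical sketch of constrained randomization (base R only); it follows
# the steps described above but is not cvrall's implementation.
constrained_randomize <- function(x, ntrt, cutoff = 0.1, numschemes = NULL,
                                  size = 50000, seed = 12345) {
  set.seed(seed)
  n <- nrow(x)
  z <- scale(as.matrix(x))  # standardized covariates (an assumption)

  # Step 1: enumerate all schemes, or simulate `size` schemes and keep unique ones
  if (choose(n, ntrt) <= size) {
    trt_sets <- combn(n, ntrt)                     # one column per scheme
  } else {
    trt_sets <- replicate(size, sort(sample(n, ntrt)))
    trt_sets <- trt_sets[, !duplicated(t(trt_sets)), drop = FALSE]
  }
  # Turn each set of treated clusters into a 0/1 allocation (n x S matrix)
  schemes <- apply(trt_sets, 2, function(s) as.integer(seq_len(n) %in% s))

  # Step 2: unweighted l2 balance score for every scheme
  scores <- apply(schemes, 2, function(arm) {
    d <- colMeans(z[arm == 1, , drop = FALSE]) - colMeans(z[arm == 0, , drop = FALSE])
    sum(d^2)
  })

  # Step 3: constrained space by quantile cutoff, or the numschemes best schemes
  keep <- if (is.null(numschemes)) {
    which(scores <= quantile(scores, cutoff))
  } else {
    order(scores)[seq_len(numschemes)]
  }

  # Step 4: sample one scheme uniformly from the constrained space
  chosen <- keep[sample.int(length(keep), 1)]
  list(allocation = schemes[, chosen],
       constrained = schemes[, keep, drop = FALSE],
       scores = scores)
}

# Usage: 8 clusters, 4 treated -> choose(8, 4) = 70 schemes, all enumerated
x <- data.frame(a = 1:8, b = c(5, 3, 8, 1, 9, 2, 7, 4))
res <- constrained_randomize(x, ntrt = 4)
res$allocation

# The examples below have choose(16, 8) = 12870 possible schemes,
# which is under the default size of 50000
choose(16, 8)
```

Plotting `hist(res$scores)` followed by `abline(v = quantile(res$scores, 0.1), col = "red")` gives a picture similar in spirit to the `bhist` output.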

Usage

cvrall(
  clustername = NULL,
  x,
  categorical = NULL,
  weights = NULL,
  ntotal_cluster,
  ntrt_cluster,
  cutoff = 0.1,
  numschemes = NULL,
  size = 50000,
  stratify = NULL,
  seed = NULL,
  balancemetric = "l2",
  nosim = FALSE,
  savedata = NULL,
  bhist = TRUE,
  check_validity = FALSE,
  samearmhi = 0.75,
  samearmlo = 0.25
)

Arguments

`clustername`	a vector of cluster identifiers. If it is not specified, the clusters are labeled in the order in which they appear in the data.
`x`	a data frame of the cluster-level covariates to balance. With `K` covariates and `n` clusters, it has dimension `n` by `K`.
`categorical`	a vector specifying the categorical (including binary) variables, given either as column names or as column indexes, but not both. For a categorical variable with `p` categories, `cvrall` creates `p-1` dummy variables, dropping the reference level if the variable is a factor and otherwise the first level in alphanumeric order. The results are sensitive to which level is dropped. To drop a different level of a `p`-level categorical variable, the user can either set the variable as a factor with the desired reference level, or create the `p-1` dummy variables directly, supply them as covariates in `x`, and list them in `categorical`.
`weights`	a vector of user-specified weights for the covariates, used to calculate the balance score. The weight for a categorical variable is replicated for each of its dummy variables. The `weights` option can also be used to stratify on a variable: giving it a relatively large weight, such as `1000`, while all other variables have a weight of `1` makes the chosen randomization scheme stratified by that variable, provided a low `cutoff` value is specified.
`ntotal_cluster`	the total number of clusters to be randomized. It must be a positive integer equal to the number of rows of `x`.
`ntrt_cluster`	the number of clusters that the researcher wants to assign to the treatment arm. It must be a positive integer less than the total number of clusters.
`cutoff`	the quantile of the balance score distribution below which randomization schemes form the constrained space from which the final scheme is sampled. The default is `0.1`, and it must be between 0 and 1. The `cutoff` option is overridden by the `numschemes` option.
`numschemes`	the number of randomization schemes forming the constrained space from which the final randomization scheme is selected. If specified, it overrides `cutoff`, and the final scheme is sampled at random from the `numschemes` schemes with the smallest balance scores. It must be a positive integer.
`size`	the number of randomization schemes to simulate when the number of all possible randomization schemes exceeds `size`. The default is `50000`, and it must be a positive integer. It is overridden by the `nosim` option.
`stratify`	the categorical variables on which to stratify the randomization. When specified, it overrides the `weights` option. These variables should be a subset of those listed in `categorical`, if that option is specified.
`seed`	the seed for simulation and random sampling, needed so that the randomization can be reproduced. Its default is `12345`.
`balancemetric`	balance metric to use. Its choices are `"l1"` and `"l2"`. The default is `"l2"`.
`nosim`	if `TRUE`, overrides the default of simulating schemes when the number of all possible randomization schemes exceeds `size`, and enumerates all randomization schemes instead. Note: this may consume a lot of memory and cause R to crash.
`savedata`	a file name; if specified, the constrained randomization space is saved to this csv file. The first column of the csv file is an indicator variable identifying the final randomization scheme within the constrained space. The constrained randomization space is needed for the analysis after the cluster randomized trial is completed if the clustered permutation test is used.
`bhist`	if `TRUE` (the default), produces a histogram of all balance scores, with a red line indicating the selected cutoff.
`check_validity`	logical; whether to check the validity of the randomization. The default is `FALSE`.
`samearmhi`	pairs of clusters assigned to the same arm at least this proportion of the time are displayed. The default is `0.75`.
`samearmlo`	pairs of clusters assigned to the same arm at most this proportion of the time are displayed. The default is `0.25`.
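The dummy coding and weight replication described for `categorical` and `weights` can be sketched in base R. This only illustrates the idea: `model.matrix()` and `relevel()` are standard R functions, but whether cvrall uses them internally is not stated here, and the variable names, weights, and mean differences below are made up.

```r
# Hypothetical sketch: dummy coding and weight replication in base R.
incomecat <- factor(c("low", "mid", "high", "mid", "low", "high"))
levels(incomecat)                          # "high" "low" "mid" (alphanumeric order)

# Default coding drops the first level ("high"), leaving p - 1 = 2 dummies
default_dummies <- model.matrix(~ incomecat)[, -1]

# To drop "low" instead, make it the reference level before coding
incomecat_low <- relevel(incomecat, ref = "low")
low_ref_dummies <- model.matrix(~ incomecat_low)[, -1]
colnames(low_ref_dummies)                  # "incomecat_lowhigh" "incomecat_lowmid"

# One weight per original covariate, replicated over that covariate's columns
weights <- c(strat = 1000, cont = 1, incomecat = 1)
ncols   <- c(strat = 1, cont = 1, incomecat = ncol(low_ref_dummies))
w <- rep(weights, times = ncols)           # 1000 1 1 1

# Weighted l2 score from standardized mean differences d (made-up values):
# with weight 1000, even a small imbalance on 'strat' dominates the score
score <- function(d) sum(w * d^2)
score(c(0, 0.5, 0.5, 0.5))                 # 0.75: balanced on 'strat'
score(c(0.25, 0, 0, 0))                    # 62.5: imbalanced on 'strat' only
```

With a low `cutoff`, only schemes resembling the first case survive, which is why a large weight acts like stratification.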

Value

`balancemetric`	the balance metric used.
`allocation`	the allocation scheme from constrained randomization.
`bscores`	the histogram of the balance scores with respect to the balance metric.
`assignment_message`	a statement of how many clusters are randomized to the intervention arm and to the control arm.
`scheme_message`	a statement of how the full randomization space used in constrained randomization was obtained.
`cutoff_message`	a statement of the cutoff defining the constrained space.
`choice_message`	a statement of the scheme selected by constrained randomization.
`data_CR`	a data frame containing the allocation scheme, the cluster names, and the original data frame of covariates.
`baseline_table`	descriptive statistics for all variables by arm under the selected scheme.
`cluster_coincidence`	the cluster coincidence matrix.
`cluster_coin_des`	descriptive statistics of the cluster coincidence.
`clusters_always_pair`	pairs of clusters always allocated to the same arm.
`clusters_always_not_pair`	pairs of clusters always allocated to different arms.
`clusters_high_pair`	pairs of clusters randomized to the same arm at least `samearmhi` of the time.
`clusters_low_pair`	pairs of clusters randomized to the same arm at most `samearmlo` of the time.
`overall_allocations`	the frequency of acceptable overall allocations.
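The coincidence summaries above can be illustrated with a small base-R sketch. It assumes the constrained space is available as a 0/1 matrix with one row per scheme and one column per cluster (a layout chosen for illustration); `coincidence()` is a hypothetical helper, not a cvrall function.

```r
# Hypothetical helper: proportion of schemes in which each pair of clusters
# shares an arm (both treated or both control).
coincidence <- function(schemes) {
  s <- as.matrix(schemes)
  (crossprod(s) + crossprod(1 - s)) / nrow(s)
}

# Toy constrained space: 4 schemes (rows) for 4 clusters (columns), 2 treated each
schemes <- rbind(c(1, 1, 0, 0),
                 c(1, 0, 1, 0),
                 c(0, 1, 0, 1),
                 c(0, 0, 1, 1))
cc <- coincidence(schemes)

samearmhi <- 0.75
samearmlo <- 0.25
high_pairs <- which(cc >= samearmhi & upper.tri(cc), arr.ind = TRUE)  # none here
low_pairs  <- which(cc <= samearmlo & upper.tri(cc), arr.ind = TRUE)  # (2, 3) and (1, 4)
```

Here clusters 1 and 4, and clusters 2 and 3, have a same-arm proportion of 0, so they are always allocated to different arms; pairs like these correspond to what `clusters_always_not_pair` reports.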

Author(s)

Hengshi Yu <hengshi@umich.edu>, Fan Li <fan.f.li@yale.edu>, John A. Gallis <john.gallis@duke.edu>, Elizabeth L. Turner <liz.turner@duke.edu>

References

Raab, G. M., & Butcher, I. (2001). Balance in cluster randomized trials. Statistics in Medicine, 20(3), 351-365.

Li, F., Lokhnygina, Y., Murray, D. M., Heagerty, P. J., & DeLong, E. R. (2016). An evaluation of constrained randomization for the design and analysis of group randomized trials. Statistics in Medicine, 35(10), 1565-1579.

Li, F., Turner, E. L., Heagerty, P. J., Murray, D. M., Vollmer, W. M., & DeLong, E. R. (2017). An evaluation of constrained randomization for the design and analysis of group randomized trials with binary outcomes. Statistics in Medicine, 36(24), 3791-3806.

Gallis, J. A., Li, F., Yu, H., & Turner, E. L. (2018). cvcrand and cptest: Commands for efficient design and analysis of cluster randomized trials using constrained randomization and permutation tests. The Stata Journal, 18(2), 357-378.

Dickinson, L. M., Beaty, B., Fox, C., Pace, W., Dickinson, W. P., Emsermann, C., & Kempe, A. (2015). Pragmatic cluster randomized trials using covariate constrained randomization: A method for practice-based research networks (PBRNs). The Journal of the American Board of Family Medicine, 28(5), 663-672.

Bailey, R. A., & Rowley, C. A. (1987). Valid randomization. Proceedings of the Royal Society of London. A. Mathematical and Physical Sciences, 410(1838), 105-124.

Examples



library(cvcrand)

# cvrall examples

Design_result <- cvrall(clustername = Dickinson_design$county,
                         balancemetric = "l2",
                         x = data.frame(Dickinson_design[ , c("location", "inciis",
                              "uptodateonimmunizations", "hispanic", "incomecat")]),
                         ntotal_cluster = 16,
                         ntrt_cluster = 8,
                         categorical = c("location", "incomecat"),
                         ###### Option to save the constrained space ######
                         # savedata = "dickinson_constrained.csv",
                         bhist = TRUE,
                         cutoff = 0.1,
                         seed = 12345, 
                         check_validity = TRUE)

# cvrall example with weights specified

Design_result <- cvrall(clustername = Dickinson_design$county,
                         balancemetric = "l2",
                         x = data.frame(Dickinson_design[ , c("location", "inciis",
                             "uptodateonimmunizations", "hispanic", "incomecat")]),
                         ntotal_cluster = 16,
                         ntrt_cluster = 8,
                         categorical = c("location", "incomecat"),
                         weights = c(1, 1, 1, 1, 1),
                         cutoff = 0.1,
                         seed = 12345, 
                         check_validity = TRUE)

# Stratification on location, with constrained
# randomization on other specified covariates

Design_stratified_result <- cvrall(clustername = Dickinson_design$county,
                                   balancemetric = "l2",
                                   x = data.frame(Dickinson_design[ , c("location", "inciis",
                                       "uptodateonimmunizations", "hispanic", "incomecat")]),
                                   ntotal_cluster = 16,
                                   ntrt_cluster = 8,
                                   categorical = c("location", "incomecat"),
                                   weights = c(1000, 1, 1, 1, 1),
                                   cutoff = 0.1,
                                   seed = 12345)

# An alternative and equivalent way to stratify on location

Design_stratified_result <- cvrall(clustername = Dickinson_design$county,
                                   balancemetric = "l2",
                                   x = data.frame(Dickinson_design[ , c("location", "inciis",
                                       "uptodateonimmunizations", "hispanic", "incomecat")]),
                                   ntotal_cluster = 16,
                                   ntrt_cluster = 8,
                                   categorical = c("location", "incomecat"),
                                   stratify = "location",
                                   cutoff = 0.1,
                                   seed = 12345)

# Stratification on income category
# Two of the income categories contain an odd number of clusters,
# so exact stratification is not possible

Design_stratified_inc_result <- cvrall(clustername = Dickinson_design$county,
                                       balancemetric = "l2",
                                       x = data.frame(Dickinson_design[ , c("location", "inciis",
                                           "uptodateonimmunizations", "hispanic", "incomecat")]),
                                       ntotal_cluster = 16,
                                       ntrt_cluster = 8,
                                       categorical = c("location", "incomecat"),
                                       stratify = "incomecat",
                                       cutoff = 0.1,
                                       seed = 12345)
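Whether exact stratification is possible can be checked before calling cvrall. The sketch below uses a hypothetical helper, `strat_feasible()`, which asks whether every stratum can receive the overall treatment proportion `ntrt_cluster / ntotal_cluster` as a whole number of clusters; it is one reasonable check, not a cvcrand function.

```r
# Hypothetical helper: can every stratum be split in the overall treatment ratio?
strat_feasible <- function(strata, ntotal_cluster, ntrt_cluster) {
  counts <- table(strata)
  ntrt_needed <- counts * ntrt_cluster / ntotal_cluster  # treated clusters per stratum
  all(abs(ntrt_needed - round(ntrt_needed)) < 1e-8)     # all whole numbers?
}

strat_feasible(c("a", "a", "b", "b", "b", "b"), 6, 3)  # TRUE: 1 of 2 and 2 of 4
strat_feasible(c("a", "a", "a", "b", "b", "b"), 6, 3)  # FALSE: 1.5 of 3 is not whole
```

With 8 of 16 clusters treated, any stratum with an odd number of clusters fails this check, which matches the comment on the income category example above.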

[Package cvcrand version 0.1.1 Index]