
seqblock {blockTools}

R Documentation

Sequential assignment of unit(s) into experimental conditions using covariates

Description

Sequentially assign units into experimental conditions. Blocking begins by creating a measure of multivariate distance between the current unit and one or more prior, already-assigned units. Then the average distance between the current unit and each treatment condition is calculated, and random assignment is biased toward the conditions more dissimilar to the current unit. Argument values can be specified either as arguments to the function or via a query. The query asks the user directly to identify the blocking variables and to input, one by one, each unit's variable values.

Usage

seqblock(object = NULL, id.vars, id.vals, exact.vars = NULL, exact.vals = NULL, 
  exact.restr = NULL, exact.alg = "single", covar.vars = NULL, covar.vals = NULL, 
  covar.restr = NULL, covars.ord = NULL, n.tr = 2, tr.names = NULL, assg.prob = NULL,
  seed = NULL, seed.dist, assg.prob.stat = NULL, trim = NULL, assg.prob.method = NULL,
  assg.prob.kfac = NULL, distance = NULL, file.name = NULL, query = FALSE, 
  verbose = TRUE, ...)

Arguments

`object`	a character string giving the file name of a `.RData` file containing a list output from the `seqblock` function which contains at least one previously assigned unit.
`id.vars`	a string or vector of strings specifying the name of the identifying variable(s); if `query = FALSE` and the `object` argument is not given, then the `id.vars` argument is required.
`id.vals`	a vector of ID values for every unit being assigned to a treatment group; these values correspond to `id.vars`.
`exact.vars`	a string or vector of strings containing the names of each of the exact blocking variables.
`exact.vals`	a vector containing the unit's values on each of the exact blocking variables.
`exact.restr`	a list object containing the restricted values that the exact blocking variables can take on. Thus the first element of `exact.restr` is a vector containing all of the possible values that the first exact blocking variable (see `exact.vars` above) can take on; the second element is a vector containing all of the possible values for the second exact blocking variable; and so on.
`exact.alg`	a string specifying the blocking algorithm. Currently the only acceptable value is `"single"`. This algorithm creates a variable with a unique level for every possible combination of the values in all of the exact variables. See Details section below.
`covar.vars`	a string or vector of strings containing the names of each of the non-exact blocking variables.
`covar.vals`	a vector containing the unit's values on each of the non-exact blocking variables.
`covar.restr`	a list object containing the restricted values that the non-exact blocking variables can take on. Thus the first element of `covar.restr` is a vector containing all of the possible values that the first non-exact blocking variable (see `covar.vars` above) can take on; the second element is a vector containing all of the possible values for the second non-exact blocking variable; and so on.
`covars.ord`	a string or vector of strings containing the names of the non-exact blocking variables, ordered so that the highest-priority covariate comes first, followed by the second-highest-priority covariate, then the third, etc.
`n.tr`	the number of treatment groups. If not specified, this defaults to `n.tr = 2`.
`tr.names`	a string or vector of strings containing the names of the different treatment groups.
`assg.prob`	a numeric vector containing the probabilities that a unit will be assigned to the treatment groups; this vector should sum to 1.
`seed`	an optional integer value for the random seed, which is used when assigning units to treatment groups.
`seed.dist`	an optional integer value for the random seed set in `cov.rob`, used to calculate measures of the variance-covariance matrix robust to outliers.
`assg.prob.stat`	a string specifying which assignment probability summary statistic to use; valid values are `mean`, `median`, and `trimmean`. If not specified, this defaults to `assg.prob.stat = "mean"`.
`trim`	a numeric value specifying what proportion of the observations are to be dropped from each tail when the assignment probability summary statistic (`assg.prob.stat`) is set equal to `trimmean`. Blocks on each tail of the distribution are dropped before the mean is calculated. If not specified, this defaults to `trim = 0.1`.
`assg.prob.method`	a string specifying which algorithm should be used when assigning treatment probabilities. Acceptable values are `ktimes`, `fixed`, `prop`, `prop2`, and `wprop`. If not specified, this defaults to `assg.prob.method = "ktimes"`.
`assg.prob.kfac`	a numeric value for `k`, the factor by which the most likely experimental condition will be multiplied relative to the other conditions. If not specified, this defaults to `assg.prob.kfac = 2`.
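`distance`	a string specifying how the multivariate distance used for blocking covariates is calculated. The Details section below describes the options `"mahalanobis"`, `"mcd"`, `"mve"`, and `"euclidean"`. If not specified, this defaults to `distance = "mahalanobis"`.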
`file.name`	a string containing the name of the file that one would like the output to be written to. Ideally this file name should have the extension .RData.
`query`	a logical stating whether the console should ask the user questions to input the data and assign a treatment condition. If not specified, this defaults to `query = FALSE`.
`verbose`	a logical stating whether the function should print the name of the output file, the current working directory, the treatment group that the most recent unit was assigned to, and the dataframe `x` returned by the function as part of the `bdata` list. If not specified, this defaults to `verbose = TRUE`.
`...`	additional arguments.
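
As an illustration of how `assg.prob.stat`, `trim`, and `assg.prob.kfac` could interact, the following base R sketch summarizes hypothetical distances per condition and then builds `ktimes`-style probabilities. It is not the package's code: the helper names `summarize_dist` and `ktimes_prob`, the distance values, and the exact probability formula (weight `k` for the most dissimilar condition, weight 1 for each other condition) are assumptions made for illustration.

```r
# Hypothetical distances from the new unit to previously assigned units,
# grouped by treatment condition (made-up numbers).
trd <- list(Treatment = c(0.4, 0.9, 1.1, 5.0),
            Control   = c(1.8, 2.2, 2.5, 2.9))

# Summary statistic per condition: "mean", "median", or "trimmean".
# mean(d, trim = trim) drops the given proportion from each tail.
summarize_dist <- function(d, stat = "mean", trim = 0.1) {
  switch(stat,
         mean     = mean(d),
         median   = median(d),
         trimmean = mean(d, trim = trim),
         stop("unknown statistic: ", stat))
}

# Assumed "ktimes" form: the most dissimilar condition is k times as
# likely as each of the remaining conditions. Ties are not handled.
ktimes_prob <- function(trd, stat = "mean", trim = 0.1, k = 2) {
  s <- sapply(trd, summarize_dist, stat = stat, trim = trim)
  s <- sort(s, decreasing = TRUE)          # most dissimilar condition first
  w <- c(k, rep(1, length(s) - 1))
  p <- w / sum(w)
  names(p) <- names(s)
  p
}

p <- ktimes_prob(trd, stat = "mean", k = 2)

# Draw one assignment with the biased probabilities.
set.seed(42)
assignment <- sample(names(p), size = 1, prob = p)
```

Under these assumptions, the condition whose already-assigned units are farthest from the new unit receives the larger probability, which is the bias toward dissimilar conditions described in the Description.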

Details

The seqblock function's code is divided into two main parts. The first handles the case in which the unit being assigned is the first unit in a study to receive an assignment; the second handles subsequent units, which are assigned after at least one assignment has already been made. If the object argument is left as NULL, the function runs the first part; if the object argument is specified, it runs the second. When object = NULL, the researcher has no past file from which to pull variable names and past data, which corresponds to assigning the first unit. Specifying object implies that data are being drawn from a past file, so the unit is not the first in the study to be assigned to a treatment.

However, the function can be called for subsequent units even when object is not specified. When query = TRUE, the console asks the researcher whether this is the first unit to be assigned in the study and, based on the response, decides which part of the code to run.

If the object and file.name arguments are set to the same value, then seqblock overwrites the specified file with a new file, which now contains both the previously-assigned units and the newly-assigned unit. To create a new file when a new unit is assigned, use a new file.name.
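
The overwrite behavior can be pictured with plain `save()` and `load()` calls. The sketch below is only an analogy in base R, not the package's implementation; the column names in `x`, the treatment labels, and the assumption that the stored object is named `bdata` are illustrative.

```r
# A temporary file stands in for the file named by object / file.name.
f <- tempfile(fileext = ".RData")

# First unit: nothing to load yet, so start a new record and save it.
bdata <- list(x = data.frame(ID = 1, party = "Republican", age = 25,
                             Tr = "Treatment 1"))
save(bdata, file = f)

# Next unit: load the earlier record, append the new unit, and save
# under the SAME file name, which overwrites the old file.
env <- new.env()
load(f, envir = env)
bdata <- env$bdata
bdata$x <- rbind(bdata$x,
                 data.frame(ID = 2, party = "Democrat", age = 30,
                            Tr = "Treatment 2"))
save(bdata, file = f)

# Reloading shows both the earlier unit and the new one in a single file.
check <- new.env()
load(f, envir = check)
n_units <- nrow(check$bdata$x)
```

Saving to a different file name in the last `save()` call would instead leave the earlier file untouched, which mirrors the advice above to use a new file.name.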

The single algorithm (see exact.alg in the Arguments section above) creates a variable that has a unique level for every possible combination of the exact variables. As an example, say that there were 3 exact blocking variables: party (Democrat, Republican); region (North, South); and education (HS, NHS). The single algorithm creates one level for units with the following values: Democrat-North-HS. It would create another level for Democrat-North-NHS; a third level for Republican-North-HS; and so forth, until every possible combination of these 3 variables has its own level. Thus if there are k exact blocking variables and the i-th exact blocking variable can take on q_{i} values, then a total of \prod_{i=1}^{k} q_{i} levels are created (2 x 2 x 2 = 8 in this example).
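
The level count in the party/region/education example can be checked with a short base R sketch. This only enumerates the combinations; it is not how the package builds its internal variable, and the names `restr` and `levels_grid` are hypothetical.

```r
# Restricted values for the three exact blocking variables in the example.
restr <- list(party     = c("Democrat", "Republican"),
              region    = c("North", "South"),
              education = c("HS", "NHS"))

# One row per possible combination of values.
levels_grid <- expand.grid(restr, stringsAsFactors = FALSE)

# A single label per combination, e.g. "Democrat-North-HS".
levels_grid$level <- do.call(paste, c(levels_grid, sep = "-"))

# The number of levels is the product of the q_i.
n_levels <- prod(lengths(restr))

# Locate the level for a new unit's exact values.
new_level <- paste("Republican", "South", "HS", sep = "-")
is_known  <- new_level %in% levels_grid$level
```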

The distance = "mcd" and distance = "mve" options call cov.rob to calculate measures of multivariate spread robust to outliers. The distance = "mcd" option calculates the Minimum Covariance Determinant estimate (Rousseeuw 1985); the distance = "mve" option calculates the Minimum Volume Ellipsoid estimate (Rousseeuw and van Zomeren 1990). When distance = "mcd", the interquartile range on blocking variables should not be zero. The distance = "euclidean" option calculates the Euclidean distance between the new unit and the previously-assigned units. The default distance = "mahalanobis" option calculates the Mahalanobis distance.
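
A minimal sketch of the Mahalanobis and Euclidean options, using only base R, is given below. The data are made up, and the choices to estimate the covariance from the prior units only and to take the square root of `mahalanobis()` (which returns squared distances) are assumptions, not statements about the package's internals. For the robust options, the covariance matrix `S` would instead come from `cov.rob`.

```r
# Hypothetical previously assigned units with two non-exact covariates.
prior <- data.frame(
  age    = c(25, 30, 41, 52, 36, 47),
  income = c(40, 55, 62, 48, 70, 51),
  Tr     = c("Treatment 1", "Treatment 2", "Treatment 1",
             "Treatment 2", "Treatment 1", "Treatment 2")
)

# The new unit to be assigned.
new_unit <- c(age = 33, income = 48)

X <- as.matrix(prior[, c("age", "income")])
S <- cov(X)  # classical covariance; a robust estimate could replace it

# stats::mahalanobis() returns SQUARED distances, hence the sqrt().
d_mah <- sqrt(mahalanobis(X, center = new_unit, cov = S))

# Euclidean distance between the new unit and each prior unit.
d_euc <- sqrt(rowSums(sweep(X, 2, new_unit)^2))

# Average distance between the new unit and each treatment condition.
avg_mah <- tapply(d_mah, prior$Tr, mean)
avg_euc <- tapply(d_euc, prior$Tr, mean)
```

The per-condition averages are the quantities that a biased assignment rule would compare: the condition with the larger average is the one more dissimilar to the new unit.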

Value

A list (called bdata) with elements

`x`	a dataframe containing the names and values for the different ID and blocking variables, as well as each unit's initial treatment assignment.
`nid`	a string or vector of strings containing the name(s) of the ID variable(s).
`nex`	a string or vector of strings containing the name(s) of the exact blocking variable(s).
`ncv`	a string or vector of strings containing the name(s) of the non-exact blocking variable(s).
`rex`	a list of the restricted values of the exact blocking variables.
`rcv`	a list of the restricted values of the non-exact blocking variables.
`ocv`	a vector of the order of the non-exact blocking variables.
`trn`	a string or vector of strings containing the name(s) of the different treatment groups.
`apstat`	a string specifying the assignment probability summary statistic that was used.
`mtrim`	a numeric value specifying the proportion of observations to be dropped when the assignment probability statistic takes on the value `"trimmean"`.
`apmeth`	a string specifying the assignment probability algorithm that was used.
`kfac`	the assignment probability kfactor; see assg.prob.kfac in the Arguments section above.
`assgpr`	a vector of assignment probabilities to each treatment group.
`distance`	a string specifying how the multivariate distance used for blocking was calculated.
`trd`	a list with the length equal to the number of previously assigned treatment conditions; each object in the list contains a vector of the distance between each unit in one treatment group and the new unit. This will be `NULL` when there are no non-exact blocking variables.
`tr.sort`	a string vector of treatment conditions, sorted from the largest to the smallest. Set to `NULL` when there are no non-exact blocking variables.
`p`	a vector of assignment probabilities to each treatment group used in assigning a treatment condition to the new unit.
`trcount`	a table containing the counts for each experimental/treatment condition.
`datetime`	the date and time at which each unit was assigned their treatment group.
`orig`	a dataframe containing the names and values for the different id and blocking variables, as well as each unit's treatment assignment.

Author(s)

Ryan T. Moore rtm@wustl.edu, Tommy Carroll tcarroll22@wustl.edu, Jonathan Homola homola@wustl.edu and Jeong Hyun Kim jeonghyun.kim@wustl.edu

References

Moore, Ryan T. and Sally A. Moore. 2013. "Blocking for Sequential Political Experiments." Political Analysis 21(4):507-523.

Moore, Ryan T. 2012. "Multivariate Continuous Blocking to Improve Political Science Experiments." Political Analysis 20(4):460-479.

Rousseeuw, Peter J. 1985. "Multivariate Estimation with High Breakdown Point." Mathematical Statistics and Applications 8:283-297.

Rousseeuw, Peter J. and Bert C. van Zomeren. 1990. "Unmasking Multivariate Outliers and Leverage Points." Journal of the American Statistical Association 85(411):633-639.

Examples

## Assign first unit (assume a 25 year old member of the Republican Party) to a treatment group.
## Save the results in file "sdata.RData":
## seqblock(query = FALSE, id.vars = "ID", id.vals = 001, exact.vars = "party", 
##   exact.vals = "Republican", covar.vars = "age", covar.vals = 25, file.name = "sdata.RData")

## Assign next unit (age 30, Democratic Party):
## seqblock(query = FALSE, object = "sdata.RData", id.vals = 002, exact.vals = "Democrat", 
##   covar.vars = "age", covar.vals = 30, file.name = "sdata.RData")

[Package blockTools version 0.6.4]