AME {FLAME} | R Documentation |
Almost Matching Exactly (AME) Algorithms for Discrete, Observational Data
Description
Almost Matching Exactly (AME) Algorithms for Discrete, Observational Data
Usage
FLAME(
  data,
  holdout = 0.1,
  C = 0.1,
  treated_column_name = "treated",
  outcome_column_name = "outcome",
  weights = NULL,
  PE_method = "ridge",
  user_PE_fit = NULL,
  user_PE_fit_params = NULL,
  user_PE_predict = NULL,
  user_PE_predict_params = NULL,
  replace = FALSE,
  estimate_CATEs = FALSE,
  verbose = 2,
  return_pe = FALSE,
  return_bf = FALSE,
  early_stop_iterations = Inf,
  early_stop_epsilon = 0.25,
  early_stop_control = 0,
  early_stop_treated = 0,
  early_stop_pe = Inf,
  early_stop_bf = 0,
  missing_data = c("none", "drop", "keep", "impute"),
  missing_holdout = c("none", "drop", "impute"),
  missing_data_imputations = 1,
  missing_holdout_imputations = 5,
  impute_with_treatment = TRUE,
  impute_with_outcome = FALSE
)
DAME(
  data,
  holdout = 0.1,
  treated_column_name = "treated",
  outcome_column_name = "outcome",
  weights = NULL,
  PE_method = "ridge",
  n_flame_iters = 0,
  user_PE_fit = NULL,
  user_PE_fit_params = NULL,
  user_PE_predict = NULL,
  user_PE_predict_params = NULL,
  replace = FALSE,
  estimate_CATEs = FALSE,
  verbose = 2,
  return_pe = FALSE,
  return_bf = FALSE,
  early_stop_iterations = Inf,
  early_stop_epsilon = 0.25,
  early_stop_control = 0,
  early_stop_treated = 0,
  early_stop_pe = Inf,
  early_stop_bf = 0,
  missing_data = c("none", "drop", "keep", "impute"),
  missing_holdout = c("none", "drop", "impute"),
  missing_data_imputations = 1,
  missing_holdout_imputations = 5,
  impute_with_treatment = TRUE,
  impute_with_outcome = FALSE
)
## S3 method for class 'ame'
print(x, digits = getOption("digits"), linewidth = 80, ...)
Arguments
data |
Data to be matched. Either a data frame or a path to a .csv file
to be read into a data frame. Treatment must be described by a logical or
binary numeric column with name |
holdout |
Holdout data to be used to compute predictive error, if
|
C |
A finite, positive scalar denoting the tradeoff between BF and PE in the FLAME algorithm. Higher C prioritizes more matches and lower C prioritizes not dropping important covariates. Defaults to 0.1. |
treated_column_name |
Name of the treatment column in |
outcome_column_name |
Name of the outcome column in |
weights |
A positive numeric vector representing covariate importances. Supplying this argument prevents PE from being computed as it determines dropping order by forcing covariate subsets with lower weights to be dropped first. The weight of a covariate subset is defined to be the sum of the weights of the constituent covariates. Ties are broken at random. |
PE_method |
Denotes how predictive error (PE) is to be computed. Either
a string – one of "ridge" (default) or "xgb" – or a function. If "ridge",
ridge regression is used to fit an outcome regression model via
|
user_PE_fit |
Deprecated; use argument 'PE_method' instead. An optional
function supplied by the user that can be used instead of those allowed for
by |
user_PE_fit_params |
Deprecated; use argument 'PE_method' instead. A
named list of optional parameters to be used by |
user_PE_predict |
Deprecated; use argument 'PE_method' instead. An
optional function supplied by the user that can be used to generate
predictions from the output of |
user_PE_predict_params |
Deprecated; use argument 'PE_method' instead. A
named list of optional parameters to be used by |
replace |
A logical scalar. If |
estimate_CATEs |
A logical scalar. If |
verbose |
Controls how FLAME displays progress while running. If 0, no output. If 1, only outputs the stopping condition. If 2, outputs the iteration and number of unmatched units every 5 iterations, and the stopping condition. If 3, outputs the iteration and number of unmatched units every iteration, and the stopping condition. Defaults to 2. |
return_pe |
A logical scalar. If |
return_bf |
A logical scalar. If |
early_stop_iterations |
A positive integer, denoting an upper bound
on the number of matching rounds to be performed. If 1, one round of
exact matching is performed before stopping. Defaults to |
early_stop_epsilon |
A nonnegative numeric. If fixed covariate weights
are passed via |
early_stop_control , early_stop_treated |
If the proportion of control or treated units, respectively, that are unmatched falls below this value, the matching algorithm will stop. Both default to 0. |
early_stop_pe |
Deprecated. A positive numeric. If FLAME attempts to
drop a covariate that would lead to a PE above this value, FLAME stops.
Defaults to |
early_stop_bf |
Deprecated. A numeric value between 0 and 2. If FLAME attempts to drop a covariate that would lead to a BF below this value, FLAME stops. Defaults to 0. |
missing_data |
Specifies how to handle missingness in |
missing_holdout |
Specifies how to handle missingness in |
missing_data_imputations |
Defunct. If |
missing_holdout_imputations |
If |
impute_with_treatment , impute_with_outcome |
If |
n_flame_iters |
Specifies that this many iterations of FLAME should be run before switching to DAME. This can be used to speed up the matching procedure as FLAME rapidly eliminates irrelevant covariates, after which DAME will make higher quality matches on the remaining variables. |
x |
An object of class |
digits |
Number of significant digits for printing the average treatment effect. |
linewidth |
Maximum number of characters on line; output will be wrapped accordingly. |
... |
Additional arguments to be passed to other methods. |
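The description of the weights argument above says that the weight of a covariate subset is the sum of the weights of its constituent covariates, and that lower-weight subsets are dropped first. The following is a minimal base R sketch of that rule only, not the package's implementation; the covariate names, the weight values, and the helper subset_weight are made up for illustration.

```r
# Hypothetical covariate importances (made-up names and numbers).
covariate_weights <- c(age = 3, region = 1, smoker = 2)

# Weight of a covariate subset: the sum of its members' weights.
subset_weight <- function(covs, weights) sum(weights[covs])

# Candidate subsets that could be dropped.
candidates <- list("region", "smoker", "age", c("region", "smoker"))
candidate_weights <- vapply(candidates, subset_weight, numeric(1),
                            weights = covariate_weights)

# Lower-weight subsets come first. order() breaks ties by position, whereas
# the documentation states that ties are broken at random.
drop_order <- candidates[order(candidate_weights)]
```

Here the subset containing only region has the lowest weight, so it would be the first candidate for dropping under this rule.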
Value
An object of type ame, which by default is a list of 4 entries:
- data
The original data frame with several modifications:
  - An extra logical column, data$matched, that indicates whether or not a unit was matched.
  - An extra numeric column, data$weight, that denotes on how many different sets of covariates a unit was matched. This will only be greater than 1 when replace = TRUE.
  - The columns denoting treatment and outcome will be moved after all covariate columns.
  - If replace is FALSE, a column containing a matched group identifier for each unit.
  - If estimate_CATEs = TRUE, a column containing the CATE estimate for each unit.
- MGs
A list whose i'th entry contains the indices of units in the main matched group of the i'th unit.
- cov_sets
A list whose i'th entry contains the covariate set not matched on in the i'th iteration.
- info
A list containing miscellaneous information about the data and matching specifications. Primarily for use by *.ame methods.
Introduction
FLAME and DAME are matching algorithms for
observational causal inference on data with discrete (categorical)
covariates. They match units that share identical values of certain
covariates, as follows. The algorithms first make any possible exact
matches; that is, they match units that share identical values of all
covariates (this is possible because covariates are discrete). They then
iteratively drop a set of covariates and make any possible matches on the
remaining covariates, until stopping. For each unit, DAME solves an
optimization problem that finds the highest quality set of covariates the
unit can be matched to others on, where quality is determined by how well
that set of covariates predicts the outcome. FLAME approximates the
solution to the problem solved by DAME; at each step, it drops the
covariate leading to the smallest drop in match quality MQ, defined
as MQ = C × BF - PE. Here, PE denotes the predictive error,
which measures how important the dropped covariate is for predicting the
outcome. The balancing factor BF measures the number of matches
formed by dropping that covariate. In this way, FLAME encourages matching
on covariates more important to the outcome and also making many matches.
The hyperparameter C controls the balance between these two
objectives. In both cases, a machine learning algorithm trained on a
holdout dataset is responsible for learning the quality / importance of
covariates. For more details on the algorithms, please see the vignette,
the FLAME paper, and/or the DAME paper.
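The two ideas described above, exact matching on discrete covariates and a greedy choice of which covariate to drop, can be sketched in base R. This is an illustrative toy, not the package's implementation. It assumes match quality takes the form MQ = C * BF - PE, and it assumes BF is the share of remaining treated units matched plus the share of remaining control units matched (one definition consistent with the documented BF range of 0 to 2). The data, the predictive error values, and the helpers exact_match and balancing_factor are all hypothetical.

```r
# Toy data: two discrete covariates, a binary treatment, and an outcome.
toy <- data.frame(
  x1 = c(0, 0, 1, 1, 1, 1),
  x2 = c(1, 1, 0, 0, 1, 2),
  treated = c(1, 0, 1, 0, 1, 0),
  outcome = c(3.1, 2.0, 4.2, 3.9, 5.0, 1.2)
)

# Exact matching on a covariate set: units sharing identical values form a
# group, and a group is a valid match only if it contains both treated and
# control units. Returns a logical vector marking matched units.
exact_match <- function(df, covs, treated_col = "treated") {
  key <- interaction(df[covs], drop = TRUE)
  has_treated <- tapply(df[[treated_col]] == 1, key, any)
  has_control <- tapply(df[[treated_col]] == 0, key, any)
  valid <- names(has_treated)[has_treated & has_control]
  as.character(key) %in% valid
}

# Assumed balancing factor: share of treated units matched plus share of
# control units matched, among the units passed in.
balancing_factor <- function(df, covs, treated_col = "treated") {
  m <- exact_match(df, covs, treated_col)
  is_treated <- df[[treated_col]] == 1
  sum(m & is_treated) / sum(is_treated) + sum(m & !is_treated) / sum(!is_treated)
}

# Step 1: exact matches on all covariates.
matched_all <- exact_match(toy, c("x1", "x2"))
remaining <- toy[!matched_all, ]

# Step 2: score each single-covariate drop on the remaining units.
C <- 0.1
pe <- c(x1 = 0.50, x2 = 0.05)  # made-up predictive errors after each drop
bf <- c(
  x1 = balancing_factor(remaining, "x2"),  # drop x1, match on x2
  x2 = balancing_factor(remaining, "x1")   # drop x2, match on x1
)
mq <- C * bf - pe
to_drop <- names(which.max(mq))
```

In this toy, units 1 to 4 are matched exactly, and dropping x2 lets the two leftover units match on x1 while incurring the smaller made-up predictive error, so x2 is the covariate chosen for dropping.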
Stopping Rules
By default, both FLAME and DAME stop when (1) all covariates have been
dropped or (2) all treatment or control units have been matched. This
behavior can be modified by the arguments whose prefix is "early_stop".
With the exception of early_stop_iterations, all the rules come into play
before the offending covariate set is dropped. For example, if
early_stop_control = 0.2 and at the current iteration, dropping the
covariate leading to highest match quality is associated with an unmatched
control proportion of 0.1, FLAME will stop without dropping this
covariate.
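The check described in the example above can be written as a small predicate. This is a sketch of the documented rule for early_stop_control and early_stop_treated only; the helper should_stop and its first two parameter names are hypothetical and are not part of the package.

```r
# Hypothetical helper: decide whether to stop before dropping a candidate
# covariate set, given the unmatched proportions that the drop would produce.
should_stop <- function(prop_unmatched_control, prop_unmatched_treated,
                        early_stop_control = 0, early_stop_treated = 0) {
  prop_unmatched_control < early_stop_control ||
    prop_unmatched_treated < early_stop_treated
}

# The documented example: threshold 0.2, candidate drop leaves 0.1 unmatched.
stop_now <- should_stop(0.1, 0.5, early_stop_control = 0.2)

# A drop leaving 0.3 of control units unmatched does not trigger the rule.
keep_going <- !should_stop(0.3, 0.5, early_stop_control = 0.2)
```

With the default thresholds of 0, a proportion can never fall below the threshold, so under this sketch only the default stopping conditions apply.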
Missing Data
FLAME and DAME offer functionality for handling missing data in the
covariates, for both the data and holdout sets. This functionality can be
specified via the arguments whose prefix is "missing" or "impute". It
allows for ignoring missing data, imputing it, or (for data) not matching
on missing values. If data is imputed, imputation will be done once and
the matching algorithm will be run on the imputed dataset. If holdout is
imputed, the predictive error at an iteration will be the average of
predictive errors across all imputed holdout datasets. Units with
missingness in the treatment or outcome will be dropped.
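Two of the behaviors described above are easy to illustrate in base R: dropping units with a missing treatment or outcome, and averaging predictive error across imputed holdout sets. The sketch below does not call the package; the toy data frame and the per-imputation predictive error values are made up, with five values chosen to mirror the default missing_holdout_imputations = 5.

```r
# Toy holdout set with missing values in a covariate, treatment, and outcome.
holdout_toy <- data.frame(
  x1 = c(0, 1, NA, 1),
  treated = c(1, 0, 1, NA),
  outcome = c(2.5, NA, 3.0, 1.0)
)

# Units with a missing treatment or outcome are dropped. Missing covariate
# values (row 3) are left for the chosen missing-data strategy to handle.
keep <- complete.cases(holdout_toy[c("treated", "outcome")])
holdout_kept <- holdout_toy[keep, ]

# Made-up predictive errors, one per imputed holdout dataset. The predictive
# error used at an iteration is their average.
pe_per_imputation <- c(0.42, 0.38, 0.40, 0.45, 0.35)
pe_iteration <- mean(pe_per_imputation)
```

Rows 2 and 4 are removed because their outcome and treatment, respectively, are missing, while row 3 is retained despite its missing covariate.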
Examples
## Not run:
library(FLAME)

# Generate a matching dataset and a separate holdout dataset
data <- gen_data()
holdout <- gen_data()

# FLAME with replacement, stopping after dropping a single covariate
# (iteration 1 is exact matching, iteration 2 drops one covariate)
FLAME_out <- FLAME(data = data, holdout = holdout,
                   replace = TRUE, early_stop_iterations = 2)

# Use a linear model to compute predictive error. Call DAME without
# replacement, returning predictive error at each iteration.
my_PE <- function(X, Y) {
  # Fit Y on all covariates in X and return the fitted values
  return(lm(Y ~ ., as.data.frame(cbind(X, Y = Y)))$fitted.values)
}
DAME_out <- DAME(data = data, holdout = holdout,
                 PE_method = my_PE, return_pe = TRUE)
## End(Not run)