R: Cross validation for the CausalANOVA.

cv.CausalANOVA {FindIt}

R Documentation

Cross validation for the CausalANOVA.

Description

cv.CausalANOVA implements cross-validation for CausalANOVA to select the collapse.cost parameter. CausalANOVA runs this function internally when collapse.type="cv.min" or collapse.type="cv.1Std".

Usage

cv.CausalANOVA(
  formula,
  int2.formula = NULL,
  int3.formula = NULL,
  data,
  nway = 1,
  pair.id = NULL,
  diff = FALSE,
  cv.collapse.cost = c(0.1, 0.3, 0.7),
  nfolds = 5,
  screen = FALSE,
  screen.type = "fixed",
  screen.num.int = 3,
  family = "binomial",
  cluster = NULL,
  maxIter = 50,
  eps = 1e-05,
  seed = 1234,
  fac.level = NULL,
  ord.fac = NULL,
  verbose = TRUE
)

Arguments

`formula`	a formula that specifies outcome and treatment variables.
`int2.formula`	(optional). A formula that specifies two-way interactions.
`int3.formula`	(optional). A formula that specifies three-way interactions.
`data`	an optional data frame, list or environment (or object coercible by 'as.data.frame' to a data frame) containing the variables in the model. If not found in 'data', the variables are taken from 'environment(formula)', typically the environment from which 'CausalANOVA' is called.
`nway`	With `nway=1`, the function estimates the Average Marginal Effects (AMEs) only. With `nway=2`, the function estimates the AMEs and the two-way Average Marginal Interaction Effects (AMIEs). With `nway=3`, the function estimates the AMEs, the two-way and three-way AMIEs. Default is 1.
`pair.id`	(optional). Unique identifiers for each comparison pair. This option is used when `diff=TRUE`.
`diff`	A logical indicating whether the outcome is the choice between a pair. If `diff=TRUE`, `pair.id` should identify each comparison pair. Default is `FALSE`.
`cv.collapse.cost`	A vector of candidate values for the cost parameter, each between 0 and 1. A value of 1 corresponds to no regularization, and smaller values correspond to stronger regularization. Default is `c(0.1,0.3,0.7)`.
`nfolds`	The number of folds. Default is 5. Although `nfolds` can be as large as the sample size (leave-one-out CV), this is not recommended for large datasets.
`screen`	A logical indicating whether to select significant factor interactions with `glinternet`. When users specify interactions using `int2.formula` or `int3.formula`, this option is ignored. `screen` should be used only when users want data-driven selection of factor interactions. With `screen.type`, users can specify how to screen factor interactions. We recommend using this option when the number of factors is large, e.g., more than 6. Default is `FALSE`.
`screen.type`	Type for screening factor interactions. (1) `"fixed"` selects a fixed number (specified by `screen.num.int`) of factor interactions. (2) `"cv.min"` selects factor interactions with the tuning parameter giving the minimum cross-validation error. (3) `"cv.1Std"` selects factor interactions with the tuning parameter giving a cross-validation error that is within 1 standard deviation of the minimum cv error.
`screen.num.int`	(optional). The number of factor interactions to select. This option is used when `screen=TRUE` and `screen.type="fixed"`. Default is 3.
`family`	A family of outcome variables: `"gaussian"` for continuous outcomes, `"binomial"` for binary outcomes. Default is `"binomial"`.
`cluster`	Unique identifiers used to compute cluster standard errors.
`maxIter`	The maximum number of iterations for `glinternet`.
`eps`	A tolerance parameter in the internal optimization algorithm.
`seed`	An argument for `set.seed()`.
`fac.level`	(optional). A vector containing the number of levels in each factor. The order of `fac.level` should match the order of columns in the data. For example, when the first and second columns of the design matrix are "Education" and "Race", the first and second elements of `fac.level` should be the number of levels in "Education" and "Race", respectively.
`ord.fac`	(optional). A logical vector indicating whether each factor has ordered (`TRUE`) or unordered (`FALSE`) levels. When levels are ordered, the function uses the order given by `levels()` and places penalties on the differences between adjacent levels. When levels are unordered, the function places penalties on the differences for every pairwise comparison.
`verbose`	A logical indicating whether to print the value of the cost parameter used.
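
As a rough illustration of how `fac.level` and `ord.fac` relate to the data, the sketch below derives both from the factor columns of a small made-up data frame using base R only. The data frame `toy` and its contents are hypothetical (the column names follow the "Education" and "Race" example above), and the sketch assumes the factor columns appear in the same order as in the model. It is not code taken from FindIt.

```r
# Hypothetical toy data; column names follow the example in the text.
toy <- data.frame(
  Education = factor(c("low", "mid", "high", "mid"),
                     levels = c("low", "mid", "high"), ordered = TRUE),
  Race = factor(c("A", "B", "A", "B"), levels = c("A", "B"))
)

# Number of levels in each factor, in column order.
fac.level <- vapply(toy, nlevels, integer(1))

# TRUE where a factor is ordered, FALSE where it is unordered.
ord.fac <- vapply(toy, is.ordered, logical(1))

print(fac.level)
print(ord.fac)
```

Deriving both vectors from the same data frame keeps their order aligned with the columns, which is the requirement stated for `fac.level`.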

Details

See Details in CausalANOVA.

Value

`cv.error`	The mean cross-validated error: a vector with one entry per candidate cost parameter.
`cv.min`	The value of the cost parameter that gives the minimum cross-validated error.
`cv.1Std`	The largest value of the cost parameter such that the error is within 1 standard error of the minimum.
`cv.each.mat`	A matrix containing cross-validation errors for each fold and cost parameter.
`cv.cost`	The `cv.collapse.cost` used in the function.
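
To make the relationship between these components concrete, here is a minimal base R sketch that computes a mean error, a minimum-error cost, and a one-standard-error cost from a fold-by-cost error matrix. The matrix `err` is made-up illustrative data, and the calculation follows the wording of the descriptions above. Whether FindIt computes the standard error in exactly this way is an assumption.

```r
# Hypothetical error matrix: 5 folds (rows) by 3 candidate costs (columns).
cost <- c(0.1, 0.3, 0.7)
err <- matrix(c(0.30, 0.26, 0.27,
                0.32, 0.25, 0.26,
                0.29, 0.27, 0.27,
                0.31, 0.24, 0.26,
                0.33, 0.28, 0.27),
              nrow = 5, byrow = TRUE,
              dimnames = list(NULL, as.character(cost)))

# Mean error and standard error across folds for each cost.
cv.error <- colMeans(err)
cv.se <- apply(err, 2, sd) / sqrt(nrow(err))

# Cost with the smallest mean error.
i.min <- which.min(cv.error)
cv.min <- cost[i.min]

# Largest cost whose mean error is within one SE of the minimum.
threshold <- cv.error[i.min] + cv.se[i.min]
cv.1Std <- max(cost[cv.error <= threshold])

print(cv.min)
print(cv.1Std)
```

In this toy matrix the two rules pick different costs, which shows why `cv.min` and `cv.1Std` are reported separately.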

Author(s)

Naoki Egami and Kosuke Imai.

References

Post, J. B. and Bondell, H. D. 2013. “Factor selection and structural identification in the interaction anova model.” Biometrics 69, 1, 70–79.

Egami, Naoki and Kosuke Imai. 2019. “Causal Interaction in Factorial Experiments: Application to Conjoint Analysis.” Journal of the American Statistical Association. http://imai.fas.harvard.edu/research/files/int.pdf

Examples


library(FindIt)
data(Carlson)
## Specify the order of each factor
Carlson$newRecordF<- factor(Carlson$newRecordF,ordered=TRUE,
                            levels=c("YesLC", "YesDis","YesMP",
                                     "noLC","noDis","noMP","noBusi"))
Carlson$promise <- factor(Carlson$promise,ordered=TRUE,levels=c("jobs","clinic","education"))
Carlson$coeth_voting <- factor(Carlson$coeth_voting,ordered=FALSE,levels=c("0","1"))
Carlson$relevantdegree <- factor(Carlson$relevantdegree,ordered=FALSE,levels=c("0","1"))

## ####################################### 
## Collapsing Without Screening
## ####################################### 
#################### AMEs and two-way AMIEs ####################
## We show a very small example for illustration.
## In practice, cv.collapse.cost=c(0.1,0.3,0.5) and nfolds=10 are recommended.
fit.cv <- cv.CausalANOVA(formula=won ~ newRecordF + promise + coeth_voting + relevantdegree,
                         int2.formula = ~ newRecordF:coeth_voting,
                         data=Carlson, pair.id=Carlson$contestresp,diff=TRUE,
                         cv.collapse.cost=c(0.1,0.3), nfolds=2,
                         cluster=Carlson$respcodeS, nway=2)
fit.cv

[Package FindIt version 1.2.0 Index]