counterfactual {Counterfactual}  R Documentation 
Implements the estimation and inference methods for counterfactual analysis described in Chernozhukov, Fernandez-Val and Melly (2013). Counterfactual reports point estimates, pointwise confidence bands, and simultaneous confidence bands for function-valued quantile effects (QE). It also reports p-values for functional hypotheses such as no effect, constant effect and stochastic dominance. The uniform confidence bands and p-values are obtained by inverting Kolmogorov-Smirnov (KS) and Cramér-von Mises-Smirnov (CMS) statistics. The distribution of these statistics is approximated by the empirical or the weighted bootstrap. We recommend the weighted bootstrap when the covariates X include discrete components with small cell sizes.
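The logic behind the pointwise and uniform bands can be illustrated with a small base-R sketch. It is not the package's implementation. It compares two raw samples rather than model-based counterfactual distributions and uses the empirical bootstrap. The helper name `qe_bands` is hypothetical. The uniform band uses, as its critical value, the bootstrap (1 - alpha) quantile of the largest standardized deviation across quantile indexes, which is a KS-type statistic.

```r
# Hypothetical helper (not part of Counterfactual): quantile effect
# QE(u) = Q1(u) - Q0(u) between two samples, with pointwise and
# KS-type uniform confidence bands from the empirical bootstrap.
qe_bands <- function(y0, y1, taus = (1:9) / 10, reps = 200, alpha = 0.05) {
  qe <- function(a, b) {
    quantile(b, taus, names = FALSE) - quantile(a, taus, names = FALSE)
  }
  est <- qe(y0, y1)
  # Each column is one bootstrap replication of the QE process
  boot <- replicate(reps, qe(sample(y0, replace = TRUE),
                             sample(y1, replace = TRUE)))
  se <- apply(boot, 1, sd)
  # KS statistic of each replication: largest standardized deviation
  ks <- apply(abs(boot - est) / se, 2, max)
  crit <- quantile(ks, 1 - alpha, names = FALSE)
  z <- qnorm(1 - alpha / 2)
  cbind(tau = taus, qe = est, se = se,
        pw_lo = est - z * se, pw_hi = est + z * se,
        unif_lo = est - crit * se, unif_hi = est + crit * se)
}

set.seed(8)
res <- qe_bands(rnorm(500), rnorm(500, mean = 1))
round(res, 2)
```

Because the critical value accounts for the maximum over all quantile indexes, the uniform band is typically wider than the pointwise one.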
counterfactual(formula, data, weights, na.action = na.exclude, group, treatment =FALSE, decomposition = FALSE, counterfactual_var, transformation = FALSE, quantiles = c(1:9)/10, method = "qr", trimming = 0.005, nreg = 100, scale_variable, counterfactual_scale_variable, censoring = 0, right = FALSE, nsteps = 3, firstc = 0.1, secondc = 0.05, noboot = FALSE, weightedboot = FALSE, seed = 8, robust = FALSE, reps = 100, alpha = 0.05, first = 0.1, last = 0.9, cons_test = 0, printdeco = TRUE, sepcore = FALSE, ncore = 1)
formula 
a formula object, with the response Y on the left of a ~ operator, and the covariate terms X, separated by + operators, on the right. 
data 
a data.frame in which to interpret the variables named in the formula, or in the weights argument. If this is missing, then the variables in the formula should be on the search list. 
weights 
vector of observation weights. 
na.action 
a function to filter missing data.
The default (with 
quantiles 
quantile indexes of interest for the QE. It should be a vector of values between 0 and 1, with default c(1:9)/10.
group 
name of a binary variable defining the reference population (value 0) and counterfactual population (value 1). 
treatment 
logical: if 
decomposition 
logical: if 
transformation 
logical: if 
counterfactual_var 
selects the values of X in the counterfactual population (only useful when 
method 
selects the model to be used to estimate the conditional distribution. The following methods have been implemented:

trimming 
value between 0 and 0.5 specifying the amount of trimming to avoid tail estimation in 
nreg 
sets the number of regressions estimated to approximate the conditional distribution; default is 100. 
scale_variable 
selects the components of X that affect the scale in the locsca method.
counterfactual_scale_variable 
selects the counterfactual values of the components of X that affect the scale in the locsca method.
censoring 
variable specifying the censoring point for each observation (only useful when method = "cqr").
right 
logical: if 
nsteps 
selects the number of steps performed in the cqr method; default is 3.
firstc 
selects the percentage of observations thrown out during the second step of the cqr method; default is 0.1.
secondc 
selects the percentage of observations thrown out during the third and further steps of the cqr method; default is 0.05.
noboot 
logical: if 
weightedboot 
logical: if 
seed 
sets the seed for the random number generation (only useful when 
robust 
logical: if 
reps 
number of bootstrap replications; default is 100 (only useful when 
alpha 
a real number between 0 and 1 reflecting the desired significance level for the confidence bands and hypotheses tests (only useful when 
first 
sets the lowest quantile that is used for functional inference; default is 0.1 (only useful when 
last 
sets the highest quantile that is used for functional inference; default is 0.9 (only useful when 
cons_test 
adds tests of the null hypothesis that the QEs = 
printdeco 
logical: if 
sepcore 
logical: if 
ncore 
number of cores used for parallel computing (only useful when 
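The weighted bootstrap (weightedboot) reweights the observations instead of resampling them, so every observation enters every replication. One reason this can help with small covariate cells is that an empirical-bootstrap resample may miss a small cell entirely. The following base-R sketch shows the idea for a single sample quantile. It assumes i.i.d. standard exponential weights (mean and variance one). The helper names are hypothetical, and this is not the package's code.

```r
# Hypothetical helpers (not part of Counterfactual).
# Weighted quantile: smallest y whose cumulative weight share reaches u.
weighted_quantile <- function(y, w, taus) {
  o <- order(y)
  cw <- cumsum(w[o])
  cw <- cw / cw[length(cw)]  # normalize so the last share is exactly 1
  vapply(taus, function(u) y[o][which(cw >= u)[1]], numeric(1))
}

# One weighted-bootstrap draw: every observation is kept, but receives a
# random positive weight (assumed standard exponential here).
weighted_boot_draw <- function(y, taus) {
  weighted_quantile(y, rexp(length(y)), taus)
}

set.seed(8)
y <- rnorm(200)
draws <- replicate(100, weighted_boot_draw(y, 0.5))
sd(draws)  # bootstrap standard error of the sample median
```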
The populations used to construct the observed and counterfactual distributions can be specified in two alternative ways. If the option group is specified and treatment=FALSE, then the observed distribution is estimated from the conditional and covariate distributions of group=0, and the counterfactual distribution is estimated from the conditional distribution of group=0 and the covariate distribution of group=1. If group is specified and treatment=TRUE, then the observed distribution is estimated from the conditional and covariate distributions of group=1, and the counterfactual distribution is estimated from the conditional distribution of group=0 and the covariate distribution of group=1. If group is specified, treatment=TRUE and decomposition=TRUE, then all the previous observed and counterfactual distributions are estimated.

Alternatively, the option counterfactual_var can be specified. In this case, the variables on the right-hand side of formula contain the covariate values used to estimate the observed distribution, and the variables in counterfactual_var contain the covariate values used to estimate the counterfactual distribution. Note that counterfactual_var must contain exactly the same number of variables as the right-hand side of formula, in the same order. In addition, if counterfactual_var is a deterministic transformation of the covariates in the reference population, then transformation should be set to TRUE.
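To make the construction concrete, here is a base-R sketch of the case where group is specified and treatment=FALSE. It uses a linear location-shift model, the idea behind the loc method. The conditional model is fitted on group 0 and then combined with the covariates of group 1. The function name is hypothetical. The observed quantiles are plain sample quantiles of group 0. The package's own estimator may differ in details.

```r
# Hypothetical helper (not part of Counterfactual): observed and
# counterfactual quantiles under a linear location-shift model.
counterfactual_loc <- function(y, x, group, taus = (1:9) / 10) {
  d0 <- group == 0
  X <- cbind(1, as.matrix(x))
  # Conditional model Y = X'b + e, estimated in the reference group (group = 0)
  fit <- lm.fit(X[d0, , drop = FALSE], y[d0])
  e0 <- fit$residuals
  # Counterfactual outcomes: group-1 covariates combined with every
  # group-0 residual, i.e. F(y) = mean over i, j of 1{x1_i'b + e0_j <= y}
  loc1 <- drop(X[!d0, , drop = FALSE] %*% fit$coefficients)
  y_cf <- as.vector(outer(loc1, e0, `+`))
  list(observed = quantile(y[d0], taus, names = FALSE),
       counterfactual = quantile(y_cf, taus, names = FALSE))
}

set.seed(1)
group <- rep(0:1, each = 200)
x <- rnorm(400, mean = group)      # covariate shifted up in group 1
y <- 1 + 2 * x + rnorm(400)
res <- counterfactual_loc(y, x, group)
res$counterfactual - res$observed  # roughly 2 at every quantile
```

Here only the covariate distribution changes between the two populations, so the whole quantile effect is a composition effect of about 2 (slope 2 times a mean shift of 1).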
method:
qr is the default; it selects the method based on the linear quantile regression estimator of Koenker and Bassett (1978).
loc selects the linear location shift method.
locsca selects the linear location-scale shift method. The logarithm of the variance of the residuals is assumed to be a linear function of the variables given in scale_variable.
cqr selects the method based on the censored linear quantile regression estimator of Chernozhukov and Hong (2002). The variable with the censoring values for each observation must be specified in censoring. By default, this estimator is a three-step estimator. The number of steps can be increased with the option nsteps.
cox selects the method based on the proportional hazard or duration regression estimator of Cox (1972).
logit selects the method based on the distribution regression estimator of Chernozhukov, Fernandez-Val and Melly (2013) with logit link function.
probit selects the method based on the distribution regression estimator of Chernozhukov, Fernandez-Val and Melly (2013) with probit link function.
lpm selects the method based on the distribution regression estimator of Chernozhukov, Fernandez-Val and Melly (2013) with linear link function.
We refer the user to Chen, Chernozhukov, Fernandez-Val and Melly (2016) for a more detailed description of the methods.
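The distribution regression methods (logit, probit, lpm) estimate the conditional distribution threshold by threshold with binary regressions. The following base-R sketch shows the logit and probit versions for the group setup, with a single covariate and glm. The function name, the threshold grid and the final sorting step (a simple monotone rearrangement) are assumptions of this sketch. It does not describe the package internals.

```r
# Hypothetical helper (not part of Counterfactual): counterfactual CDF by
# distribution regression. For each threshold t, P(Y <= t | X) is fitted by
# a binary glm in group 0 and averaged over the covariates of group 1.
dr_counterfactual_cdf <- function(y, x, group, nthr = 20, link = "logit") {
  d0 <- group == 0
  thr <- quantile(y[d0], seq(0.05, 0.95, length.out = nthr), names = FALSE)
  new1 <- data.frame(x = x[!d0])
  cdf <- vapply(thr, function(t) {
    dat0 <- data.frame(ind = as.numeric(y[d0] <= t), x = x[d0])
    fit <- glm(ind ~ x, family = binomial(link = link), data = dat0)
    mean(predict(fit, newdata = new1, type = "response"))
  }, numeric(1))
  # Sorting the estimates enforces a monotone (non-decreasing) CDF
  data.frame(threshold = thr, cdf = sort(cdf))
}

set.seed(2)
group <- rep(0:1, each = 300)
x <- rnorm(600, mean = 0.5 * group)
y <- x + rnorm(600)
cf <- dr_counterfactual_cdf(y, x, group, link = "probit")
head(cf)
```

Counterfactual quantiles can then be read off by inverting the estimated CDF over the threshold grid.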
Returns a list with the following components:
quantiles 
quantile indexes of interest for the QE. 
structure_effect 
a vector with the estimated structure effects at the quantile indexes specified with 
composition_effect 
a vector with the estimated composition effects at the quantile indexes specified with 
total_effect 
a vector with the estimated total effects at the quantile indexes specified with 
sample_quantile_ref0 
a matrix with 4 columns. The columns contain the point estimates, standard errors, uniform lower end of confidence band, and uniform upper end of confidence band for the quantiles of Y in the observed distribution estimated using sample quantiles at the quantile indexes specified with 
model_quantile_ref0 
a matrix with 4 columns. The columns contain the point estimates, standard errors, uniform lower end of confidence band, and uniform upper end of confidence band for the quantiles of Y in the observed distribution estimated using the conditional model at the quantile indexes specified with 
model_quantile_counter 
a matrix with 4 columns. The columns contain the point estimates, standard errors, uniform lower end of confidence band, and uniform upper end of confidence band for the quantiles of Y in the counterfactual distribution estimated using the conditional model at the quantile indexes specified with 
sample_quantile_ref1 
a matrix with 4 columns. The columns contain the point estimates, standard errors, uniform lower end of confidence band, and uniform upper end of confidence band for the quantiles of Y in the observed distribution of the population defined by $ 
model_quantile_ref1 
a matrix with 4 columns. The columns contain the point estimates, standard errors, uniform lower end of confidence band, and uniform upper end of confidence band for the quantiles of Y in the observed distribution of the population defined by $ 
nreg 
number of regressions estimated to approximate the conditional distribution. 
resSE 
a matrix with 6 columns. The columns contain the point estimates, standard errors, pointwise lower end of confidence band, pointwise upper end of confidence band, uniform lower end of confidence band, and uniform upper end of confidence band for the structure or treatment quantile effect at the quantile indexes specified with 
testSE 
a matrix with 2 columns containing the p-values based on the KS and CMS statistics for several functional hypotheses on the structure or treatment effect. The first row tests the null hypothesis of correct specification of the conditional model. The second row tests the null hypothesis that the change in the distribution of the covariates has no effect. The following rows test the null hypotheses of constant QE, positive QE, and negative QE. An additional row testing the null hypothesis of a constant QE (but at a level different from 0) is added if the option 
resCE 
a matrix with 6 columns. The columns contain the point estimates, standard errors, pointwise lower end of confidence band, pointwise upper end of confidence band, uniform lower end of confidence band, and uniform upper end of confidence band for the composition quantile effect at the quantile indexes specified with 
testCE 
a matrix with 2 columns containing the p-values based on the KS and CMS statistics for several functional hypotheses on the composition effect. The first row tests the null hypothesis of correct specification of the conditional model. The second row tests the null hypothesis that the change in the distribution of the covariates has no effect. The following rows test the null hypotheses of constant QE, positive QE, and negative QE. An additional row testing the null hypothesis of a constant QE (but at a level different from 0) is added if the option 
resTE 
a matrix with 6 columns. The columns contain the point estimates, standard errors, pointwise lower end of confidence band, pointwise upper end of confidence band, uniform lower end of confidence band, and uniform upper end of confidence band for the total quantile effect at the quantile indexes specified with 
testTE 
a matrix with 2 columns containing the p-values based on the KS and CMS statistics for several functional hypotheses on the total effect. The first row tests the null hypothesis of correct specification of the conditional model. The second row tests the null hypothesis that the change in the distribution of the covariates has no effect. The following rows test the null hypotheses of constant QE, positive QE, and negative QE. An additional row testing the null hypothesis of a constant QE (but at a level different from 0) is added if the option 
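The p-values in testSE, testCE and testTE come from comparing a KS or CMS statistic with its bootstrap distribution. The following base-R sketch computes them for the no-effect hypothesis QE(u) = 0 on [first, last]. It takes an estimated QE vector and a matrix of bootstrap replications with one column per replication. The helper name is hypothetical. Standardizing both statistics by the bootstrap standard error is an assumption of this sketch, not a statement about the package.

```r
# Hypothetical helper (not part of Counterfactual): bootstrap p-values of KS
# and CMS statistics for H0: QE(u) = 0 for all u in [first, last].
# est: estimated QE at taus; boot: length(taus) x reps bootstrap matrix.
functional_pvalues <- function(est, boot, taus, first = 0.1, last = 0.9) {
  keep <- taus >= first & taus <= last
  est <- est[keep]
  boot <- boot[keep, , drop = FALSE]
  se <- apply(boot, 1, sd)
  # Statistics evaluated at the null value 0
  ks_obs <- max(abs(est) / se)
  cms_obs <- mean((est / se)^2)
  # Bootstrap draws of the statistics, recentred at the estimate
  dev <- (boot - est) / se
  ks_b <- apply(abs(dev), 2, max)
  cms_b <- colMeans(dev^2)
  c(KS = mean(ks_b >= ks_obs), CMS = mean(cms_b >= cms_obs))
}

set.seed(3)
taus <- (1:9) / 10
boot <- matrix(rnorm(9 * 200, mean = 1, sd = 0.1), nrow = 9)
functional_pvalues(rep(1, 9), boot, taus)  # large effect: p-values near 0
```

Other hypotheses, such as a constant QE, follow the same pattern after replacing the null value 0 with the appropriate restricted estimate.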
Mingli Chen, Victor Chernozhukov, Ivan Fernandez-Val, Blaise Melly
Chen, M., V. Chernozhukov, I. Fernandez-Val, and B. Melly (2016). Counterfactual Analysis in R: A Vignette.
Chernozhukov, V., I. Fernandez-Val, and B. Melly (2013). Inference on Counterfactual Distributions. Econometrica, 81(6), 2205-2268.
Chernozhukov, V., and H. Hong (2002). Three-step Censored Quantile Regression and Extramarital Affairs. Journal of the American Statistical Association, 97, 872-881.
Cox, D. R. (1972). Regression Models and Life Tables. Journal of the Royal Statistical Society, Ser. B, 34, 187-220.
Koenker, R., and G. Bassett (1978). Regression Quantiles. Econometrica, 46(1), 33-50.
library(Counterfactual)

# Counterfactual distribution of X constructed by transformation of the reference distribution
## Not run:
data(engel)
attach(engel)
counter_income <- mean(income) + 0.75 * (income - mean(income))
rqres <- counterfactual(foodexp ~ income, counterfactual_var = counter_income, nreg = 100, transformation = TRUE, sepcore = TRUE, ncore = 2)
## End(Not run)

# Wage decomposition: counterfactual and reference populations correspond to different groups
data(nlsw88)
attach(nlsw88)
lwage <- log(wage)
# method: logit
logitres <- counterfactual(lwage ~ tenure + ttl_exp + grade, group = union, treatment = TRUE, decomposition = TRUE, method = "logit", noboot = TRUE, sepcore = TRUE, ncore = 2)