ca {SortedEffects}    R Documentation
Empirical Classification Analysis (CA) and Inference
Description
ca conducts CA estimation and inference on user-specified objects of interest: the first (weighted) moment or the (weighted) distribution. Use t to specify the variables of interest. When the object of interest is the moment, use cl to specify whether to report the averages of the two groups or their difference.
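As a rough illustration of what CA on the first moment computes, here is a minimal base-R sketch. It is not the SortedEffects implementation: the data are simulated, the model is a logit fitted with glm(), and the partial effect of a binary regressor d is taken to be the change in predicted probability when d switches from 0 to 1. The names dat, pe, least, most, vars and ca_diff are hypothetical; vars plays the role of t, u has the same meaning as the u argument, and ca_diff corresponds to cl = "diff" (reported here as most minus least, a convention chosen for the sketch).

```r
# Minimal sketch of classification analysis (CA) on the first moment.
# Assumptions: simulated data, logit model, binary regressor d.
set.seed(1)
n <- 2000
x <- rnorm(n)
d <- rbinom(n, 1, 0.5)
y <- rbinom(n, 1, plogis(-0.5 + 1.0 * d + 0.8 * x + 0.5 * d * x))
dat <- data.frame(y = y, d = d, x = x)

fit <- glm(y ~ d * x, family = binomial(link = "logit"), data = dat)

# Observation-level partial effect: P(y = 1 | d = 1, x) - P(y = 1 | d = 0, x)
pe <- predict(fit, newdata = transform(dat, d = 1), type = "response") -
      predict(fit, newdata = transform(dat, d = 0), type = "response")

u <- 0.1                                  # share defining the two groups
least <- pe <= quantile(pe, u)            # least affected: bottom u share of pe
most  <- pe >= quantile(pe, 1 - u)        # most affected: top u share of pe

# Averages of the characteristics of interest and their difference
vars <- c("y", "x")
avg_least <- colMeans(dat[least, vars])
avg_most  <- colMeans(dat[most, vars])
ca_diff   <- avg_most - avg_least
print(rbind(least = avg_least, most = avg_most, diff = ca_diff))
```

ca itself additionally bootstraps these quantities to obtain standard errors and p-values.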
Usage
ca(
fm,
data,
method = c("ols", "logit", "probit", "QR"),
var_type = c("binary", "continuous", "categorical"),
var,
compare,
subgroup = NULL,
samp_weight = NULL,
taus = c(5:95)/100,
u = 0.1,
interest = c("moment", "dist"),
t = c(1, 1, rep(0, dim(data)[2] - 2)),
cl = c("both", "diff"),
cat = NULL,
alpha = 0.1,
b = 500,
parallel = FALSE,
ncores = detectCores(),
seed = 1,
bc = TRUE,
range_cb = c(1:99)/100,
boot_type = c("nonpar", "weighted")
)
Arguments
fm |
Regression formula |
data |
The data in use: the full sample or the subpopulation of interest. |
method |
Model used to estimate partial effects. Four options: "ols", "logit", "probit", or "QR". |
var_type |
The type of the variable of interest. Three options: "binary", "continuous", or "categorical". |
var |
The variable T of interest, given as a character string. |
compare |
If the variable of interest is categorical, the user must specify which two categories to compare, as a 1 by 2 character vector. For example, if the two levels to compare are 1 and 3, then compare = c("1", "3"). |
subgroup |
Subgroup of interest. Default is NULL. |
samp_weight |
Sampling weights of the data, given as an n by 1 vector, where n denotes the sample size. Default is NULL. |
taus |
Quantile indices for quantile regression. Default is c(5:95)/100. |
u |
Percentile defining the most and least affected groups. Default is 0.1. |
interest |
Generic objects of interest in the least and most affected subpopulations. Two options: (1) "moment", the first (weighted) moment, or (2) "dist", the (weighted) distribution. |
t |
An index for the ca object. Users can either (1) specify the names of the variables of interest directly, or (2) supply a 1 by ncol(data) indicator vector that uses 1 to mark each variable of interest. For example, if the data have 5 variables and the 1st and 3rd are of interest, specify t = c(1, 0, 1, 0, 0). |
cl |
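If interest = "moment", use cl = "both" to report the averages of both the least and most affected groups, or cl = "diff" to report their difference. |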
cat |
P-values in classification analysis are adjusted for multiplicity to account for joint testing of zero coefficients for all variables within a category. Suppose we have specified 3 variables of interest: |
alpha |
Size of the test used for the confidence intervals, so the confidence level is 1 - alpha. Should be between 0 and 1. Default is 0.1. |
b |
Number of bootstrap draws. Default is 500. |
parallel |
Whether to use parallel computation. Default is FALSE. |
ncores |
Number of cores used for computation. Default is detectCores(). |
seed |
Seed for pseudo-random number generation, for reproducibility. Default is 1. |
bc |
Whether the estimates should be bias-corrected. Default is TRUE. |
range_cb |
When |
boot_type |
Type of bootstrap: "nonpar" or "weighted". Default is "nonpar". |
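For boot_type, a minimal base-R sketch of the two kinds of bootstrap applied to a simple mean. The "weighted" scheme is illustrated with i.i.d. standard exponential weights; that choice is an assumption for illustration, not a statement of what SortedEffects draws. The names z, B and bse are hypothetical: B plays the role of b (number of bootstrap draws), and the standard deviation of the draws plays the role of a bootstrap standard error.

```r
# Sketch: nonparametric vs. weighted bootstrap standard errors of a mean.
# Assumption: the weighted scheme uses i.i.d. Exp(1) weights.
set.seed(1)
z <- rnorm(500, mean = 2)
B <- 500

boot_nonpar <- replicate(B, {
  idx <- sample.int(length(z), replace = TRUE)  # resample observations with replacement
  mean(z[idx])
})

boot_weighted <- replicate(B, {
  w <- rexp(length(z))      # random positive weights with mean 1
  sum(w * z) / sum(w)       # weighted mean under the random weights
})

bse <- c(nonpar = sd(boot_nonpar), weighted = sd(boot_weighted))
print(bse)
```

Both schemes should give standard errors close to sd(z) / sqrt(length(z)), about 0.045 here; the weighted scheme never drops observations, which can help when some resamples would otherwise lack rare categories.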
Details
By default (bc = TRUE), estimates are bias-corrected, and all confidence bands are monotonized. The bootstrap procedures follow Algorithm 2.2 of Chernozhukov, Fernandez-Val and Luo (2018).
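One standard way to monotonize an estimated curve and its confidence band is rearrangement, that is, sorting the values over an increasing grid. The sketch below assumes that technique for illustration only; SortedEffects may implement monotonization differently, and the grid, est, lower and upper values are made up. Sorting the lower and upper bounds separately keeps lower <= upper at every grid point.

```r
# Sketch: monotonizing a curve and its band by rearrangement (sorting).
# Assumption: values are given on an increasing grid; data are made up.
grid  <- seq(0.05, 0.95, by = 0.05)
wiggle <- c(0.3, -0.3)[(seq_along(grid) %% 2) + 1]  # alternating -0.3, +0.3, ...
est   <- qnorm(grid) + wiggle                       # non-monotone estimate
lower <- est - 0.2
upper <- est + 0.2

rearrange <- function(v) sort(v)  # increasing rearrangement over the grid
est_m   <- rearrange(est)
lower_m <- rearrange(lower)
upper_m <- rearrange(upper)

print(cbind(grid, est, est_m, lower_m, upper_m))
```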
Value
If subgroup = NULL, all outputs are whole-sample results; otherwise, outputs are subgroup results. When interest = "moment", the output is a list with the following components:
- est: Estimates of the variables of interest.
- bse: Bootstrap standard errors.
- joint_p: P-values adjusted for multiplicity to account for joint testing for all variables.
- pointwise_p: P-values that do not adjust for joint testing.
If users have further specified cat (i.e., !is.null(cat)), the fourth component is replaced with p_cat: P-values adjusted for multiplicity to account for joint testing for all variables within a category. Users can use summary.ca to tabulate the results.
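To make the difference between pointwise_p and joint_p concrete, here is a base-R sketch built from bootstrap draws. It assumes the null value of every coefficient is zero, uses a normal approximation for the pointwise p-values, and adjusts for multiplicity with the bootstrap distribution of the maximum absolute t-statistic across variables. This max-t construction is a common choice and is shown as an assumption, not as the exact SortedEffects code; est, draws and the other names are hypothetical.

```r
# Sketch: pointwise vs. multiplicity-adjusted (joint) p-values.
# Assumptions: simulated bootstrap draws, H0: coefficient = 0, max-|t| adjustment.
set.seed(1)
est <- c(a = 0.50, b = 0.05, c = -0.30)
B <- 1000
draws <- sapply(seq_along(est), function(j) est[j] + rnorm(B, sd = 0.1))  # B x 3

bse   <- apply(draws, 2, sd)   # bootstrap standard errors
tstat <- est / bse             # t-statistics for H0: coefficient = 0

# Pointwise: each variable tested on its own (normal approximation)
pointwise_p <- 2 * pnorm(-abs(tstat))

# Joint: compare each |t| with the bootstrap maximum of |t*| over all variables
tstar     <- sweep(sweep(draws, 2, est), 2, bse, "/")  # centered, studentized draws
tstar_max <- apply(abs(tstar), 1, max)
joint_p   <- sapply(abs(tstat), function(tj) mean(tstar_max >= tj))

print(round(rbind(est, bse, pointwise_p, joint_p), 4))
```

The joint p-value for a weak effect such as b is noticeably larger than its pointwise p-value, because it accounts for having tested three variables at once; restricting the maximum to the variables within one category would give the analogue of p_cat.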
When interest = "dist", the output is a list of two components:
- infresults: A list that stores the estimates and the upper and lower confidence bounds for all variables of interest, for the least and most affected groups.
- sortvar: A list that stores the sorted unique values of the variables of interest.
We recommend using plot.ca to visualize the results.
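For interest = "dist", the object being estimated is a (weighted) distribution of each characteristic within the least and most affected groups. The base-R sketch below evaluates a weighted empirical CDF at the sorted unique values of one characteristic, mirroring the role of sortvar. The partial effects pe, the weights w and the helper wecdf() are hypothetical and simulated for illustration; the sketch omits the bootstrap bounds stored in infresults.

```r
# Sketch: weighted empirical CDFs of one characteristic in the
# least and most affected groups. All inputs are simulated.
set.seed(1)
n  <- 1000
x  <- round(rnorm(n, 40, 10))          # characteristic of interest
pe <- 0.02 * x + rnorm(n, sd = 0.2)    # hypothetical partial effects
w  <- runif(n, 0.5, 1.5)               # hypothetical sampling weights
u  <- 0.1

least   <- pe <= quantile(pe, u)
most    <- pe >= quantile(pe, 1 - u)
sortvar <- sort(unique(x))             # evaluation points

# Weighted ECDF of v (weights wt) evaluated at each point in `at`
wecdf <- function(v, wt, at) {
  sapply(at, function(s) sum(wt[v <= s]) / sum(wt))
}
F_least <- wecdf(x[least], w[least], sortvar)
F_most  <- wecdf(x[most],  w[most],  sortvar)
```

Here the least affected group has smaller values of x, so its CDF lies mostly above that of the most affected group.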
Examples
library(SortedEffects)
data("mortgage")
### Regression specification
fm <- deny ~ black + p_irat + hse_inc + ccred + mcred + pubrec +
  ltv_med + ltv_high + denpmi + selfemp + single + hischl
### Specify characteristics of interest
t <- c("deny", "p_irat", "black", "hse_inc", "ccred", "mcred", "pubrec",
       "denpmi", "selfemp", "single", "hischl", "ltv_med", "ltv_high")
### Issue the ca command (b = 50 bootstrap draws instead of the default 500)
CA <- ca(fm = fm, data = mortgage, var = "black", method = "logit",
         cl = "diff", t = t, b = 50, bc = TRUE)