ob_decompose {ddecompose} | R Documentation |
Oaxaca-Blinder decomposition
Description
ob_decompose
implements the Oaxaca-Blinder decomposition that
divides differences in the mean outcome between two groups into one part explained
by different covariate means (composition effect) and into another part due to
differences in linear regression coefficients linking covariates to the outcome
variable (structure effect).
The function allows for 'doubly robust' decompositions where the sample of one group is reweighted such that it matches the covariate distribution of the other group before the regression coefficients are estimated.
For distributional statistics beyond the mean, the function performs the RIF regression decomposition proposed by Firpo, Fortin, and Lemieux (2018).
Usage
ob_decompose(
  formula,
  data,
  group,
  weights = NULL,
  reweighting = FALSE,
  normalize_factors = FALSE,
  reference_0 = TRUE,
  subtract_1_from_0 = FALSE,
  reweighting_method = "logit",
  trimming = FALSE,
  trimming_threshold = NULL,
  rifreg_statistic = NULL,
  rifreg_probs = c(1:9)/10,
  custom_rif_function = NULL,
  na.action = na.omit,
  bootstrap = FALSE,
  bootstrap_iterations = 100,
  bootstrap_robust = FALSE,
  cluster = NULL,
  cores = 1,
  vcov = stats::vcov,
  ...
)
Arguments
formula |
a |
data |
a data frame containing the variables in the model. |
group |
name of a binary variable (numeric or factor) identifying the two groups that will be compared. The group identified by the lower ranked value in 'group' (i.e., 0 in the case of a dummy variable or the first level of a factor variable) is defined as group 0. |
weights |
numeric vector of non-negative observation weights, hence of same length as |
reweighting |
boolean: if 'TRUE', then the decomposition is performed with respect to a reweighted reference group, yielding either a 'doubly robust' Oaxaca-Blinder decomposition or a reweighted RIF decomposition. |
normalize_factors |
boolean: If 'TRUE', then factor variables are normalized as
proposed by Gardeazabal/Ugidos (2004) and results are not dependent on the factor's
reference group. Per default ( |
reference_0 |
boolean: if 'TRUE' (default), then group 0 – i.e.,
the group identified by the lower ranked value in 'group' – will be defined
as the reference group. The reference group will be reweighted to match the
covariate distribution of the counterfactual sample.
By default, the composition effect is computed as |
subtract_1_from_0 |
boolean: By default ('FALSE'), X0 is subtracted from X1 and beta0 from beta1 (X1b1 - X0b0)
to compute the overall difference. Setting 'subtract_1_from_0' to 'TRUE' merely changes the sign of the decomposition results.
This means the composition effect is computed as |
reweighting_method |
specifies the method for fitting and predicting the conditional probabilities
used to derive the reweighting factor. Currently, |
trimming |
boolean: If |
trimming_threshold |
numeric: threshold defining the maximal accepted
relative weight of the reweighting factor value (i.e., inverse probability weight)
of a single observation. If |
rifreg_statistic |
string containing the distributional statistic for which to compute the RIF.
If 'NULL' (default), no RIF regression decomposition is computed.
If an available statistic is selected, 'ob_decompose' estimates a RIF regression decomposition.
The 'rifreg_statistic' can be one of
"quantiles", "mean", "variance", "gini", "interquantile_range", "interquantile_ratio", or "custom".
If "custom" is selected, a |
rifreg_probs |
a vector of length 1 or more with probabilities of quantiles. Each quantile is indicated with a value between 0 and 1.
Default is |
custom_rif_function |
the RIF function to compute the RIF of the custom distributional statistic.
Default is NULL. Only needs to be provided if |
na.action |
generic function that defines how NAs in the data should be handled.
Default is |
bootstrap |
boolean: If 'FALSE' (default), then no bootstrapped standard errors are calculated and, in the case of a standard Oaxaca-Blinder decomposition, analytical standard errors are estimated (assuming independence between groups). |
bootstrap_iterations |
positive integer indicating the number of bootstrap
iterations to execute. Only required if |
bootstrap_robust |
boolean: if 'FALSE' (default), then bootstrapped standard errors are estimated as the standard deviations of the bootstrap estimates. Otherwise, the function uses the bootstrap interquartile range rescaled by the interquartile range of the standard normal distribution to estimate standard errors. |
cluster |
numeric vector of same length as |
cores |
positive integer indicating the number of cores to use when
computing bootstrap standard errors. Only required if |
vcov |
function estimating covariance matrix of regression coefficients if
standard errors are not bootstrapped (i.e., |
... |
additional parameters passed to the custom_rif_function. Apart from dep_var, weights and probs, they must have different names than the parameters in rifreg. For instance, if you want to pass a parameter statistic to the custom_rif_function, name it custom_statistic. Additional parameters can also be passed to the density function used to estimate the RIF of quantiles. |
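The two standard error estimators described under bootstrap_robust can be sketched in base R as follows. This is only an illustration under stated assumptions: the vector boot_estimates is a simulated stand-in for bootstrap replicates of a single decomposition term, and the package may compute the quantiles differently internally.

```r
# Illustrative sketch; 'boot_estimates' is a simulated stand-in for the
# bootstrap replicates of one decomposition term (an assumption).
set.seed(3)
boot_estimates <- rnorm(1000, mean = 0.1, sd = 0.2)

# Classic bootstrap standard error: standard deviation of the replicates
se_classic <- sd(boot_estimates)

# Robust alternative: bootstrap interquartile range rescaled by the
# interquartile range of the standard normal distribution
se_robust <- IQR(boot_estimates) / (qnorm(0.75) - qnorm(0.25))

# A single extreme replicate inflates the classic estimate much more
boot_outlier <- c(boot_estimates, 50)
se_classic_outlier <- sd(boot_outlier)
se_robust_outlier <- IQR(boot_outlier) / (qnorm(0.75) - qnorm(0.25))
```

For approximately normal replicates both estimators should give similar values; the rescaled interquartile range is less sensitive to a few extreme bootstrap draws.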
Details
ob_decompose() implements four different methods for decomposing
observed group differences:
1. The original Oaxaca-Blinder decomposition (default)
2. A 'doubly robust' Oaxaca-Blinder decomposition (reweighting=TRUE)
3. A RIF regression decomposition (e.g., rifreg_statistic="quantiles")
4. A reweighted RIF regression decomposition (reweighting=TRUE and rifreg_statistic="quantiles")
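For orientation, the original decomposition (method 1) can be written in its standard textbook form. The notation is introduced here for illustration only: \bar{X}_g denotes the covariate means of group g (including the intercept) and \hat{\beta}_g the group-specific OLS coefficients. Using the coefficients of group 0 as reference is assumed to correspond to reference_0 = TRUE.

```latex
\begin{aligned}
\hat{\Delta}
  &= \bar{Y}_1 - \bar{Y}_0
   = \bar{X}_1'\hat{\beta}_1 - \bar{X}_0'\hat{\beta}_0 \\
  &= \underbrace{(\bar{X}_1 - \bar{X}_0)'\hat{\beta}_0}_{\text{composition effect}}
   + \underbrace{\bar{X}_1'(\hat{\beta}_1 - \hat{\beta}_0)}_{\text{structure effect}}
\end{aligned}
```

Expanding the two terms on the second line recovers the first line, since the cross term \bar{X}_1'\hat{\beta}_0 cancels.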
The doubly robust OB decomposition is a robust and path-independent alternative for detailed decompositions at the mean. It combines reweighting with the linear Oaxaca-Blinder method (see Fortin et al., 2011: 48-51). This approach has the valuable side effect of accounting for potential errors introduced by an incomplete inverse probability weighting and by the linear model specification, respectively.
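The logic of combining reweighting with the linear decomposition can be sketched in base R. This is a minimal illustration on simulated data, following the four-term split described in Fortin et al. (2011), and not the package's internal code. All object names (psi, ob_terms, and so on) are hypothetical, and reweighting group 0 towards the covariate distribution of group 1 is assumed to correspond to reference_0 = TRUE.

```r
# Illustrative sketch on simulated data; not the package's internal code.
set.seed(1)
n <- 2000
g <- rbinom(n, 1, 0.5)                 # group indicator; group 0 is the reference
x <- rnorm(n, mean = 0.5 * g)          # covariate whose mean differs by group
y <- 1 + 0.5 * g + (1 + 0.3 * g) * x + rnorm(n)
d <- data.frame(y, x, g)

# Reweighting factor for group 0: P(g=1|x) / P(g=0|x) * P(g=0) / P(g=1)
p_x <- fitted(glm(g ~ x, family = binomial(), data = d))
p_1 <- mean(d$g)
d$psi <- ifelse(d$g == 0, p_x / (1 - p_x) * (1 - p_1) / p_1, 1)

d0 <- d[d$g == 0, ]
d1 <- d[d$g == 1, ]

# Coefficients: group 0, group 1, and reweighted group 0 (counterfactual)
b0 <- coef(lm(y ~ x, data = d0))
b1 <- coef(lm(y ~ x, data = d1))
bC <- coef(lm(y ~ x, data = d0, weights = psi))

# Covariate means including the intercept
X0 <- c(1, mean(d0$x))
X1 <- c(1, mean(d1$x))
XC <- c(1, weighted.mean(d0$x, d0$psi))

# Four terms: the two error terms capture misspecification of the
# linear model and imperfect reweighting, respectively
ob_terms <- c(
  pure_composition    = sum((XC - X0) * b0),
  specification_error = sum(XC * (bC - b0)),
  pure_structure      = sum(X1 * (b1 - bC)),
  reweighting_error   = sum((X1 - XC) * bC)
)
total_difference <- mean(d1$y) - mean(d0$y)
```

The four terms add up to the observed difference in means by construction; a reweighting error close to zero would suggest that the reweighted reference group matches the covariate means of the comparison group.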
A path-independent method that goes beyond the mean is the RIF decomposition
of Firpo, Fortin, and Lemieux (2018). The approach approximates the expected value
of the 'recentered influence function' (RIF) of the distributional statistic
(e.g., quantile, variance, or Gini coefficient) of an outcome variable
conditional on covariates with linear regressions. RIF regression coefficients can
be consistent estimates of the marginal effect
of a small change in the expected value of a covariate on the distributional statistic of
an outcome variable (see the documentation of the companion package rifreg).
Thus, they can be used to decompose between-group differences in distributional statistics.
Firpo et al. (2018) combine the RIF regressions again with the reweighting estimator to avoid specification errors.
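As a rough illustration of the idea, the sketch below decomposes a between-group difference in variances by regressing the RIF of the variance on a covariate within each group. It uses simulated data, no observation weights and no reweighting, so it is a simplified stand-in rather than the procedure implemented in the package; all object names are hypothetical.

```r
# Illustrative sketch on simulated data; unweighted and without reweighting.
set.seed(2)
n <- 2000
g <- rbinom(n, 1, 0.5)
x <- rnorm(n, mean = 0.5 * g)
y <- 1 + x + rnorm(n, sd = 1 + 0.5 * g)   # group 1 has more residual dispersion
d <- data.frame(y, x, g)

# RIF of the variance, computed within each group: (y - mean(y))^2
d$rif <- ave(d$y, d$g, FUN = function(v) (v - mean(v))^2)

d0 <- d[d$g == 0, ]
d1 <- d[d$g == 1, ]

# Group-specific RIF regressions and covariate means (with intercept)
b0 <- coef(lm(rif ~ x, data = d0))
b1 <- coef(lm(rif ~ x, data = d1))
X0 <- c(1, mean(d0$x))
X1 <- c(1, mean(d1$x))

# Oaxaca-Blinder type split applied to the RIF regressions
composition_effect <- sum((X1 - X0) * b0)
structure_effect   <- sum(X1 * (b1 - b0))

# The mean of the RIF equals the variance (with divisor n, not n - 1)
variance_gap <- mean(d1$rif) - mean(d0$rif)
```

Because the mean of the RIF reproduces the statistic itself, the two effects sum to the observed gap in variances, mirroring the decomposition at the mean.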
Value
an object of class ob_decompose
containing a data.frame with the
decomposition results for the quantiles and for the other distributional
statistics, respectively, a data.frame with the estimated reweighting factor
for every observation, a data.frame with sample quantiles of the reweighting
factors and a list with standard errors for the decomposition terms, the
quantiles of the reweighting factor, the bootstrapped
Kolmogorov-Smirnov distribution to construct uniform confidence bands for
quantiles, as well as a list with the normalized differences between the
covariate means of the comparison group and the reweighted reference group.
A list object of class 'ob_decompose' containing the following components:
- 'ob_decompose': A list containing the decomposition results, covariance matrix, model fits and more detailed result information.
- 'group_variable_name': A string indicating the name of the group variable.
- 'group_variable_levels': A string indicating the levels of the group variable.
- 'reference_group': A string indicating which level of the group variable was used as reference group.
- 'reweighting_estimates': A list containing the reweighting estimates if reweighting=TRUE, else NA.
- 'input_parameters': A list of input parameters used for the estimation.
References
Firpo, Sergio, Nicole M. Fortin, and Thomas Lemieux. 2018. "Decomposing Wage Distributions Using Recentered Influence Function Regressions." Econometrics, 6(2):28.
Fortin, Nicole, Thomas Lemieux, and Sergio Firpo. 2011. "Decomposition methods in economics." In Orley Ashenfelter and David Card, eds., Handbook of labor economics. Vol. 4. Elsevier, 1-102.
Gardeazabal, Javier, and Arantza Ugidos. 2004. "More on identification in detailed wage decompositions." Review of Economics and Statistics, 86(4): 1034-1036.
Examples
## Oaxaca-Blinder decomposition of gender wage gap
## with NLSY79 data like in Fortin, Lemieux, & Firpo (2011: 41)
data("nlys00")
mod1 <- log(wage) ~ age + central_city + msa + region + black +
  hispanic + education + afqt + family_responsibility + years_worked_civilian +
  years_worked_military + part_time + industry
# Using female coefficients (reference_0 = TRUE) to estimate counterfactual mean
decompose_female_as_reference <- ob_decompose(
  formula = mod1,
  data = nlys00,
  group = female,
  reference_0 = TRUE
)
decompose_female_as_reference
# Using male coefficients (reference_0 = FALSE)
decompose_male_as_reference <- ob_decompose(
  formula = mod1,
  data = nlys00,
  group = female,
  reference_0 = FALSE
)
decompose_male_as_reference
# Replicate first and third column in Table 3 in Fortin, Lemieux, & Firpo (2011: 41)
# Define aggregation of decomposition terms
custom_aggregation <- list(
  `Age, race, region, etc.` = c(
    "age",
    "blackyes",
    "hispanicyes",
    "regionNorth-central",
    "regionSouth",
    "regionWest",
    "central_cityyes",
    "msayes"
  ),
  `Education` = c(
    "education<10 yrs",
    "educationHS grad (diploma)",
    "educationHS grad (GED)",
    "educationSome college",
    "educationBA or equiv. degree",
    "educationMA or equiv. degree",
    "educationPh.D or prof. degree"
  ),
  `AFQT` = "afqt",
  `L.T. withdrawal due to family` = "family_responsibility",
  `Life-time work experience` = c(
    "years_worked_civilian",
    "years_worked_military",
    "part_time"
  ),
  `Industrial sectors` = c(
    "industryManufacturing",
    "industryEducation, Health, Public Admin.",
    "industryOther services"
  )
)
# First column
summary(decompose_male_as_reference, custom_aggregation = custom_aggregation)
# Third column
summary(decompose_female_as_reference, custom_aggregation = custom_aggregation)
## Compare bootstrapped standard errors...
decompose_female_as_reference_bs <- ob_decompose(
  formula = mod1,
  data = nlys00,
  group = female,
  bootstrap = TRUE,
  bootstrap_iterations = 100
)
summary(decompose_female_as_reference_bs, custom_aggregation = custom_aggregation)
# ... to analytical standard errors (assuming independence between groups and
# homoscedasticity)
decompose_female_as_reference <- ob_decompose(
  formula = mod1,
  data = nlys00,
  group = female,
  reference_0 = TRUE
)
summary(decompose_female_as_reference, custom_aggregation = custom_aggregation)
# Return standard errors for all detailed terms
summary(decompose_female_as_reference, aggregate_factors = FALSE)
## 'Doubly robust' Oaxaca-Blinder decomposition of gender wage gap
mod2 <- log(wage) ~ age + central_city + msa + region + black +
  hispanic + education + afqt + family_responsibility + years_worked_civilian +
  years_worked_military + part_time + industry | age + (central_city + msa) * region + (black +
  hispanic) * (education + afqt) + family_responsibility * (years_worked_civilian +
  years_worked_military) + part_time * industry
decompose_male_as_reference_robust <- ob_decompose(
  formula = mod2,
  data = nlys00,
  group = female,
  reference_0 = FALSE,
  reweighting = TRUE
)
# ... using random forests instead of logit to estimate weights
# (the argument is named reweighting_method in the usage above)
decompose_male_as_reference_robust_rf <- ob_decompose(
  formula = mod1,
  data = nlys00,
  group = female,
  reference_0 = FALSE,
  reweighting = TRUE,
  reweighting_method = "random_forest"
)
# Reweighted RIF Regression Decomposition
data("men8305")
model_rifreg <- log(wage) ~ union + education + experience |
  union * (education + experience) + education * experience
# Variance
variance_decomposition <- ob_decompose(
  formula = model_rifreg,
  data = men8305,
  group = year,
  reweighting = TRUE,
  rifreg_statistic = "variance"
)
# Deciles
deciles_decomposition <- ob_decompose(
  formula = model_rifreg,
  data = men8305,
  group = year,
  reweighting = TRUE,
  rifreg_statistic = "quantiles",
  rifreg_probs = c(1:9) / 10
)
# plot(deciles_decomposition)
# RIF regression decomposition with custom function
# custom function
custom_variance_function <- function(dep_var, weights, probs = NULL) {
  weighted_mean <- weighted.mean(x = dep_var, w = weights)
  rif <- (dep_var - weighted_mean)^2
  rif <- data.frame(rif, weights)
  names(rif) <- c("rif_variance", "weights")
  return(rif)
}
custom_decomposition <-
  ob_decompose(
    formula = model_rifreg,
    data = men8305,
    group = year,
    reweighting = TRUE,
    rifreg_statistic = "custom",
    custom_rif_function = custom_variance_function
  )