demean {fixest} | R Documentation |
Centers a set of variables around a set of factors
Description
User-level access to the internal demeaning algorithm of fixest.
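As a minimal illustration of what "centering around a factor" means, the base R sketch below handles the special case of a single factor, where centering amounts to subtracting group means. It is not fixest's implementation, and the toy vectors x and g are made up for the illustration. With several factors the problem is solved iteratively, which is what the iter and tol arguments control.

```r
# Base R sketch (not fixest's implementation): centering on ONE factor.
# x and g are hypothetical toy data.
x = c(1, 3, 10, 14, 18)
g = c("a", "a", "b", "b", "b")

# ave() returns, for each observation, the mean of x within its group
x_centered = x - ave(x, g)

# After centering, each group has mean zero
tapply(x_centered, g, mean)
```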
Usage
demean(
  X,
  f,
  slope.vars,
  slope.flag,
  data,
  weights,
  nthreads = getFixest_nthreads(),
  notes = getFixest_notes(),
  iter = 2000,
  tol = 1e-06,
  fixef.reorder = TRUE,
  fixef.algo = NULL,
  na.rm = TRUE,
  as.matrix = is.atomic(X),
  im_confident = FALSE,
  ...
)
Arguments
X |
A matrix, vector, data.frame or a list OR a formula OR a |
f |
A matrix, vector, data.frame or list. The factors used to center the variables in
argument X. |
slope.vars |
A vector, matrix or list representing the variables with varying slopes.
Matrices will be coerced using |
slope.flag |
An integer vector of the same length as the number of variables in |
data |
A data.frame containing all variables in the argument |
weights |
Vector, can be missing or NULL. If present, it must contain the same number of
observations as in X. |
nthreads |
Number of threads to be used. By default it is equal to getFixest_nthreads(). |
notes |
Logical, whether to display a message when NA values are removed. By default it is
equal to getFixest_notes(). |
iter |
Number of iterations, default is 2000. |
tol |
Stopping criterion of the algorithm. Default is 1e-06. |
fixef.reorder |
Logical, default is |
fixef.algo |
|
na.rm |
Logical, default is |
as.matrix |
Logical, if |
im_confident |
Logical, default is |
... |
Not currently used. |
Value
It returns a data.frame with as many columns as there are variables to be centered.
If na.rm = TRUE, the number of rows equals the number of rows in the input minus the
number of observations with NA values (contained in X, f, slope.vars or weights).
If na.rm = FALSE, the output has the same number of observations as the input (filled with NAs where appropriate).
A matrix can be returned if as.matrix = TRUE.
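The effect of na.rm and as.matrix on the shape of the result can be checked on a small input. The sketch below assumes fixest is installed; the vectors x and g are hypothetical toy data, and the behavior shown is the one described in this section, not verified against a particular fixest version.

```r
library(fixest)

# Hypothetical toy data: one NA in the variable to be centered
x = c(1, 2, NA, 4, 5, 6)
g = factor(c("a", "a", "a", "b", "b", "b"))

# na.rm = TRUE: the observation with the NA is dropped from the output
dm_drop = demean(X = x, f = g, na.rm = TRUE, notes = FALSE)

# na.rm = FALSE: the output keeps the input length, with NA where appropriate
dm_keep = demean(X = x, f = g, na.rm = FALSE, notes = FALSE)

# Number of rows in each result
c(NROW(dm_drop), NROW(dm_keep))

# X is atomic here, so as.matrix = is.atomic(X) is TRUE and a matrix is returned
is.matrix(dm_drop)
```

Passing a data.frame as X makes is.atomic(X) FALSE, so the result is a data.frame unless as.matrix = TRUE is requested explicitly.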
Varying slopes
You can add variables with varying slopes in the fixed-effect part of the formula.
The syntax is as follows: fixef_var[var1, var2]. Here the variables var1 and var2 will
have varying slopes (one slope per value in fixef_var) and the fixed-effect
fixef_var will also be added.
To add only the variables with varying slopes and not the fixed-effect,
use double square brackets: fixef_var[[var1, var2]].
In other words:
- fixef_var[var1, var2] is equivalent to fixef_var + fixef_var[[var1]] + fixef_var[[var2]]
- fixef_var[[var1, var2]] is equivalent to fixef_var[[var1]] + fixef_var[[var2]]
In general, for convergence reasons, it is recommended to always add the fixed-effect and avoid using only the variable with varying slope (i.e. use single square brackets).
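To make the single-bracket case concrete, the base R sketch below shows what centering on species[x2] removes: one intercept and one x2 slope per species. It uses lm() residuals, not fixest's algorithm; the renamed iris columns follow the Examples section. The variable y centered with demean(y ~ species[x2], data = base) should agree with these residuals up to the convergence tolerance.

```r
# Base R sketch (not fixest's algorithm) of what species[x2] partials out
base = iris
names(base) = c("y", "x1", "x2", "x3", "species")

# One intercept and one x2 slope per species; keep the residuals
fit = lm(y ~ species + species:x2, data = base)
y_centered = resid(fit)

# Within each species, the residuals sum to zero (fixed-effect removed) ...
tapply(y_centered, base$species, sum)
# ... and are orthogonal to x2 (species-specific slope removed)
tapply(y_centered * base$x2, base$species, sum)
```

Both sets of sums are zero up to floating-point error, which is the defining property of a variable centered on a fixed-effect together with its varying slope.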
Examples
# Illustration of the FWL theorem
data(trade)
base = trade
base$ln_dist = log(base$dist_km)
base$ln_euros = log(base$Euros)
# We center the two variables ln_dist and ln_euros
# on the factors Origin and Destination
X_demean = demean(X = base[, c("ln_dist", "ln_euros")],
                  f = base[, c("Origin", "Destination")])
base[, c("ln_dist_dm", "ln_euros_dm")] = X_demean
est = feols(ln_euros_dm ~ ln_dist_dm, base)
est_fe = feols(ln_euros ~ ln_dist | Origin + Destination, base)
# The results are the same as if we used the two factors
# as fixed-effects
etable(est, est_fe, se = "st")
#
# Variables with varying slopes
#
# You can center on factors but also on variables with varying slopes
# Let's have an illustration
base = iris
names(base) = c("y", "x1", "x2", "x3", "species")
#
# We center y and x1 on species and x2 * species
# using a formula
base_dm = demean(y + x1 ~ species[x2], data = base)
# using vectors
base_dm_bis = demean(X = base[, c("y", "x1")], f = base$species,
                     slope.vars = base$x2, slope.flag = 1)
# Let's look at the equivalences
res_vs_1 = feols(y ~ x1 + species + x2:species, base)
res_vs_2 = feols(y ~ x1, base_dm)
res_vs_3 = feols(y ~ x1, base_dm_bis)
# only the small sample adj. differ in the SEs
etable(res_vs_1, res_vs_2, res_vs_3, keep = "x1")
#
# center on x2 * species and on another FE
base$fe = rep(1:5, 10)
# using a formula => double square brackets!
base_dm = demean(y + x1 ~ fe + species[[x2]], data = base)
# using vectors => note slope.flag!
base_dm_bis = demean(X = base[, c("y", "x1")], f = base[, c("fe", "species")],
                     slope.vars = base$x2, slope.flag = c(0, -1))
# Explanations slope.flag = c(0, -1):
# - the first 0: the first factor (fe) is associated to no variable
# - the "-1":
#    * |-1| = 1: the second factor (species) is associated to ONE variable
#    * -1 < 0: the second factor should not be included as such
# Let's look at the equivalences
res_vs_1 = feols(y ~ x1 + i(fe) + x2:species, base)
res_vs_2 = feols(y ~ x1, base_dm)
res_vs_3 = feols(y ~ x1, base_dm_bis)
# only the small sample adj. differ in the SEs
etable(res_vs_1, res_vs_2, res_vs_3, keep = "x1")