xpd {fixest} | R Documentation |
Expands formula macros
Description
Create macros within formulas and expand them with character vectors or other formulas.
Usage
xpd(fml, ..., add = NULL, lhs, rhs, data = NULL, frame = parent.frame())
Arguments
fml |
A formula containing macro variables. Each macro variable must start with two dots.
The macro variables can be set globally using |
... |
Definition of the macro variables. Each argument name corresponds to the name of the
macro variable. It is required that each macro variable name starts with two dots
(e.g. |
add |
Either a character scalar or a one-sided formula. The elements will be added to the right-hand-side of the formula, before any macro expansion is applied. |
lhs |
If present then a formula will be constructed with |
rhs |
If present, then a formula will be constructed with |
data |
Either a character vector or a data.frame. This argument will only be used if a
macro of the type |
frame |
The environment containing the values to be expanded with the dot square bracket
operator. Default is |
Details
In xpd, the default macro variables are taken from getFixest_fml. Any value in the ...
argument of xpd will replace these default values.
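The precedence rule above can be illustrated with a minimal sketch, assuming fixest is installed. The object names f_default and f_local are illustrative only, and Solar.R is simply another airquality column chosen for the example.

```r
library(fixest)

# Global default for the macro ..ctrl
setFixest_fml(..ctrl = ~ Temp + Day)

# No value passed in '...': the global definition is used
f_default = xpd(Ozone ~ Wind + ..ctrl)

# A value passed in '...' takes precedence over the global one
f_local = xpd(Ozone ~ Wind + ..ctrl, ..ctrl = "Solar.R")

print(f_default)
print(f_local)

# Remove the global macros so that later calls are not affected
setFixest_fml(reset = TRUE)
```

The global definition itself is left unchanged by the second call: the value given in ... only applies to that call.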
The definitions of the macro variables replace the macro variables verbatim. Therefore,
you can include multi-part formulas if you wish, but then beware of the order of the macro
variables in the formula. For example, using the airquality data, say you want to set as
controls the variable Temp and Day fixed-effects. You can do
setFixest_fml(..ctrl = ~Temp | Day), but then feols(Ozone ~ Wind + ..ctrl, airquality)
will be quite different from feols(Ozone ~ ..ctrl + Wind, airquality), so beware!
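To see why the order matters, one can expand both formulas with xpd before estimating anything. This is a minimal sketch, assuming fixest is installed; ctrl, f_last and f_first are illustrative names. Since the replacement is verbatim, everything written after the macro ends up after the bar.

```r
library(fixest)

# A multi-part macro: one control variable, then a fixed-effects part
ctrl = ~ Temp | Day

# Macro placed last: Wind stays in the first part of the formula
f_last = xpd(Ozone ~ Wind + ..ctrl, ..ctrl = ctrl)

# Macro placed first: Wind is written after the bar
f_first = xpd(Ozone ~ ..ctrl + Wind, ..ctrl = ctrl)

print(f_last)
print(f_first)
```

Printing the expanded formula first is a cheap way to check which variables land in which part before passing it to feols.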
Value
It returns a formula where all macros have been expanded.
Dot square bracket operator in formulas
In a formula, the dot square bracket (DSB) operator can: i) create multiple variables at once, or ii) capture values from the current environment and put them verbatim in the formula.
Say you want to include the variables x1 to x3 in your formula. You can use
xpd(y ~ x.[1:3]) and you'll get y ~ x1 + x2 + x3.
To summon values from the environment, simply put the variable in square brackets. For example:
for(i in 1:3) xpd(y.[i] ~ x) will create the formulas y1 ~ x to y3 ~ x depending on the
value of i.
You can include a full variable from the environment in the same way:
for(y in c("a", "b")) xpd(.[y] ~ x) will create the two formulas a ~ x and b ~ x.
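A for loop does not print or keep the formulas it creates, so in practice they need to be stored. The following is a minimal sketch of the three cases above, assuming fixest is installed; f_sum, f_list, f_names and the loop variable v are illustrative names.

```r
library(fixest)

# x.[1:3] expands to x1 + x2 + x3
f_sum = xpd(y ~ x.[1:3])

# The value of i is read from the calling environment at each iteration
f_list = lapply(1:3, function(i) xpd(y.[i] ~ x))

# A whole variable name can be taken from the environment
f_names = lapply(c("a", "b"), function(v) xpd(.[v] ~ x))

print(f_sum)
print(f_list)
print(f_names)
```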
The DSB can even be used within variable names, but then the variable must be nested in
character form. For example y ~ .["x.[1:2]_sq"] will create y ~ x1_sq + x2_sq. Using the
character form is important to avoid a formula parsing error. Double quotes must be used. Note
that the character string that is nested will be parsed with the function dsb, and thus it
will return a vector.
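As a minimal sketch of the nested character form, assuming fixest is installed (f_num, f_env and vars are illustrative names), the bracket inside the string can hold either a literal range or a vector taken from the environment:

```r
library(fixest)

# Literal range inside the nested string
f_num = xpd(y ~ .["x.[1:2]_sq"])

# Vector from the environment inside the nested string
vars = c("a", "b")
f_env = xpd(y ~ .["x.[vars]_sq"])

print(f_num)
print(f_env)
```

Writing y ~ x.[1:2]_sq directly, without the quotes, is not valid R syntax, which is why the string form is needed.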
By default, the DSB operator expands vectors into sums. You can add a comma, like in .[, x],
to expand with commas, so that the content can then be used within functions. For instance:
c(x.[, 1:2]) will create c(x1, x2) (and not c(x1 + x2)).
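The difference between the two aggregation modes is easiest to see side by side. This is a minimal sketch, assuming fixest is installed; sw is the fixest stepwise function mentioned elsewhere on this page, and f_plus and f_comma are illustrative names.

```r
library(fixest)

# Default: the vector is aggregated with sums
f_plus = xpd(y ~ x.[1:2])

# With a leading comma: the vector is aggregated with commas,
# so each element becomes a separate argument of sw()
f_comma = xpd(y ~ sw(x.[, 1:2]))

print(f_plus)   # y ~ x1 + x2
print(f_comma)  # y ~ sw(x1, x2)
```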
In all fixest estimations, this special parsing is enabled, so you don't need to use xpd.
One-sided formulas can be expanded with the DSB operator: let x = ~sepal + petal, then
xpd(y ~ .[x]) leads to y ~ sepal + petal.
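A minimal sketch of this case, assuming fixest is installed (f_only and f_mixed are illustrative names, and width is a made-up variable): the right-hand side of the one-sided formula is inserted as is, and it can be combined with other terms.

```r
library(fixest)

x = ~ sepal + petal

# The content of x is inserted verbatim
f_only = xpd(y ~ .[x])

# It can sit next to other terms
f_mixed = xpd(y ~ width + .[x])

print(f_only)
print(f_mixed)
```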
You can even use multiple square brackets within a single variable, but then the use of nesting
is required. For example, the following xpd(y ~ .[".[letters[1:2]]_.[1:2]"]) will create
y ~ a_1 + b_2. Remember that the nested character string is parsed with dsb,
which explains this behavior.
When the element to be expanded i) is equal to the empty string or, ii) is of length 0, it is
replaced with a neutral element, namely 1. For example, x = "" ; xpd(y ~ .[x]) leads to
y ~ 1.
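Both cases can be checked with a minimal sketch, assuming fixest is installed; x_empty, x_none, f_empty and f_none are illustrative names.

```r
library(fixest)

x_empty = ""            # case i): the empty string
x_none = character(0)   # case ii): a vector of length 0

f_empty = xpd(y ~ .[x_empty])
f_none = xpd(y ~ .[x_none])

print(f_empty)
print(f_none)
```

This makes it possible to keep a placeholder for optional variables in a formula and leave it empty when no variable is needed.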
Regular expressions
You can catch several variable names at once by using regular expressions. To use a regular
expression, you need to enclose it in the dot-dot or the regex function: ..("regex") or
regex("regex"). For example, regex("Sepal") will catch both the variables Sepal.Length and
Sepal.Width from the iris data set. In a fixest estimation, the variable names to which
the regex is applied come from the data set. If you use xpd, you need to provide either a
data set or a vector of names in the argument data.
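Outside of a fixest estimation, the argument data tells xpd where to look for the names. A minimal sketch, assuming fixest is installed (f_df and f_chr are illustrative names), using either the data set or just its column names:

```r
library(fixest)

# 'data' given as a data.frame
f_df = xpd(Petal.Length ~ regex("Sepal"), data = iris)

# 'data' given as a character vector of variable names
f_chr = xpd(Petal.Length ~ regex("Sepal"), data = names(iris))

print(f_df)
print(f_chr)
```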
By default the variables are aggregated with a sum. For example in a data set with the variables
x1 to x10, regex("x(1|2)") will yield x1 + x2 + x10 (x10 is caught because the pattern is
not anchored). You can instead ask for "comma" aggregation by using a comma first, just before
the regular expression: y ~ sw(regex(,"x(1|2)")) would lead to y ~ sw(x1, x2, x10).
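The two aggregation modes can be compared with a minimal sketch, assuming fixest is installed. Here the variable names x1 to x10 are passed as a plain character vector; nm, f_sum and f_comma are illustrative names.

```r
library(fixest)

# Names of a hypothetical data set with the variables x1 to x10
nm = paste0("x", 1:10)

# Default: the matched variables are summed
f_sum = xpd(y ~ regex("x(1|2)"), data = nm)

# Leading comma: the matched variables are separated by commas
f_comma = xpd(y ~ sw(regex(, "x(1|2)")), data = nm)

print(f_sum)
print(f_comma)
```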
Note that the dot square bracket operator (DSB, see before) is applied before the regular
expression is evaluated. This means that regex("x.[3:4]_sq") will lead, after evaluation of
the DSB, to regex("x3_sq|x4_sq"). It is a handy way to insert a range of numbers in a regular
expression.
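As a minimal sketch of this two-step evaluation, assuming fixest is installed (nm and f_range are illustrative names, and the variable names are made up for the example):

```r
library(fixest)

# Names of a hypothetical data set
nm = c("x1_sq", "x2_sq", "x3_sq", "x4_sq", "x5_sq")

# The DSB is expanded first, giving regex("x3_sq|x4_sq"),
# then the regular expression is matched against nm
f_range = xpd(y ~ regex("x.[3:4]_sq"), data = nm)

print(f_range)
```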
Author(s)
Laurent Berge
See Also
setFixest_fml to set formula macros, and dsb to modify character strings with the DSB operator.
Examples
# Small examples with airquality data
data(airquality)
# we set two macro variables
setFixest_fml(..ctrl = ~ Temp + Day,
              ..ctrl_long = ~ poly(Temp, 2) + poly(Day, 2))
# Using the macro in lm with xpd:
lm(xpd(Ozone ~ Wind + ..ctrl), airquality)
lm(xpd(Ozone ~ Wind + ..ctrl_long), airquality)
# You can use the macros without xpd() in fixest estimations
a = feols(Ozone ~ Wind + ..ctrl, airquality)
b = feols(Ozone ~ Wind + ..ctrl_long, airquality)
etable(a, b, keep = "Int|Win")
# Using .[]
base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
i = 2:3
z = "species"
lm(xpd(y ~ x.[2:3] + .[z]), base)
# No xpd() needed in feols
feols(y ~ x.[2:3] + .[z], base)
#
# Auto completion with '..' suffix
#
# You can trigger variables autocompletion with the '..' suffix
# You need to provide the argument data
base = setNames(iris, c("y", "x1", "x2", "x3", "species"))
xpd(y ~ x.., data = base)
# In fixest estimations, this is automatically taken care of
feols(y ~ x.., data = base)
#
# You can use xpd for stepwise estimations
#
# Note that for stepwise estimations in fixest, you can use
# the stepwise functions: sw, sw0, csw, csw0
# -> see help in feols or in the dedicated vignette
# we want to look at the effect of x1 on y
# controlling for different variables
base = iris
names(base) = c("y", "x1", "x2", "x3", "species")
# We first create a matrix with all possible combinations of variables
my_args = lapply(names(base)[-(1:2)], function(x) c("", x))
(all_combs = as.matrix(do.call("expand.grid", my_args)))
res_all = list()
for(i in 1:nrow(all_combs)){
  res_all[[i]] = feols(xpd(y ~ x1 + ..v, ..v = all_combs[i, ]), base)
}
etable(res_all)
coefplot(res_all, group = list(Species = "^^species"))
#
# You can use macros to grep variables in your data set
#
# Example 1: setting a macro variable globally
data(longley)
setFixest_fml(..many_vars = grep("GNP|ployed", names(longley), value = TRUE))
feols(Armed.Forces ~ Population + ..many_vars, longley)
# Example 2: using ..("regex") or regex("regex") to grep the variables "live"
feols(Armed.Forces ~ Population + ..("GNP|ployed"), longley)
# Example 3: same as Ex.2 but without using a fixest estimation
# Here we need to use xpd():
lm(xpd(Armed.Forces ~ Population + regex("GNP|ployed"), data = longley), longley)
# Stepwise estimation with regex: use a comma after the parenthesis
feols(Armed.Forces ~ Population + sw(regex(,"GNP|ployed")), longley)
# Multiple LHS
etable(feols(..("GNP|ployed") ~ Population, longley))
#
# lhs and rhs arguments
#
# to create a one sided formula from a character vector
vars = letters[1:5]
xpd(rhs = vars)
# Alternatively, to replace the RHS
xpd(y ~ 1, rhs = vars)
# To create a two sided formula
xpd(lhs = "y", rhs = vars)
#
# argument 'add'
#
xpd(~x1, add = ~ x2 + x3)
# also works with character vectors
xpd(~x1, add = c("x2", "x3"))
# only adds to the RHS
xpd(y ~ x, add = ~bon + jour)
#
# Dot square bracket operator
#
# The basic use is to add variables in the formula
x = c("x1", "x2")
xpd(y ~ .[x])
# Alternatively, one-sided formulas can be used and their content will be inserted verbatim
x = ~x1 + x2
xpd(y ~ .[x])
# You can create multiple variables at once
xpd(y ~ x.[1:5] + z.[2:3])
# You can summon variables from the environment to complete variables names
var = "a"
xpd(y ~ x.[var])
# ... the variables can be multiple
vars = LETTERS[1:3]
xpd(y ~ x.[vars])
# You can have "complex" variable names but they must be nested in character form
xpd(y ~ .["x.[vars]_sq"])
# DSB can be used within regular expressions
re = c("GNP", "Pop")
xpd(Unemployed ~ regex(".[re]"), data = longley)
# => equivalent to regex("GNP|Pop")
# Use .[,var] (NOTE THE COMMA!) to expand with commas
# !! can break the formula if misused
vars = c("wage", "unemp")
xpd(c(y.[,1:3]) ~ csw(.[,vars]))
# Example of use of .[] within a loop
res_all = list()
for(p in 1:3){
  res_all[[p]] = feols(Ozone ~ Wind + poly(Temp, .[p]), airquality)
}
etable(res_all)
# The former can be compactly estimated with:
res_compact = feols(Ozone ~ Wind + sw(.[, "poly(Temp, .[1:3])"]), airquality)
etable(res_compact)
# How does it work?
# 1) .[, stuff] evaluates stuff and, if a vector, aggregates it with commas
# Comma aggregation is done thanks to the comma placed after the square bracket
# If .[stuff], then aggregation is with sums.
# 2) stuff is evaluated, and if it is a character string, it is evaluated with
# the function dsb which expands values in .[]
#
# Wrapping up:
# 2) evaluation of dsb("poly(Temp, .[1:3])") leads to the vector:
# c("poly(Temp, 1)", "poly(Temp, 2)", "poly(Temp, 3)")
# 1) .[, c("poly(Temp, 1)", "poly(Temp, 2)", "poly(Temp, 3)")] leads to
# poly(Temp, 1), poly(Temp, 2), poly(Temp, 3)
#
# Hence sw(.[, "poly(Temp, .[1:3])"]) becomes:
# sw(poly(Temp, 1), poly(Temp, 2), poly(Temp, 3))
#
# In non-fixest functions: guessing the data allows to use regex
#
# When used in non-fixest functions, the algorithm tries to "guess" the data
# so that ..("regex") can be directly evaluated without passing the argument 'data'
data(longley)
lm(xpd(Armed.Forces ~ Population + ..("GNP|ployed")), longley)
# same for the auto completion with '..'
lm(xpd(Armed.Forces ~ Population + GN..), longley)