calculate_effects {gKRLS}

R Documentation

Marginal Effects

Description

These functions calculate marginal effects or predicted values after estimating a model with gam or bam.

Usage

calculate_effects(
  model,
  data = NULL,
  variables = NULL,
  continuous_type = c("IQR", "minmax", "derivative", "onesd", "predict",
    "second_derivative"),
  conditional = NULL,
  individual = FALSE,
  vcov = NULL,
  raw = FALSE,
  use_original = FALSE,
  epsilon = 1e-07,
  verbose = FALSE
)

calculate_interactions(
  model,
  variables,
  QOI = c("AMIE", "ACE", "AME", "AIE"),
  ...
)

get_individual_effects(x)

## S3 method for class 'gKRLS_mfx'
print(x, ...)

## S3 method for class 'gKRLS_mfx'
summary(object, ...)

Arguments

`model`	A model estimated using functions from `mgcv` (e.g., `gam` or `bam`).
`data`	A data frame used to calculate the marginal effects. The default, `NULL`, uses the data from model estimation. Using a custom dataset may have unexpected implications for continuous and character/factor variables. See "WARNINGS" for more discussion.
`variables`	A character vector that specifies the variables for which to calculate effects. The default, `NULL`, calculates effects for all variables.
`continuous_type`	A character value, with a default of `"IQR"`, that indicates the type of marginal effect to estimate when the variable is continuous (i.e., not binary, logical, factor, or character). Options are `"IQR"` (compares the variable at its 25th and 75th percentiles), `"minmax"` (compares the variable at its minimum and maximum), `"derivative"` (numerically approximates the derivative at each observed value), `"second_derivative"` (numerically approximates the second derivative at each observed value), and `"onesd"` (compares one standard deviation below and one standard deviation above the mean of the variable). It also accepts a named list in which each element is named for a continuous variable and is a vector of length two; the two values are then compared. If this is used, then all continuous variables must have two values specified. A special option (`"predict"`) produces predictions (e.g., `predict(model, type = "response")`) at each observed value and then averages them together. This, in conjunction with `conditional`, provides a way of calculating quantities such as predicted probability curves using an "observed value" approach (e.g., Hanmer and Kalkan 2013). Examples are provided below.
`conditional`	A data.frame or `NULL`. This is an analogue of Stata's `at()` option and the `at` argument in the `margins` package. For a marginal effect on some variable `"a"`, this specifies fixed values for certain other covariates, e.g. `data.frame("b" = 0)`. If `conditional` is `NULL`, all other covariates are held at their observed value. If `conditional` is a data.frame, then each row represents a different combination of covariate values to be held fixed, and marginal effects are calculated separately for each row. Examples are provided below.
`individual`	A logical value. `TRUE` calculates individual effects (i.e. an effect for each observation in the `data`). The default is `FALSE`.
`vcov`	A matrix that specifies the covariance matrix of the parameters. The default, `NULL`, uses the standard covariance matrix from `mgcv`. This can be used to specify clustered or robust standard errors using output from (for example) `sandwich`.
`raw`	A logical value. `TRUE` returns the raw values used to calculate the effect in addition to the estimated effect. The default is `FALSE`. If `TRUE`, an additional column `...id` is present in the estimated effects that reports whether the row corresponds to the effect (`effect`), the first value (`raw_0`) or the second value (`raw_1`) where `effect=raw_1 - raw_0`. For `"derivative"`, this is further scaled by the step size. For `"second_derivative"`, `effect=raw_2 - 2 * raw_1 + raw_0`, scaled by the step size; see the discussion for `epsilon` for how the step size is calculated.
`use_original`	A logical value that indicates whether to use the estimation data (`TRUE`) or `data` (`FALSE`) when calculating quantities such as the IQR for continuous variables or the levels to examine for factor variables. Default (`FALSE`) uses the provided data; if `data = NULL`, this is equivalent to using the estimation data. The "WARNINGS" section provides more discussion of this option.
`epsilon`	A numerical value that defines the step size when calculating numerical derivatives (default of 1e-7). For `"derivative"`, the step size for the approximation is `h = \epsilon \cdot \mathrm{max}(1, \mathrm{max}(|x|))`, i.e. `f'(x) \approx \frac{f(x+h) - f(x-h)}{2h}`. Please see Leeper (2016) for more details. For `"second_derivative"`, the step size is `h = [\epsilon \cdot \mathrm{max}(1, \mathrm{max}(|x|))]^{0.5}`, i.e. `f''(x) \approx \frac{f(x+h) - 2 f(x) + f(x-h)}{h^2}`.
`verbose`	A logical value that indicates whether to report progress when calculating the marginal effects. The default is `FALSE`.
`QOI`	A vector of quantities of interest to calculate for `calculate_interactions`. Options include `"AME"` (average marginal effect), `"ACE"` (average combination effect), `"AIE"` (average interaction effect) and `"AMIE"` (average marginal interaction effect); see "Details" for more information. The default setting calculates all four quantities.
`...`	An argument used for `calculate_interactions` to pass arguments to `calculate_effects`. It is unused for `summary.gKRLS_mfx`.
`x`	An object estimated using `calculate_effects`.
`object`	An object of class `gKRLS_mfx`, as returned by `calculate_effects`.
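
The step-size rules given for `epsilon` can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's internal code: `f` is a hypothetical stand-in for the model's prediction as a function of a single covariate, and `central_diff` and `second_diff` are names introduced here.

```r
# Hypothetical stand-in for a fitted prediction function of one covariate
f <- function(x) sin(x)

x <- c(-2, -0.5, 0, 1, 3)  # observed values of the covariate
epsilon <- 1e-7

# First derivative: h = epsilon * max(1, max(|x|)), central difference
h1 <- epsilon * max(1, max(abs(x)))
central_diff <- (f(x + h1) - f(x - h1)) / (2 * h1)

# Second derivative: h = sqrt(epsilon * max(1, max(|x|)))
h2 <- sqrt(epsilon * max(1, max(abs(x))))
second_diff <- (f(x + h2) - 2 * f(x) + f(x - h2)) / h2^2

# Largest absolute deviation from the analytic derivatives of sin(x)
max(abs(central_diff - cos(x)))
max(abs(second_diff + sin(x)))
```

The larger step for the second derivative reflects that dividing by `h^2` amplifies floating-point error much more than dividing by `2h` does.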

Details

Overview: calculate_effects returns a data.frame of class "gKRLS_mfx" that reports the estimated average marginal effects and standard errors. Other columns include "type" that reports the type of marginal effect calculated. For families with multiple predicted outcomes (e.g., multinomial), the column "response" numbers the different outcomes in the same order as predict.gam(object) for the specified family. Many (but not all) extended and generalized families from mgcv are included.

Setting continuous_type = "predict" together with the conditional argument estimates predicted values at different covariate strata (e.g., to create an "observed value" predicted probability curve for a logistic regression). The examples provide an illustration.

Interactions: calculate_interactions provides some simple functions for calculating interaction effects between variables. The default quantities it can produce are listed below. Egami and Imai (2019) provide a detailed exposition of these quantities. All marginalization is done using an "observed value" approach, i.e. over the estimation data or a custom dataset provided to data.

"AME" or Average Marginal Effect: This is the standard quantity reported from calculate_effects.
"ACE" or Average Combination Effect: This is the effect of changing two variables simultaneously on the outcome.
"AMIE" or Average Marginal Interaction Effect: This is ACE minus each corresponding AME.
"AIE" or Average Interaction Effect: This has a "conditional effect" interpretation and reports the difference in average effect of one variable ("A") between two different levels of a second variable ("B").

Other Functions: get_individual_effects extracts the individual-level effects that are estimated if individual=TRUE.
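
The four interaction quantities listed above can be written compactly. The notation is introduced here for illustration and is not taken from the package: \hat{f} is the fitted prediction, (a_0, a_1) and (b_0, b_1) are the baseline and comparison values of variables A and B, and x_i denotes the observed covariates of observation i. Any variable not listed as an argument is kept at its observed value. These expressions follow the verbal definitions above and Egami and Imai (2019); the package's exact computation may differ in details.

```latex
\begin{align*}
\mathrm{AME}_A &= \frac{1}{N}\sum_{i=1}^{N}
  \left[\hat{f}(a_1, x_i) - \hat{f}(a_0, x_i)\right] \\
\mathrm{ACE}_{A,B} &= \frac{1}{N}\sum_{i=1}^{N}
  \left[\hat{f}(a_1, b_1, x_i) - \hat{f}(a_0, b_0, x_i)\right] \\
\mathrm{AMIE}_{A,B} &= \mathrm{ACE}_{A,B} - \mathrm{AME}_A - \mathrm{AME}_B \\
\mathrm{AIE}_{A,B} &= \frac{1}{N}\sum_{i=1}^{N}
  \left\{\left[\hat{f}(a_1, b_1, x_i) - \hat{f}(a_0, b_1, x_i)\right]
       - \left[\hat{f}(a_1, b_0, x_i) - \hat{f}(a_0, b_0, x_i)\right]\right\}
\end{align*}
```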

Value

Both calculate_effects and calculate_interactions return data.frames. The object returned by calculate_effects carries attributes, including the ones noted below, that may be useful for other analyses.

"gradient": This contains the gradients used to calculate the standard error (via the delta method) for the estimates from calculate_effects. There is one column for each quantity calculated in the main object. The format of this object depends on the family used for gam or bam. This could be used manually to calculate a standard error on the difference between two estimated marginal effects.
"N_eff": The number of observations (in the estimation data) minus the effective degrees of freedom. This is used when calculating p-values as the degrees of freedom for the t-distribution.
"N": The number of observations.

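The manual use of "gradient" mentioned above can be sketched with the delta method in base R. This is a toy illustration only: the matrices `V` and `G` are made-up values, and the layout assumed here (one row per coefficient, one column per estimate) is an assumption, since the actual format depends on the family used for gam or bam.

```r
# Toy inputs (hypothetical values): 3 coefficients, 2 estimated effects
V <- matrix(c(0.04, 0.01, 0.00,
              0.01, 0.09, 0.02,
              0.00, 0.02, 0.16), nrow = 3, byrow = TRUE)  # parameter covariance
G <- cbind(effect_1 = c(1, 0.5, 0),
           effect_2 = c(0, 0.5, 1))                       # assumed gradient layout

# Delta-method standard error of each effect: sqrt(g' V g)
se_each <- sqrt(diag(t(G) %*% V %*% G))

# The gradient of a difference is the difference of the gradients
g_diff <- G[, "effect_1"] - G[, "effect_2"]
se_diff <- sqrt(drop(t(g_diff) %*% V %*% g_diff))

se_each
se_diff
```

Because the two effects share parameters, `se_diff` accounts for their covariance and generally differs from combining `se_each` as if the estimates were independent.
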
WARNINGS

Using a custom dataset for data, i.e. a dataset other than the estimation data, may have unexpected implications. For continuous and character/factor variables, the estimated marginal effects may depend on the distribution of the variable in data. For example, if continuous_type="IQR", the variable x1 is counterfactually set to quantile(data$x1, 0.25) and quantile(data$x1, 0.75), where data is the dataset passed to calculate_effects rather than the estimation data. To force this range to be based on the estimation data, set use_original=TRUE.

Because this default behavior may be undesirable, calculate_effects issues a warning when a custom dataset is provided and this situation arises. These settings are subject to change in future releases.
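
The difference between the two choices can be seen without fitting a model. The sketch below uses base R only; `estimation_data` and `custom_data` are hypothetical datasets simulated for illustration, and the code merely computes the comparison values that the "IQR" setting would draw from each, as described above.

```r
set.seed(1)
estimation_data <- data.frame(x1 = rnorm(200))        # data used to fit the model
custom_data <- data.frame(x1 = rnorm(50, mean = 2))   # hypothetical custom `data`

# use_original = FALSE (default): comparison values come from the custom data
q_custom <- quantile(custom_data$x1, c(0.25, 0.75))

# use_original = TRUE: comparison values come from the estimation data
q_original <- quantile(estimation_data$x1, c(0.25, 0.75))

q_custom
q_original
```

When the custom data are shifted relative to the estimation data, as here, the two settings compare x1 at quite different values, so the reported "IQR" effect can differ even though the fitted model is the same.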

References

Egami, Naoki and Kosuke Imai. 2019. "Causal Interaction in Factorial Experiments: Application to Conjoint Analysis." Journal of the American Statistical Association 114(526): 529-540.

Hanmer, Michael J. and Kerem Ozan Kalkan. 2013. "Behind the Curve: Clarifying the Best Approach to Calculating Predicted Probabilities and Marginal Effects from Limited Dependent Variable Models." American Journal of Political Science 57(1): 263-277.

Leeper, Thomas J. 2016. "Interpreting Regression Results using Average Marginal Effects with R's margins." Working paper available at https://s3.us-east-2.amazonaws.com/tjl-sharing/assets/AverageMarginalEffects.pdf.

Examples

set.seed(654)
n <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
state <- sample(letters[1:5], n, replace = TRUE)
y <- 0.3 * x1 + 0.4 * x2 + 0.5 * x3 + rnorm(n)
data <- data.frame(y, x1, x2, x3, state)

# Make character variables into factors for mgcv
data$state <- factor(data$state)

# A gKRLS model
fit_gKRLS <- mgcv::gam(y ~ state + s(x1, x2, x3, bs = "gKRLS"), data = data)

# calculate marginal effect using derivative
calculate_effects(fit_gKRLS, variables = "x1", continuous_type = "derivative")

# calculate marginal effect by specifying conditional variables
calculate_effects(fit_gKRLS,
  variables = "x1",
  conditional = data.frame(x2 = c(0.6, 0.8), x3 = 0.3)
)

# calculate interaction effects between two variables
# use the default setting ("IQR") for the baseline and
# comparison categories for each variable
calculate_interactions(fit_gKRLS,
  variables = list(c("x1", "x2")),
  QOI = c("AIE", "AMIE")
)

# calculate marginal effect by specifying a factor conditional variable
# estimate the individual marginal effects
out <- calculate_effects(fit_gKRLS,
  variables = "x1", individual = TRUE,
  conditional = data.frame(state = c("a", "b", "c")), continuous_type = "derivative"
)

# Extract the individual marginal effects:
# shorthand for attr(out, 'individual')
get_individual_effects(out)

# calculate the average expected value across a grid of "x1"
# using an observed value approach for the other covariates
calculate_effects(fit_gKRLS, conditional = data.frame(x1 = c(0, 0.2, 0.4, 0.6)),
  continuous_type = 'predict'
)

[Package gKRLS version 1.0.2 Index]