calculate_effects {gKRLS}    R Documentation

Marginal Effects

Description

These functions calculate marginal effects or predicted values after estimating a model with gam or bam.

Usage

calculate_effects(
  model,
  data = NULL,
  variables = NULL,
  continuous_type = c("IQR", "minmax", "derivative", "onesd", "predict",
    "second_derivative"),
  conditional = NULL,
  individual = FALSE,
  vcov = NULL,
  raw = FALSE,
  use_original = FALSE,
  epsilon = 1e-07,
  verbose = FALSE
)

calculate_interactions(
  model,
  variables,
  QOI = c("AMIE", "ACE", "AME", "AIE"),
  ...
)

get_individual_effects(x)

## S3 method for class 'gKRLS_mfx'
print(x, ...)

## S3 method for class 'gKRLS_mfx'
summary(object, ...)

Arguments

model

A model estimated using functions from mgcv (e.g., gam or bam).

data

A data frame used to calculate the marginal effects, or NULL, in which case the data used to estimate the model are employed. The default is NULL. Using a custom dataset may have unexpected implications for continuous and character/factor variables. See "WARNINGS" for more discussion.

variables

A character vector that specifies the variables for which to calculate effects. The default, NULL, calculates effects for all variables.

continuous_type

A character value, with a default of "IQR", that indicates the type of marginal effect to estimate when the variable is continuous (i.e. not binary, logical, factor, or character). Options are "IQR" (compares the variable at its 25th and 75th percentiles), "minmax" (compares the variable at its minimum and maximum), "derivative" (numerically approximates the derivative at each observed value), "second_derivative" (numerically approximates the second derivative at each observed value), and "onesd" (compares one standard deviation below and one standard deviation above the mean of the variable). It also accepts a named list in which each element is named after a continuous variable and contains a vector of two values; the two values are then compared. If a list is used, every continuous variable must have two values specified.
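The named-list form of continuous_type can be sketched as follows. The simulated data, the object names (dat, fit, mfx_custom) and the comparison values are illustrative assumptions rather than package defaults; both continuous variables in the model are given two values, as the description above requires.

```r
library(gKRLS)

# Simulated data with two continuous covariates (illustrative only)
set.seed(111)
n <- 50
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 0.3 * dat$x1 + 0.4 * dat$x2 + rnorm(n)

fit <- mgcv::gam(y ~ s(x1, x2, bs = "gKRLS"), data = dat)

# Compare x1 at -1 versus 1 and x2 at 0 versus 1;
# every continuous variable is given exactly two values
mfx_custom <- calculate_effects(fit,
  continuous_type = list(x1 = c(-1, 1), x2 = c(0, 1))
)
mfx_custom
```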

A special option ("predict") produces predictions (e.g., predict(model, type = "response")) at each observed value and then averages them together. This, in conjunction with conditional, provides a way of calculating quantities such as predicted probability curves using an "observed value" approach (e.g., Hanmer and Kalkan 2013). Examples are provided below.

conditional

A data.frame or NULL. This is an analogue of Stata's at() option and the at argument in the margins package. For a marginal effect on some variable "a", this specifies fixed values for certain other covariates, e.g. data.frame("b" = 0). If conditional is NULL, all other covariates are held at their observed value. If conditional is a data.frame, then each row represents a different combination of covariate values to be held fixed, and marginal effects are calculated separately for each row. Examples are provided below.

individual

A logical value. TRUE calculates individual effects (i.e. an effect for each observation in the data). The default is FALSE.

vcov

A matrix that specifies the covariance matrix of the parameters. The default, NULL, uses the standard covariance matrix from mgcv. This can be used to specify clustered or robust standard errors using output from (for example) sandwich.
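As a minimal sketch of supplying a custom covariance matrix, the code below passes the frequentist covariance matrix that mgcv returns from vcov(fit, freq = TRUE). It stands in for a clustered or robust matrix only to show the mechanics; the data and object names are assumptions made for illustration.

```r
library(gKRLS)

# Simulated data (illustrative only)
set.seed(222)
n <- 50
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 0.3 * dat$x1 + 0.4 * dat$x2 + rnorm(n)

fit <- mgcv::gam(y ~ s(x1, x2, bs = "gKRLS"), data = dat)

# Any covariance matrix with one row and column per coefficient can be
# supplied; here the frequentist covariance matrix from mgcv is used
V_freq <- vcov(fit, freq = TRUE)

mfx_default <- calculate_effects(fit, variables = "x1")
mfx_custom_vcov <- calculate_effects(fit, variables = "x1", vcov = V_freq)
```

The point estimates should not depend on vcov; only the reported uncertainty is expected to change.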

raw

A logical value. TRUE returns the raw values used to calculate the effect in addition to the estimated effect. The default is FALSE. If TRUE, an additional column ...id is present in the estimated effects that reports whether the row corresponds to the effect (effect), the first value (raw_0), or the second value (raw_1), where effect = raw_1 - raw_0. For "derivative", this difference is further divided by twice the step size. For "second_derivative", effect = raw_2 - 2 * raw_1 + raw_0, divided by the squared step size; see the discussion of epsilon for how the step size is calculated.
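A short sketch of inspecting the output when raw = TRUE is given below. The simulated data and object names are assumptions; the ...id column is the one described above.

```r
library(gKRLS)

# Simulated data (illustrative only)
set.seed(333)
n <- 50
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 0.3 * dat$x1 + 0.4 * dat$x2 + rnorm(n)

fit <- mgcv::gam(y ~ s(x1, x2, bs = "gKRLS"), data = dat)

# Request the raw quantities behind the IQR comparison for x1
mfx_raw <- calculate_effects(fit,
  variables = "x1", continuous_type = "IQR", raw = TRUE
)

# The "...id" column labels each row as the effect or one of the raw values
table(mfx_raw[["...id"]])
```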

use_original

A logical value that indicates whether to use the estimation data (TRUE) or data (FALSE) when calculating quantities such as the IQR for continuous variables or the levels to examine for factor variables. Default (FALSE) uses the provided data; if data = NULL, this is equivalent to using the estimation data. The "WARNINGS" section provides more discussion of this option.

epsilon

A numerical value that defines the step size when calculating numerical derivatives (default of 1e-7). For "derivative", the step size for the approximation is h = \epsilon \cdot \mathrm{max}(1, \mathrm{max}(|x|)), i.e. f'(x) \approx \frac{f(x+h) - f(x-h)}{2h}. Please see Leeper (2016) for more details.

For "second_derivative", the step size is h = [\epsilon \cdot \mathrm{max}(1, \mathrm{max}(|x|))]^{0.5}, i.e. f''(x) \approx \frac{f(x+h) - 2 f(x) + f(x-h)}{h^2}
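The two approximations above can be illustrated in base R. The helper functions below (step_first, step_second, central_first, central_second) are hypothetical and not part of gKRLS; they only mirror the formulas for the step size and the central differences, applied to a known function instead of a fitted model.

```r
# Hypothetical helpers that mirror the step-size formulas above
step_first <- function(x, epsilon = 1e-7) {
  epsilon * max(1, max(abs(x)))
}
step_second <- function(x, epsilon = 1e-7) {
  sqrt(epsilon * max(1, max(abs(x))))
}

# Central difference for f'(x) at each observed value of x
central_first <- function(f, x, epsilon = 1e-7) {
  h <- step_first(x, epsilon)
  (f(x + h) - f(x - h)) / (2 * h)
}

# Central difference for f''(x) at each observed value of x
central_second <- function(f, x, epsilon = 1e-7) {
  h <- step_second(x, epsilon)
  (f(x + h) - 2 * f(x) + f(x - h)) / h^2
}

x <- seq(-2, 2, by = 0.5)
d1 <- central_first(sin, x)   # compare with cos(x)
d2 <- central_second(sin, x)  # compare with -sin(x)

max(abs(d1 - cos(x)))
max(abs(d2 + sin(x)))
```

The second derivative uses the square root of the scaled epsilon so that the squared step in the denominator does not become too small relative to floating-point error.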

verbose

A logical value that indicates whether to report progress when calculating the marginal effects. The default is FALSE.

QOI

A vector of quantities of interest to calculate for calculate_interactions. Options include "AME" (average marginal effect), "ACE" (average combination effect), "AIE" (average interaction effect) and "AMIE" (average marginal interaction effect); see "Details" for more information. The default setting calculates all four quantities.

...

An argument used for calculate_interactions to pass arguments to calculate_effects. It is unused for summary.gKRLS_mfx.

x

An object estimated using calculate_effects.

object

An object estimated using calculate_effects.

Details

Overview: calculate_effects returns a data.frame of class "gKRLS_mfx" that reports the estimated average marginal effects and standard errors. Other columns include "type" that reports the type of marginal effect calculated. For families with multiple predicted outcomes (e.g., multinomial), the column "response" numbers the different outcomes in the same order as predict.gam(object) for the specified family. Many (but not all) extended and generalized families from mgcv are included.

The conditional argument while setting continuous_type = "predict" can be used to estimate predicted values at different covariate strata (e.g., to create an "observed value" predicted probability curve for a logistic regression). The examples provide an illustration.
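A minimal sketch of such an "observed value" predicted probability curve for a logistic regression follows. The simulated data, the grid of x1 values and the object names are assumptions chosen for illustration.

```r
library(gKRLS)

# Simulated binary outcome (illustrative only)
set.seed(444)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.5 * dat$x1 - 0.5 * dat$x2))

fit_logit <- mgcv::gam(y ~ s(x1, x2, bs = "gKRLS"),
  family = binomial(), data = dat
)

# Fix x1 at each grid value in turn; x2 stays at its observed values
grid_x1 <- data.frame(x1 = seq(-1, 1, by = 0.5))

# Average the predicted probabilities over the data at each grid value
prob_curve <- calculate_effects(fit_logit,
  conditional = grid_x1, continuous_type = "predict"
)
prob_curve
```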

Interactions: calculate_interactions provides some simple functions for calculating interaction effects between variables. The quantities it can produce are the "AME", "ACE", "AIE", and "AMIE" described under the QOI argument; all four are calculated by default. Egami and Imai (2019) provide a detailed exposition of these quantities. All marginalization is done using an "observed value" approach, i.e. over the estimation data or a custom dataset provided to data.

Other Functions: get_individual_effects extracts the individual-level effects that are estimated if individual=TRUE.

Value

Both calculate_effects and calculate_interactions return data.frames. The output of calculate_effects also carries attributes that may be useful for other analyses, such as the individual-level effects (estimated when individual=TRUE) that get_individual_effects extracts.

WARNINGS

Using a custom dataset for data, i.e. a dataset other than the estimation data, may have unexpected implications. For continuous and character/factor variables, the estimated marginal effects may depend on the distribution of the variable in data. For example, if continuous_type="IQR", the variable x1 is counterfactually set to quantile(data$x1, 0.25) and quantile(data$x1, 0.75), where data is the dataset provided to calculate_effects (rather than the estimation data). To force this range to be set based on the estimation data, set use_original=TRUE.

Because this default behavior may be undesirable, calculate_effects issues a warning when a custom dataset is provided to data and this situation arises. These settings are subject to change in future releases.
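The difference between the two settings can be sketched as follows. The simulated data, the subset (sub) and the object names are assumptions for illustration; the first call may issue the warning described above.

```r
library(gKRLS)

# Simulated data (illustrative only)
set.seed(555)
n <- 60
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 0.3 * dat$x1 + 0.4 * dat$x2 + rnorm(n)

fit <- mgcv::gam(y ~ s(x1, x2, bs = "gKRLS"), data = dat)

# A custom dataset: only the observations with positive x1
sub <- dat[dat$x1 > 0, ]

# Default: the 25th and 75th percentiles of x1 are taken from `sub`
mfx_sub <- calculate_effects(fit, data = sub, variables = "x1")

# use_original = TRUE: the percentiles come from the estimation data,
# while the effect is still averaged over the rows of `sub`
mfx_orig <- calculate_effects(fit,
  data = sub, variables = "x1", use_original = TRUE
)

# The two comparison ranges that drive the difference
quantile(sub$x1, c(0.25, 0.75))
quantile(dat$x1, c(0.25, 0.75))
```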

References

Egami, Naoki and Kosuke Imai. 2019. "Causal Interaction in Factorial Experiments: Application to Conjoint Analysis." Journal of the American Statistical Association. 114(526):529-540.

Hanmer, Michael J. and Kerem Ozan Kalkan. 2013. "Behind the Curve: Clarifying the Best Approach to Calculating Predicted Probabilities and Marginal Effects from Limited Dependent Variable Models." American Journal of Political Science 57(1): 263-277.

Leeper, Thomas J. 2016. "Interpreting Regression Results using Average Marginal Effects with R's margins." Working paper available at https://s3.us-east-2.amazonaws.com/tjl-sharing/assets/AverageMarginalEffects.pdf.

Examples

set.seed(654)
n <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
state <- sample(letters[1:5], n, replace = TRUE)
y <- 0.3 * x1 + 0.4 * x2 + 0.5 * x3 + rnorm(n)
data <- data.frame(y, x1, x2, x3, state)

# Make character variables into factors for mgcv
data$state <- factor(data$state)

# A gKRLS model
fit_gKRLS <- mgcv::gam(y ~ state + s(x1, x2, x3, bs = "gKRLS"), data = data)

# calculate marginal effect using derivative
calculate_effects(fit_gKRLS, variables = "x1", continuous_type = "derivative")

# calculate marginal effect by specifying conditional variables
calculate_effects(fit_gKRLS,
  variables = "x1",
  conditional = data.frame(x2 = c(0.6, 0.8), x3 = 0.3)
)

# calculate interaction effects between two variables
# use the default setting ("IQR") for the baseline and
# comparison categories for each variable
calculate_interactions(fit_gKRLS,
  variables = list(c("x1", "x2")),
  QOI = c("AIE", "AMIE")
)

# calculate marginal effect by specifying a factor conditional variable
# estimate the individual marginal effects
out <- calculate_effects(fit_gKRLS,
  variables = "x1", individual = TRUE,
  conditional = data.frame(state = c("a", "b", "c")), continuous_type = "derivative"
)

# Extract the individual marginal effects:
# shorthand for attr(out, 'individual')
get_individual_effects(out)

# calculate the average expected value across a grid of "x1"
# using an observed value approach for the other covariates
calculate_effects(fit_gKRLS, conditional = data.frame(x1 = c(0, 0.2, 0.4, 0.6)),
  continuous_type = 'predict'
)

[Package gKRLS version 1.0.2 Index]