margins {PivotalR} R Documentation

## Compute the marginal effects of regression models

### Description

`margins` calculates the marginal effects of variables, given the result of a regression (`madlib.lm`, `madlib.glm`, etc.). `Vars` lists all the variables used in the regression model. `Terms` lists the specified terms in the original model. `Vars` and `Terms` are only used in the `dydx` argument of `margins`.

### Usage

```
## S3 method for class 'lm.madlib'
margins(model, dydx = ~Vars(model), newdata = model$data,
        at.mean = FALSE, factor.continuous = FALSE,
        na.action = NULL, ...)

## S3 method for class 'lm.madlib.grps'
margins(model, dydx = ~Vars(model),
        newdata = lapply(model, function(x) x$data),
        at.mean = FALSE, factor.continuous = FALSE,
        na.action = NULL, ...)

## S3 method for class 'logregr.madlib'
margins(model, dydx = ~Vars(model), newdata = model$data,
        at.mean = FALSE, factor.continuous = FALSE,
        na.action = NULL, ...)

## S3 method for class 'logregr.madlib.grps'
margins(model, dydx = ~Vars(model),
        newdata = lapply(model, function(x) x$data),
        at.mean = FALSE, factor.continuous = FALSE,
        na.action = NULL, ...)

## S3 method for class 'margins'
print(x, digits = max(3L, getOption("digits") - 3L), ...)

Vars(model)

Terms(term = NULL)
```

### Arguments

`model` The result of `madlib.lm` or `madlib.glm`, which represents a regression model for the training data.

`dydx` A formula. The default, `~ Vars(model)`, tells the function to compute the marginal effects of all the variables that appear in the model. `~ .` computes the marginal effects of all variables in `newdata`. Use ordinary formula syntax to specify which variables' marginal effects are to be computed.

`newdata` A `db.obj` object, which represents the data in the database. The default is the data used to train the regression model, but other data sets can be used.

`at.mean` A logical, default `FALSE`. Whether to compute the marginal effects at the mean values of the variables.

`factor.continuous` A logical, default `FALSE`. Whether to compute the marginal effects of factors by treating them as continuous variables. See "Details" for more explanation.

`na.action` A string that indicates what should happen when the data contain `NA`s. Possible values include `"na.omit"`, `"na.exclude"`, `"na.fail"` and `NULL`. At present, `na.omit,db.obj-method` has been implemented. When the value is `NULL`, nothing is done on the R side, and `NA` values are filtered out and omitted on the MADlib side. A user-defined `na.action` function is also allowed; see `na.omit,db.obj-method` for the preferred function interface.

`...` Other arguments, not implemented.

`x` The result of the `margins` function, which is of class `"margins"`.

`digits` The minimum number of significant digits to be printed in values. As shown under Usage, the default is `max(3L, getOption("digits") - 3L)`. (For the interpretation for complex numbers see `signif`.) Non-integer values will be rounded down, and only values greater than or equal to 1 and no greater than 22 are accepted.

`term` A vector of integers, default `NULL`. When `term = i`, the marginal effect of the i-th term is computed. Even if this term contains multiple variables, it is treated as a variable independent of all others. When `term = NULL`, the marginal effects of all terms are calculated. In the final result, the marginal effects are shown as `".term.1"`, `".term.2"`, etc. By comparing with `names(model$coef)`, one can work out which term corresponds to which expression. The marginal effect of the `(Intercept)` term cannot be computed this way. (One can create an extra column that equals 1 and use it as a variable, removing the intercept by adding `-1` to the fitting formula.)
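The intercept workaround mentioned for `term` can be sketched with base R's `lm` on local data. This is only an illustration of the formula trick, not of `madlib.lm` itself; the data and the column name `one` are made up for the example.

```r
# Illustrative local data; `one` is a hypothetical column name.
set.seed(42)
df <- data.frame(x = rnorm(50))
df$y <- 2 + 3 * df$x + rnorm(50)
df$one <- 1                                 # extra column that equals 1

fit_int <- lm(y ~ x, data = df)             # usual fit with (Intercept)
fit_one <- lm(y ~ one + x - 1, data = df)   # -1 drops the intercept, `one` replaces it

names(coef(fit_int))
names(coef(fit_one))
```

Both fits estimate the same two coefficients; in the second one the constant appears as an ordinary variable named `one`, so it can be referred to like any other term.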

### Details

For a continuous variable, the marginal effect is the first derivative of the response function with respect to that variable. For a categorical variable, it is usually more meaningful to compute the finite difference of the response function between the variable being 1 and being 0. The finite-difference marginal effect measures how much larger the response is than for the reference category. Setting `factor.continuous = TRUE` treats factors as continuous variables instead. The reference category of a categorical variable can be changed with `relevel`.
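The difference between the two definitions can be seen in a small local sketch that uses base R's `glm` rather than PivotalR. The simulated data and all variable names are assumptions made for the illustration, and averaging over the rows is also an assumption (it follows the Stata convention for average marginal effects); this is not a description of how `margins` is implemented.

```r
# Simulated local data; every name below is illustrative, not part of PivotalR.
set.seed(1)
n <- 500
x <- rnorm(n)                      # continuous variable
d <- rbinom(n, 1, 0.5)             # 0/1 dummy standing in for a factor level
y <- rbinom(n, 1, plogis(-0.5 + 0.8 * x + 1.0 * d))

fit <- glm(y ~ x + d, family = binomial())
b <- coef(fit)
p <- fitted(fit)                   # fitted probabilities for every row

# Continuous variable: derivative of the logistic response, b * p * (1 - p),
# averaged over the rows.
me_x <- mean(b[["x"]] * p * (1 - p))

# Dummy variable: finite difference of the response at d = 1 versus d = 0.
p1 <- predict(fit, newdata = data.frame(x = x, d = 1), type = "response")
p0 <- predict(fit, newdata = data.frame(x = x, d = 0), type = "response")
me_d_diff <- mean(p1 - p0)

# The same dummy treated as if it were continuous (derivative instead).
me_d_deriv <- mean(b[["d"]] * p * (1 - p))

c(x = me_x, d_finite_diff = me_d_diff, d_derivative = me_d_deriv)
```

The last two numbers are usually close but not identical, which is the choice that `factor.continuous` controls.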

### Value

The `margins` function returns a `margins` object, which is a `data.frame`. It contains the following items:

`Estimate` The marginal effect values for all variables that have been specified in `dydx`.

`Std. Error` The standard errors of the marginal effects.

`t value`, `z value` The t statistics (for linear regression) or z statistics (for logistic regression).

`Pr(>|t|)`, `Pr(>|z|)` The corresponding p values.
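As a rough guide to how these columns relate, the sketch below assumes the usual Wald convention: the statistic is the estimate divided by its standard error, and the p value is two-sided. The source does not spell out how PivotalR computes these columns, and the numbers are hypothetical.

```r
# Hypothetical numbers, for illustration only.
estimate  <- c(length = 0.42, diameter = -0.10)
std_error <- c(length = 0.08, diameter = 0.05)

z_value <- estimate / std_error
p_value <- 2 * pnorm(-abs(z_value))   # two-sided p value from the normal distribution

data.frame(Estimate = estimate, `Std. Error` = std_error,
           `z value` = z_value, `Pr(>|z|)` = p_value,
           check.names = FALSE)
```

For a linear regression the analogous calculation would use the t distribution with the model's residual degrees of freedom in place of `pnorm`.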

`Vars` returns a vector of strings, which are the variable names that have been used in the regression model.

### Author(s)

Author: Predictive Analytics Team at Pivotal Inc.

Maintainer: Frank McQuillan, Pivotal Inc. fmcquillan@pivotal.io

### References

 Stata 13 help for margins, https://www.stata.com/help.cgi?margins

### See Also

`relevel` changes the reference category.

`madlib.lm`, `madlib.glm` compute linear and logistic regressions.

### Examples

```
## Not run:

## set up the database connection
## Assume that .port is port number and .dbname is the database name
cid <- db.connect(port = .port, dbname = .dbname)

## create a data table in database and the R wrapper
delete("abalone", conn.id = cid)
dat <- as.db.data.frame(abalone, "abalone", conn.id = cid)

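## linear regression
fit <- madlib.lm(rings ~ length + diameter*sex, data = dat)
## marginal effects of all variables in the model (the default dydx)
margins(fit)
## evaluate at the mean values of the variables
margins(fit, at.mean = TRUE)
## treat factors as continuous variables
margins(fit, factor.continuous = TRUE)
## all variables plus all terms of the model
margins(fit, dydx = ~ Vars(model) + Terms())

## logistic regression
fit <- madlib.glm(rings < 10 ~ length + diameter*sex, data = dat, family = "logistic")
## only the variables named in the formula
margins(fit, ~ length + sex)
margins(fit, ~ length + sex.M, at.mean = TRUE)
margins(fit, ~ length + sex.I, factor.continuous = TRUE)
margins(fit, ~ Vars(model) + Terms())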

## create a data table that has two columns
## one of them is an array column
dat1 <- cbind(db.array(dat[,-c(1,2,10)]), dat[,10])
names(dat1) <- c("x", "y")
delete("abalone_array", conn.id = cid)
dat1 <- as.db.data.frame(dat1, "abalone_array")

fit <- madlib.glm(y < 10 ~ x[-1], data = dat1, family = "logistic")
margins(fit, ~ x[2:5])

db.disconnect(cid, verbose = FALSE)

## End(Not run)
```

[Package PivotalR version 0.1.18.5 Index]