singleplot {pre}

R Documentation

Create partial dependence plot for a single variable in a prediction rule ensemble (pre)

Description

singleplot creates a partial dependence plot, which shows the effect of a predictor variable on the ensemble's predictions. Note that plotting partial dependence is computationally intensive: computation time increases quickly with the number of observations and variables. For large datasets, package 'plotmo' (Milborrow, 2019) provides more efficient functions for plotting partial dependence and also supports 'pre' models.

Usage

singleplot(
  object,
  varname,
  penalty.par.val = "lambda.1se",
  nvals = NULL,
  type = "response",
  ylab = NULL,
  response = NULL,
  gamma = NULL,
  newdata = NULL,
  xlab = NULL,
  ...
)

Arguments

`object`	an object of class `pre`.
`varname`	character vector of length one, specifying the variable for which the partial dependence plot should be created. Note that `varname` should correspond to the variable as described in the model formula used to generate the ensemble (i.e., including functions applied to the variable).
`penalty.par.val`	character or numeric. Value of the penalty parameter `\lambda` to be employed for selecting the final ensemble. The default `"lambda.1se"` employs the `\lambda` value within 1 standard error of the minimum cross-validated error. Alternatively, `"lambda.min"` may be specified, to employ the `\lambda` value with minimum cross-validated error, or a numeric value `>0` may be specified, with higher values yielding a sparser ensemble. To evaluate the trade-off between accuracy and sparsity of the final ensemble, inspect `pre_object$glmnet.fit` and `plot(pre_object$glmnet.fit)`.
`nvals`	optional numeric vector of length one. Number of values of the predictor variable at which partial dependence should be computed and plotted. See Details.
`type`	character string. Type of prediction to be plotted on y-axis. `type = "response"` gives fitted values for continuous outputs and fitted probabilities for nominal outputs. `type = "link"` gives fitted values for continuous outputs and linear predictor values for nominal outputs.
`ylab`	character. Label to be printed on the y-axis, defaults to the response variable name(s).
`response`	numeric vector of length 1. Only relevant for multivariate gaussian and multinomial responses. If `NULL` (default), PDPs for all response variables or categories will be produced. A single integer can be specified, indicating for which response variable or category PDPs should be produced.
`gamma`	Mixing parameter for relaxed fits. See `coef.cv.glmnet`.
`newdata`	Optional `data.frame` in which to look for variables with which to predict. If `NULL` (the default), the `data.frame` used to fit the original ensemble will be used. Smaller subsets of the original data can be specified to (substantially) reduce computation time. See Details.
`xlab`	character. Label to be printed on the x-axis. If `NULL`, the supplied `varname` will be printed on the x-axis.
`...`	Further arguments to be passed to `plot.default`.

Details

By default, the partial dependence function is evaluated and plotted at each unique observed value of the specified predictor variable. See also section 8.1 of Friedman & Popescu (2008).
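
The computation behind such a plot can be sketched in base R as follows. This is a minimal illustration of the brute-force approach, not the code used by singleplot: the linear model `fit` merely stands in for a fitted ensemble, and the helper name `partial_dependence` is hypothetical. For each unique value of the predictor, that value is assigned to every observation and the resulting predictions are averaged.

```r
# Stand-in model (assumption); a fitted pre ensemble would take its place
airq <- airquality[complete.cases(airquality), ]
fit <- lm(Ozone ~ ., data = airq)

# Hypothetical helper: brute-force partial dependence for one predictor
partial_dependence <- function(model, data, varname,
                               values = sort(unique(data[[varname]]))) {
  avg_pred <- vapply(values, function(v) {
    data[[varname]] <- v  # set the predictor to v for all observations
    mean(predict(model, newdata = data))
  }, numeric(1))
  data.frame(value = values, prediction = avg_pred)
}

pd <- partial_dependence(fit, airq, "Temp")
plot(pd$value, pd$prediction, type = "l", xlab = "Temp", ylab = "Ozone")
```

Each grid value requires one prediction per observation, so the cost of this approach grows with the number of grid values times the number of rows.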

When the number of unique observed values is large, partial dependence functions can take a very long time to compute. Specifying the nvals argument can substantially reduce computation time. When the nvals argument is supplied, values for the minimum, maximum, and (nvals - 2) intermediate values of the predictor variable will be plotted. Note that nvals can be specified only for numeric and ordered input variables. If the plot is requested for a nominal input variable, the nvals argument will be ignored and a warning printed.
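
A short usage sketch, assuming package 'pre' is installed (the model fit and plot are skipped otherwise). It counts the unique values of Temp that would be used by default and then requests a coarser grid through nvals; the value 10 is an arbitrary choice for illustration.

```r
airq <- airquality[complete.cases(airquality), ]

# Number of unique observed values of Temp (the default grid size)
n_unique <- length(unique(airq$Temp))

if (requireNamespace("pre", quietly = TRUE)) {
  library("pre")
  set.seed(42)
  airq.ens <- pre(Ozone ~ ., data = airq)
  # Plot at 10 values only: the minimum, the maximum and 8 in between
  singleplot(airq.ens, varname = "Temp", nvals = 10)
}
```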

Alternatively, newdata can be specified to provide a different (smaller) set of observations to compute partial dependence over. If mi_pre was used to derive the original rule ensemble, function mean_mi can be used for this.
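
The newdata approach might look as follows. This is a sketch assuming package 'pre' is installed (the model fit and plot are skipped otherwise); the subset size of 50 rows and the name `airq_sub` are arbitrary choices for illustration.

```r
airq <- airquality[complete.cases(airquality), ]

# Random subset of the training data; 50 rows is an arbitrary choice
set.seed(42)
airq_sub <- airq[sample(nrow(airq), 50), ]

if (requireNamespace("pre", quietly = TRUE)) {
  library("pre")
  airq.ens <- pre(Ozone ~ ., data = airq)
  # Partial dependence is computed over the 50 sampled rows only
  singleplot(airq.ens, varname = "Temp", newdata = airq_sub)
}
```

A random subset trades some accuracy of the partial dependence estimate for speed, so the resulting curve is an approximation of the one based on all observations.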

References

Friedman, J. H., & Popescu, B. E. (2008). Predictive learning via rule ensembles. The Annals of Applied Statistics, 2(3), 916-954.

Milborrow, S. (2019). plotmo: Plot a model's residuals, response, and partial dependence plots. https://CRAN.R-project.org/package=plotmo

Examples

library("pre")
## Remove rows with missing values before fitting the ensemble
airq <- airquality[complete.cases(airquality), ]
set.seed(42)
airq.ens <- pre(Ozone ~ ., data = airq)
singleplot(airq.ens, "Temp")

## For multinomial and mgaussian families, one PDP is created per category or outcome
set.seed(42)
airq.ens3 <- pre(Ozone + Wind ~ ., data = airq, family = "mgaussian")
singleplot(airq.ens3, varname = "Day")

set.seed(42)
iris.ens <- pre(Species ~ ., data = iris, family = "multinomial")
singleplot(iris.ens, varname = "Petal.Width")

[Package pre version 1.0.7 Index]