pairplot {pre} | R Documentation
Create partial dependence plot for a pair of predictor variables in a prediction rule ensemble (pre)
Description
pairplot creates a partial dependence plot (PDP) to assess the effects of a pair of predictor variables on the predictions of the ensemble. Note that computing partial dependence is computationally intensive: computation time increases quickly with the number of observations and variables. For large datasets, package 'plotmo' (Milborrow, 2019) provides more efficient functions for plotting partial dependence and also supports 'pre' models.
Usage
pairplot(
  object,
  varnames,
  type = "both",
  gamma = NULL,
  penalty.par.val = "lambda.1se",
  response = NULL,
  nvals = c(20L, 20L),
  pred.type = "response",
  newdata = NULL,
  xlab = NULL,
  ylab = NULL,
  main = NULL,
  ...
)
Arguments
object | an object of class pre.
varnames | character vector of length two. Currently, pairplots can only be requested for non-nominal variables.
type | character string. Type of plot to be generated (default "both").
gamma | Mixing parameter for relaxed fits.
penalty.par.val | character or numeric. Value of the penalty parameter used to select the final ensemble (default "lambda.1se").
response | numeric vector of length 1. Only relevant for multivariate gaussian and multinomial responses. If NULL (the default), one PDP is created for each response variable or category.
nvals | optional numeric vector of length 2. For how many values of x1 and x2 (the first and second variable in varnames) should partial dependence be plotted? If NULL, all observed values of the two predictor variables are used (see Details).
pred.type | character string. Type of prediction to be plotted on the z-axis (default "response").
newdata | Optional set of observations over which to compute partial dependence, instead of the training data; for ensembles fitted with mi_pre, "mean.mi" can be specified (see Details).
xlab | character. Label to be printed on the x-axis.
ylab | character. Label to be printed on the y-axis.
main | Title for the plot.
... | Further arguments to be passed to the underlying plotting function.
Details
Partial dependence functions are described in section 8.1 of Friedman & Popescu (2008).
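Written out, the empirical partial dependence of the ensemble's prediction function F on the pair (x_1, x_2) averages F over the N observations, holding x_1 and x_2 fixed at the chosen values while the remaining predictors of observation i keep their observed values. The notation below is chosen for this page and may differ from that of Friedman & Popescu (2008):

```latex
\bar{F}(x_1, x_2) = \frac{1}{N} \sum_{i=1}^{N} F\left(x_1, x_2, \mathbf{x}_{i,\setminus\{1,2\}}\right)
```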
By default, partial dependence is computed and plotted for every combination of 20 values of each of the two specified predictor variables (a grid of 20 × 20 = 400 points). When nvals = NULL is specified, partial dependence is computed for every combination of the unique observed values of the two predictor variables. If NA instead of a numeric value is specified for one of the predictor variables, all observed values of that variable will be used. Specifying nvals = NULL and nvals = c(NA, NA) yields exactly the same result.
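The computation over such a grid can be sketched by brute force in base R: for each grid combination, set both predictors to the grid values in every row of the data, predict, and average. This is a sketch of the textbook estimate, not the internal code of pre; the helper partial_dependence_2d() is hypothetical, and a linear model (lm) stands in for a prediction rule ensemble so that the example runs without extra packages.

```r
# Brute-force two-variable partial dependence (sketch, not 'pre' code).
# partial_dependence_2d() is a hypothetical helper; any model whose predict()
# method accepts a newdata argument can be used.
partial_dependence_2d <- function(model, data, var1, var2, grid1, grid2) {
  pd <- matrix(NA_real_, nrow = length(grid1), ncol = length(grid2),
               dimnames = list(grid1, grid2))
  for (i in seq_along(grid1)) {
    for (j in seq_along(grid2)) {
      tmp <- data
      tmp[[var1]] <- grid1[i]  # set var1 to the grid value in every row
      tmp[[var2]] <- grid2[j]  # set var2 to the grid value in every row
      # average predictions over all rows; other predictors keep observed values
      pd[i, j] <- mean(predict(model, newdata = tmp))
    }
  }
  pd
}

airq <- airquality[complete.cases(airquality), ]
# Stand-in model: linear regression instead of a prediction rule ensemble
fit <- lm(Ozone ~ ., data = airq)
# All unique observed values, as with nvals = NULL
temp_vals <- sort(unique(airq$Temp))
wind_vals <- sort(unique(airq$Wind))
pd <- partial_dependence_2d(fit, airq, "Temp", "Wind", temp_vals, wind_vals)
dim(pd)
```

Because the lm stand-in is additive in Temp and Wind, its partial dependence surface is a tilted plane; a rule ensemble can produce a non-additive surface, which is what a pairplot makes visible.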
High values, NA or NULL for nvals result in long computation times and possibly memory problems. Also, pre ensembles derived from training datasets that are very wide or long may result in long computation times and/or memory allocation errors. In such cases, reducing the values supplied to nvals will reduce computation time and the risk of memory allocation errors.
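A rough way to see why the cost grows so quickly: a brute-force computation needs one prediction per observation per grid point. Assuming that cost model (pre's implementation may organize the work differently), the 111 complete cases of airquality used in the Examples already require 44,400 predictions on the default 20 × 20 grid, and halving both entries of nvals cuts this by a factor of four. The helper n_predictions() is hypothetical.

```r
# Hypothetical helper: number of predictions a brute-force partial dependence
# computation needs (one prediction per observation per grid point).
n_predictions <- function(grid_sizes, n_obs) prod(grid_sizes) * n_obs

airq <- airquality[complete.cases(airquality), ]
n_obs <- nrow(airq)

n_default <- n_predictions(c(20, 20), n_obs)  # default nvals = c(20L, 20L)
n_reduced <- n_predictions(c(10, 10), n_obs)  # smaller nvals
n_all <- n_predictions(c(length(unique(airq$Temp)),   # nvals = NULL:
                         length(unique(airq$Wind))),  # all observed values
                       n_obs)
c(default = n_default, reduced = n_reduced, all_observed = n_all)
```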
When numeric values are specified for nvals, partial dependence is plotted at the minimum, the maximum, and nvals - 2 intermediate values of each predictor variable.
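The rules for choosing the values of each predictor can be sketched as follows. The helper names pd_grid_values() and pd_grids() are hypothetical, and the assumption that the nvals - 2 intermediate values are equally spaced between the minimum and maximum is made here for illustration; pre may place them differently.

```r
# Hypothetical helper (not the 'pre' implementation): values of one predictor
# at which partial dependence is evaluated.
# Assumption: the nval - 2 intermediate values are equally spaced.
pd_grid_values <- function(x, nval) {
  if (is.na(nval)) {
    return(sort(unique(x)))               # NA: all observed values
  }
  seq(min(x), max(x), length.out = nval)  # minimum, maximum, nval - 2 in between
}

pd_grids <- function(data, varnames, nvals = c(20L, 20L)) {
  if (is.null(nvals)) nvals <- c(NA, NA)  # NULL behaves like c(NA, NA)
  list(pd_grid_values(data[[varnames[1]]], nvals[1]),
       pd_grid_values(data[[varnames[2]]], nvals[2]))
}

airq <- airquality[complete.cases(airquality), ]
g <- pd_grids(airq, c("Temp", "Wind"))
lengths(g)
```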
Alternatively, newdata can be specified to provide a different (smaller) set of observations over which to compute partial dependence. If mi_pre was used to derive the original rule ensemble, newdata = "mean.mi" can be specified. An average dataset is then computed over the imputed datasets and used to compute the partial dependence functions. This greatly reduces the number of observations and thereby computation time.
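The idea behind an averaged dataset can be sketched in base R: given several imputed copies of the same data, average them cell by cell, so partial dependence is computed over N rows instead of M × N stacked rows. Everything here is an illustrative assumption: the imputations are simulated by crude random draws (not a proper multiple imputation method and not mi_pre), all columns are numeric, and average_imputed() is a hypothetical helper; how pre treats non-numeric columns is not covered.

```r
set.seed(42)
airq_na <- airquality  # contains missing values in Ozone and Solar.R

# Crude stand-in for multiple imputation: fill each missing value with a random
# draw from that column's observed values (NOT a proper MI method).
impute_once <- function(df) {
  for (v in names(df)) {
    miss <- is.na(df[[v]])
    if (any(miss)) {
      obs <- df[[v]][!miss]
      df[[v]][miss] <- obs[sample.int(length(obs), sum(miss), replace = TRUE)]
    }
  }
  df
}
imputed <- lapply(1:5, function(m) impute_once(airq_na))

# Hypothetical helper: cell-wise mean over imputed datasets that share the same
# rows and numeric columns.
average_imputed <- function(datasets) {
  Reduce(`+`, datasets) / length(datasets)
}

mean_data <- average_imputed(imputed)
c(stacked_rows = sum(sapply(imputed, nrow)), averaged_rows = nrow(mean_data))
```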
If none of the variables specified with argument varnames was selected for the final prediction rule ensemble, an error will be returned.
References
Friedman, J. H., & Popescu, B. E. (2008). Predictive learning via rule ensembles. The Annals of Applied Statistics, 2(3), 916-954.
Milborrow, S. (2019). plotmo: Plot a model's residuals, response, and partial dependence plots. https://CRAN.R-project.org/package=plotmo
Examples
library("pre")
## Single continuous outcome: drop rows with missing values, fit the ensemble
airq <- airquality[complete.cases(airquality), ]
set.seed(42)
airq.ens <- pre(Ozone ~ ., data = airq)
pairplot(airq.ens, varnames = c("Temp", "Wind"))
## For mgaussian and multinomial families, one PDP is created per response
## variable or per category
set.seed(42)
airq.ens3 <- pre(Ozone + Wind ~ ., data = airq, family = "mgaussian")
pairplot(airq.ens3, varnames = c("Day", "Month"))
set.seed(42)
iris.ens <- pre(Species ~ ., data = iris, family = "multinomial")
pairplot(iris.ens, varnames = c("Petal.Width", "Petal.Length"))