plot_effects {spaMM}    R Documentation
Partial-dependence effects and plots
Description
The function pdep_effects evaluates, and the function plot_effects plots, partial-dependence effects.
pdep_effects evaluates the effect of a given fixed-effect variable as (by default, the average of) predicted values on the response scale, over the empirical distribution of all other fixed-effect variables in the data and of the inferred random effects. This can be seen as the result of an experiment where specific treatments (given values of the focal variable) are applied over all conditions defined by the other fixed effects and by the inferred random effects. Thus, apparent dependencies induced by associations between predictor variables are avoided (see Friedman, 2001, from which the name “partial dependence plot” is taken; or Hastie et al., 2009, Section 10.13.2). This also avoids the biases of some alternative ways of plotting effects. In particular, such biases occur when the link is not the identity and either averaging is performed on the linear-predictor scale, or other variables are set to some conventional value other than their average.
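The definition above can be illustrated with base R only. The sketch below is not the spaMM implementation: it uses stats::glm on simulated data (no random effects), and the helper pdep_manual is hypothetical. For each focal value it imposes that value on every row, predicts on the response scale and averages over the other predictors. It also shows one of the biases mentioned above: under a log link, averaging on the linear-predictor scale and back-transforming always gives smaller values than averaging on the response scale (Jensen's inequality).

```r
# Sketch of partial dependence with base R (stats::glm), NOT the spaMM implementation.
set.seed(1)
n  <- 200
x2 <- rnorm(n)
x1 <- 0.5 * x2 + rnorm(n)                # focal variable, associated with x2
y  <- rpois(n, exp(0.3 * x1 + 0.8 * x2))
dat <- data.frame(y = y, x1 = x1, x2 = x2)
fit <- glm(y ~ x1 + x2, family = poisson(), data = dat)

# Hypothetical helper: for each focal value, impose it on every row, predict,
# then average over the empirical distribution of the other predictors.
pdep_manual <- function(fit, data, focal_var, focal_values, type = "response") {
  sapply(focal_values, function(v) {
    nd <- data
    nd[[focal_var]] <- v
    mean(predict(fit, newdata = nd, type = type))
  })
}
grid    <- seq(min(dat$x1), max(dat$x1), length.out = 20)
pd_resp <- pdep_manual(fit, dat, "x1", grid)                         # response-scale average
pd_link <- exp(pdep_manual(fit, dat, "x1", grid, type = "link"))     # link-scale average, back-transformed
```

Here pd_resp is the partial-dependence curve; pd_link lies below it at every focal value.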
pdep_effects also computes intervals of the type defined by its intervals argument (by default, prediction intervals) and of nominal coverage defined by its level argument (it may make particular sense to choose a level < 0.95 to better visualize effects). By default, it returns a data frame of the average values of point predictions and interval bounds for each value of the focal variable (so the intervals may briefly be described as mean prediction intervals, for want of a better term), but it can also return lists of all predictions.
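The averaging of point predictions and interval bounds described above can be sketched in base R as follows. This is an assumption-laden analogue, not spaMM code: it uses the prediction intervals of stats::lm (predict(..., interval = "prediction")) rather than spaMM's interval types such as "predVar", and the helper pdep_intervals is hypothetical. It mimics the documented output columns (focal_var, pointp, low, up) and, with indiv = TRUE, returns all individual predictions instead.

```r
# Sketch with base R (stats::lm), NOT the spaMM implementation:
# "mean prediction intervals" from averaged predictions and interval bounds.
set.seed(2)
n   <- 100
x2  <- runif(n)
x1  <- x2 + rnorm(n, sd = 0.3)
y   <- 1 + 2 * x1 - x2 + rnorm(n)
dat <- data.frame(y = y, x1 = x1, x2 = x2)
fit <- lm(y ~ x1 + x2, data = dat)

# Hypothetical helper mimicking the documented output columns.
pdep_intervals <- function(fit, data, focal_var, focal_values,
                           level = 0.95, indiv = FALSE) {
  per_value <- lapply(focal_values, function(v) {
    nd <- data
    nd[[focal_var]] <- v
    # Matrix with columns fit, lwr, upr: one row per observation in 'data'
    p <- predict(fit, newdata = nd, interval = "prediction", level = level)
    list(pointp = p[, "fit"], low = p[, "lwr"], up = p[, "upr"])
  })
  if (indiv) return(per_value)   # one sublist of vectors per focal value
  data.frame(focal_var = focal_values,
             pointp = sapply(per_value, function(s) mean(s$pointp)),
             low    = sapply(per_value, function(s) mean(s$low)),
             up     = sapply(per_value, function(s) mean(s$up)))
}
eff <- pdep_intervals(fit, dat, "x1", seq(0, 1, length.out = 5), level = 0.8)
```

Lowering level narrows the averaged bounds, which is why a level below 0.95 can make effects easier to see.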
A plot function is available for numeric or factor predictors: plot_effects calls pdep_effects and produces a simple plot (using only base graphics functions) of its results, including prediction bands representing the two average one-sided widths of the intervals. The last section of the Examples shows how to obtain more elaborate plots including the same information using ggplot2.
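A band of the kind described above can be drawn with base graphics alone. In this sketch the data frame eff is a hypothetical, hand-made stand-in using the column names documented under Value (focal_var, pointp, low, up); in practice it would come from pdep_effects(). The semi-transparent colour is derived from col2rgb("blue") only in the spirit of the rgb.args argument; how plot_effects uses that argument internally is not shown here.

```r
# Hypothetical stand-in for a pdep_effects() result (illustrative values only).
eff <- data.frame(focal_var = 1:10)
eff$pointp <- 2 + 0.5 * eff$focal_var
eff$low    <- eff$pointp - 1
eff$up     <- eff$pointp + 1.5

# Semi-transparent fill built from col2rgb("blue"), cf. the rgb.args argument
rgb_args <- col2rgb("blue")
band_col <- rgb(rgb_args[1], rgb_args[2], rgb_args[3],
                alpha = 60, maxColorValue = 255)

pdf(NULL)  # null device, so the sketch also runs non-interactively
plot(pointp ~ focal_var, data = eff, type = "l", col = "blue",
     ylim = range(eff$low, eff$up), xlab = "focal_var", ylab = "response")
# Band between the averaged lower and upper bounds
polygon(c(eff$focal_var, rev(eff$focal_var)), c(eff$low, rev(eff$up)),
        col = band_col, border = NA)
invisible(dev.off())
```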
If added to the plot, the raw data may appear to depart from the partial-dependence predictions, since the data are a priori affected by the associations between variables from which the predictions are freed. An adapted plot of fit residuals may then be more useful, and the Examples also show how it can be produced.
Usage
pdep_effects(object, focal_var, newdata = object$data, length.out = 20,
focal_values=NULL, level=0.95, levels = NULL,
intervals = "predVar", indiv = FALSE, ...)
plot_effects(object, focal_var, newdata = object$data, focal_values=NULL,
effects = NULL, xlab = focal_var, ylab = NULL,
rgb.args = col2rgb("blue"), add = FALSE, ylim=NULL, ...)
Arguments
object |
An object of class |
focal_var |
Character string: the name of the predictor variable whose effect is to be represented. The variable must be numeric for |
newdata |
If non-NULL, a data frame passed to |
effects |
If non-NULL, a data frame to substitute for the one produced by default by |
xlab |
If non-NULL, a character string: X-axis label for the plot. |
ylab |
If non-NULL, a character string: Y-axis label for the plot. |
ylim |
The |
rgb.args |
Color control arguments, in the format produced by |
add |
Boolean: whether to add graphic elements to a previous plot produced by |
length.out |
Integer: for a numeric predictor variable, this controls the number of values at which predictions are evaluated. By default, predictions are made at regular intervals over the range of the predictor variable. If |
intervals , level |
Passed to |
focal_values , levels |
|
indiv |
Boolean: whether to return all predictions given the values of other predictors in the |
... |
Further arguments passed by |
Value
For pdep_effects, a nested list or a data frame, depending on the indiv argument, storing the values of the focal_var, the average point predictions pointp, and the bounds low and up of the intervals. When indiv is TRUE, each sublist contains vectors for pointp, low and up.
For plot_effects, the same value, returned invisibly.
References
J.H. Friedman (2001). Greedy Function Approximation: A Gradient Boosting Machine. Annals of Statistics 29(5):1189-1232.
T. Hastie, R. Tibshirani and J. Friedman (2009). The Elements of Statistical Learning, 2nd ed. Springer.
Examples
data("scotlip")
hlcor <- HLCor(cases~I(prop.ag/10) +adjacency(1|gridcode)+offset(log(expec)),
adjMatrix=Nmatrix,family=poisson(),data=scotlip)
plot_effects(hlcor,focal_var="prop.ag",ylim=c(0,max(scotlip$cases)))
points(cases~prop.ag, data=scotlip, col="blue",pch=20)
# Impose specific values of a numeric predictor using 'focal_values':
plot_effects(hlcor, focal_var="prop.ag", focal_values=1:5)
### Adding 'partial residuals' [residuals relative to predict(<fit object>),
### but plotted relative to pdep_effects() predictions]:
# One first needs predictions for actual values of the predictor variable,
# provided by pdep_effects(.,length.out=0L):
#
pdep_points <- pdep_effects(hlcor,focal_var="prop.ag",length.out=0L)
# Rename for easy prediction for each observation, and add the residuals
# of the actual fit, using the default residuals() i.e. deviance ones:
#
rownames(pdep_points) <- pdep_points$focal_var
pdep_res <- pdep_points[paste(hlcor$data$prop.ag),"pointp"] +
residuals(hlcor)
points(x = hlcor$data$prop.ag, y = pdep_res, col = "red", pch = 20)
## Not run:
## Plotting pdep-effects for different categories, using ggplot.
library(ggplot2)
data("Gryphon")
tmp <- na.omit(Gryphon_df)
spfit <- spaMM::fitme(TARSUS ~ BWT*sex, data = tmp)
tmp$sex <- "1"
pdep_1 <- pdep_effects(spfit,"BWT", newdata=tmp, level=0.5)
# level=0.5 gives bounds at +/- qnorm(0.75) (about 0.674) standard deviations, the so-called 'probable error'.
tmp$sex <- "2"
pdep_2 <- pdep_effects(spfit,"BWT", newdata=tmp, level=0.5)
pdep_1$sex <- "1" ; pdep_2$sex <- "2"
pdep <- rbind(pdep_1,pdep_2)
ggplot(pdep,aes(y = pointp , x = focal_var ,col = sex, fill=sex)) + geom_point() +
geom_ribbon(aes(ymin = low, ymax = up), alpha = 0.3) + xlab("BWT") +
ylab("TARSUS")
## End(Not run)
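Since level is the nominal coverage of the intervals, the relation between level and the 'probable error' used in the last example can be checked numerically. Assuming Gaussian intervals, a central interval of coverage level spans +/- qnorm((1 + level)/2) standard deviations; half_width below is a hypothetical helper for this check only.

```r
# Half-width, in standard deviations, of a central Gaussian interval of coverage 'level'
half_width <- function(level) qnorm((1 + level) / 2)
half_width(0.5)          # equals qnorm(0.75), about 0.674: the 'probable error'
half_width(qnorm(0.75))  # coverage of about 67%, i.e. roughly +/- 0.98 standard deviations
```

Thus the probable error corresponds to level = 0.5, not to level = qnorm(0.75).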