plotEffects {diversityForest}R Documentation

Interaction forest plots: exploring interaction forest results through visualisation

Description

This function allows to visualise the (estimated) bivariable influences of pairs of variables (with large quantitative and qualitative EIM values) on the outcome. This step is crucial, because to interpret interaction effects between variable pairs with large quantitative and qualitative EIM values, it is necessary to learn about the forms these interaction effects take.

Usage

plotEffects(
  intobj,
  type = "quant",
  numpairs = 5,
  indpairs = NULL,
  pairs = NULL,
  allwith = NULL,
  pvalues = TRUE,
  twoplots = TRUE,
  addtitles = TRUE,
  plotit = TRUE
)

Arguments

intobj

Object of class interactionfor.

type

This can be either "quant" or "qual" and determines whether the plotted pairs are sorted according to either the quantitative or qualitative EIM values in decreasing order. Default is "quant".

numpairs

The number of pairs to plot (default: 5). This is overwritten by indpairs.

indpairs

Optional. The indices of the pairs in the sorted lists of quantitative (type="quant") or qualitative EIM values to plot (type="qual"). This overwrites the numpairs argument.

pairs

This can be used to specify the pairs to plot. It is an optional list of character string vectors, where each of these vectors has length two. Each list element corresponds to one pair, where the first character string gives the name of the first member of the respective pair to plot and the second character string gives the name of the second member. This argument overwrites numpairs and indpairs.

allwith

This is an optional character string that can be set to the name of one of the variables. If provided, only variable pairs will be considered that feature the variable specified by this argument allwith. These pairs are again sorted in decreasing order according to the quantitative (type="quant") or qualitative (type="qual") EIM values and their number is restricted to the value given by numpairs. This argument allwith can be used, if it is of interest to learn whether a specific variable (e.g., sex or age) interacts with other variables in the data set and if so, which forms these interactions take.

pvalues

Set to TRUE (default) to add to the plots p-values from tests for interaction effect obtained using classical parametric regression approaches. For categorical outcomes logistic regression is used, for metric outcomes linear regression and for survival outcomes Cox regression. NOTE: These p-values are generally much too optimistic and MUST NOT be reported as the result of a statistical test for significance of interaction. See the 'Details' section below for further details.

twoplots

Set to TRUE / FALSE if for each plot page the results of two / one pair(s) of variables should be shown. Default is TRUE.

addtitles

Set to TRUE (default) to add headings providing the names of the variables in each pair. If type="quant", these headings also give information on the type of quantitative interaction effect. Setting addtitles to FALSE is, for example, useful, when the produced plots are intended for use in a publication, where these headings might not be desirable.

plotit

This states whether the plots are actually plotted or merely returned as ggplot objects. Default is TRUE.

Details

For each considered pair the bivariable influence of both pair members on the outcome estimated using a two-dimensional flexible function is shown. Such visualisations make it possible to learn about the forms of the interaction effects between variable pairs with large EIM values. Moreover, these visualisations reveal (pathological) cases in which variable pairs do not show indications of interaction effects despite featuring large EIM values.
For binary outcomes the probabilities for the second class are estimated, for categorical outcomes with more than two classes the probabilities for the largest class (i.e., the class with the most observations) are estimated (using the function plotPair, a different class can be selected instead), for metric outcomes the means of the outcome are estimated, and for survival outcomes the log hazards ratio values compared to the median effect are estimated.
The kinds of estimates shown differ also according to whether both pair members are metric or only one of the two members is metric and the other one categorical or both pair members are categorical:

As described above (function argument: pvalues), there is an option to add p-values from tests for interaction effect to the plots. If at least one of the variables in the considered variable pair is categorical and features more than two categories, there are more than one interaction terms in the regression approaches used for testing, because the categorical variables are dummy-coded. Therefore, in these cases we obtain a p-value for each interaction term. to obtain a single p-value for the test for interaction we adjust these multiple p-values using the Holm-Bonferroni procedure and take the minimum of the adjusted p-values.

NOTE: These p-values are generally much too optimistic, in particular for small data sets and large numbers of variables. The reason for this overoptimism is that these p-values are not adjusted for the fact that we already used the data to find the variable pairs with strongest indications of interaction effects. This is similar to a multiple testing problem. Therefore, these p-values should only be seen as a rough guide to be interpreted very cautiously and MUST NOT be reported as the results of a statistical test for significance of interaction. To obtain adjusted p-values that would correspond to valid tests, it would be possible to multiply these p-values by the number of possible pairs, which would correspond to Bonferroni-adjusted p-values. For example, assume that we have 30 covariate variables. In that case the number of possible pairs would be 'choose(30, 2) = 435', which is why we would need to multiply each p-value by 435 to obtain an adjusted p-value (or keep the original p-values and divide the significance level 0.05 by 435). Note, however, that Bonferroni-adjusted p-values deliver quite conservative results, that is, weaker effects might not be detected using these p-values, while, however, effects for which these p-values are small (< 0.05) are most likely relevant. Note further that these Bonferroni-adjusted p-values should be interpreted with caution because, stemming from bivariable models, these p-values do not take the multivariable nature of the data into account.

Value

A list of ggplot2 plots returned invisibly.

Author(s)

Roman Hornung

References

See Also

plot.interactionfor, plotPair

Examples

## Not run: 

## Load package:

library("diversityForest")



## Set seed to make results reproducible:

set.seed(1234)



## Construct interaction forest and calculate EIM values:

data(stock)
model <- interactionfor(dependent.variable.name = "company10", data = stock, 
                        num.trees = 20)

# NOTE: num.trees = 20 (in the above) would be much too small for practical 
# purposes. This small number of trees was simply used to keep the
# runtime of the example short.
# The default number of trees is num.trees = 20000 if EIM values are calculated
# and num.trees = 2000 otherwise.



## Obtain a first overview by applying the plot() function for
## interactionfor obects:

plot(model)



## Several possible application cases of the plotEffects() function:

# Visualise the estimated bivariable influences of the five variable pairs with the 
# largest quantitative EIM values:

plotEffects(model) # type="quant" is default.


# Visualise the estimated bivariable influences of the five pairs with the 
# largest qualitative EIM values:

plotEffects(model, type="qual")


# Visualise the estimated bivariable influences of all (eight) pairs that involve
# the variable "company7" sorted in decreasing order according to the
# qualitative EIM values:

plotEffects(model, allwith="company7", type="qual", numpairs=8)


# Visualise the estimated bivariable influences of the pairs with third and fifth
# largest qualitative EIM values:

plotEffects(model, type="qual", indpairs=c(3,5))


# Visualise the estimated bivariable influences of the pairs ("company3", "company5") and
# ("company1", "company9"):

plotEffects(model, pairs=list(c("company3", "company5"), c("company1", "company9")))



## Saving of plots generated with the plotEffects() function (e.g., for use in publications):

# Apply plotEffects() to obtain plots for the five variable pairs
# with the largest qualitative EIM values and store these plots in
# an object 'ps':

ps <- plotEffects(model, type="qual", pvalues=FALSE, twoplots=FALSE, addtitles=FALSE, plotit=FALSE)

# pvalues = FALSE states that no p-values should be shown in the plots,
# because these might not be desired in plots meant for publication.
# twoplots = FALSE ensures that we get one plot for each page instead of two plots per page.
# addtitles = FALSE removes the automatically generated titles, because these are likely
# not desired in publications.
# plotit = FALSE ensures that the plots are not displayed, but only returned (invisibly) 
# by plotEffects().


# Save the plot with second largest qualitative EIM value:

p1 <- ps[[2]]

# Add title:
library("ggpubr")
p1 <- annotate_figure(p1, top = text_grob("My descriptive plot title 1", face = "bold", size = 14))
p1

# Save as PDF:
# library("ggplot2")
# ggsave(file="mypathtofolder/FigureXY1.pdf", width=14, height=6)


# Save the plot with fifth largest qualitative EIM value:

p2 <- ps[[5]]

# Add title:
p2 <- annotate_figure(p2, top = text_grob("My descriptive plot title 2", face = "bold", size = 14))
p2

# Save as PDF:
# ggsave(file="mypathtofolder/FigureXY1.pdf", width=14, height=6)


# Combine both of the above plots:
p <- ggarrange(p1, p2, nrow = 2)
p

# Save the combined plot:
# ggsave(file="mypathtofolder/FigureXYcombined.pdf", width=14, height=11)

# NOTE: Using plotEffects() it is not possible to change the plots 
# themselves (by e.g., increasing the label sizes or changing the 
# axes ranges). However, the function plotPair() can be used to change 
# the plots themselves.


## End(Not run)


[Package diversityForest version 0.4.0 Index]