plotModelSeries {pedometrics} | R Documentation |
Model series plot
Description
Produce a graphical output to examine the effect of using different model specifications (design)
on the predictive performance of these models (a model series). Devised to access the results of
buildModelSeries() and statsMS(), but can be easily adapted to work with any model structure and
performance measure.
Usage
plotModelSeries(
  obj,
  grid,
  line,
  ind,
  type = c("b", "g"),
  pch = c(20, 2),
  size = 0.5,
  arrange = "desc",
  color = NULL,
  xlim = NULL,
  ylab = NULL,
  xlab = NULL,
  at = NULL,
  ...
)
plotMS(
  obj,
  grid,
  line,
  ind,
  type = c("b", "g"),
  pch = c(20, 2),
  size = 0.5,
  arrange = "desc",
  color = NULL,
  xlim = NULL,
  ylab = NULL,
  xlab = NULL,
  at = NULL,
  ...
)
Arguments
obj |
Object of class
See ‘Details’ for more information. |
grid |
Vector of integer values or character strings indicating the columns of the
|
line |
Character string or integer value indicating which of the performance statistics
(usually calculated by |
ind |
Integer value indicating for which group of models the mean rank is to be calculated. See ‘Details’ for more information. |
type |
Vector of character strings indicating some of the effects to be used when plotting
the performance statistics using |
pch |
Vector with two integer values specifying the symbols to be used to plot points. The
first sets the symbol used to plot the performance statistic, while the second sets the symbol
used to plot the mean rank of the indicator set using argument |
size |
Numeric value specifying the size of the symbols used for plotting the mean rank of
the indicator set using argument |
arrange |
Character string indicating how the model series should be arranged, which can be
in ascending ( |
color |
Vector defining the colors to be used in the grid produced by function
|
xlim |
Numeric vector of length 2, giving the x coordinates range. If |
ylab |
Character vector of length 2, giving the y-axis labels. When |
xlab |
Character vector of unit length, the x-axis label. Defaults |
at |
Numeric vector indicating the location of tick marks along the x axis (in native coordinates). |
... |
Other arguments for plotting, although most of these have not been tested. Argument
|
Details
This section gives more details about arguments obj, grid, line, arrange, and ind.
obj
The argument obj usually constitutes a data.frame returned by statsMS(). However, the user can use
any data.frame object as long as it contains the two basic units of information needed:
design data passed with argument grid
performance statistic passed with argument line
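As a minimal sketch of the second case, a data.frame that does not come from statsMS() could be assembled by hand. The column names used here (id, a, b, c, adj_r2) are illustrative assumptions borrowed from the example discussed in this section, not names required by the function:

```r
# Design data: one column per predictor variable, one row per model
design <- data.frame(
  a = c(0, 0, 1, 0, 1, 0, 1, 1),
  b = c(0, 0, 0, 1, 0, 1, 1, 1),
  c = c(0, 1, 0, 0, 1, 1, 0, 1)
)

# Performance statistic: one value per model
adj_r2 <- c(0.87, 0.74, 0.81, 0.85, 0.54, 0.86, 0.90, 0.89)

# Combine both units of information in a single data.frame
obj <- cbind(id = 1:8, design, adj_r2)

# Columns 2 to 4 hold the design data (argument grid), while
# column "adj_r2" holds the performance statistic (argument line)
str(obj)
```

With such an object, grid can be given as column positions or column names, and line as the name of the column that holds the performance statistic.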
grid
The argument grid indicates the design data used to produce the grid output at the top of the
model series plot. By design we mean the data that specify the structure of each model and how
the models differ from each other. Suppose that eight linear models were fit using three types of
predictor variables (a, b, and c). Each of these predictor variables is available in two versions
that differ in accuracy, where 0 means a less accurate predictor variable and 1 means a more
accurate predictor variable. This yields 2^3 = 8 possible combinations. The design data would be
of the following form:
> design
  a b c
1 0 0 0
2 0 0 1
3 1 0 0
4 0 1 0
5 1 0 1
6 0 1 1
7 1 1 0
8 1 1 1
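A full factorial design such as this one does not have to be typed by hand. The sketch below uses base R's expand.grid() to enumerate every combination; this is only one possible way of building the design data, and the row order it produces differs from the listing above, which is harmless because the models are rearranged by the performance statistic anyway:

```r
# Enumerate all 2^3 combinations of the less (0) and more (1)
# accurate versions of the three predictor variables
design <- expand.grid(a = 0:1, b = 0:1, c = 0:1)

# One row per model, one column per predictor variable
nrow(design)
design
```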
line
The argument line corresponds to the performance statistic that is used to arrange the models in
ascending or descending order, and to produce the line output at the bottom of the model series
plot. For example, it can be a series of values of the adjusted coefficient of determination, one
for each model:
adj_r2 <- c(0.87, 0.74, 0.81, 0.85, 0.54, 0.86, 0.90, 0.89)
arrange
The argument arrange automatically arranges the model series according to the performance
statistic selected with argument line. If obj is a data.frame returned by statsMS(), then the
function uses standard arranging approaches. For most performance statistics, the models are
arranged in descending order. The exception is when "r2", "adj_r2", or "ADJ_r2" is used, in which
case the models are arranged in ascending order. This means that the model with the lowest value
appears at the leftmost side of the model series plot, while the model with the highest value
appears at the rightmost side of the plot.
> arrange(obj, adj_r2)
  id a b c adj_r2
1  5 1 0 1   0.54
2  2 0 0 1   0.74
3  3 1 0 0   0.81
4  4 0 1 0   0.85
5  6 0 1 1   0.86
6  1 0 0 0   0.87
7  8 1 1 1   0.89
8  7 1 1 0   0.90
These results suggest that the best performing model is that of id = 7, while the model of id = 5
is the poorest one.
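The ascending arrangement can be reproduced outside the function with base R. The sketch below uses order() rather than an arrange() helper so that no extra package is assumed; the object and column names are the illustrative ones used in this section:

```r
obj <- data.frame(
  id = 1:8,
  a = c(0, 0, 1, 0, 1, 0, 1, 1),
  b = c(0, 0, 0, 1, 0, 1, 1, 1),
  c = c(0, 1, 0, 0, 1, 1, 0, 1),
  adj_r2 = c(0.87, 0.74, 0.81, 0.85, 0.54, 0.86, 0.90, 0.89)
)

# Arrange the models in ascending order of the performance statistic
arranged <- obj[order(obj$adj_r2), ]
rownames(arranged) <- NULL
arranged

# The first row is the poorest model, the last row the best one
worst <- arranged$id[1]
best <- arranged$id[nrow(arranged)]
```

For a statistic where smaller is better, order(obj$stat, decreasing = TRUE) would place the best model at the rightmost position instead (stat being a hypothetical column name).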
ind
The model series plot makes it possible to see how the design influences model performance. This
is achieved mainly through the use of different colors in the grid output, where each unique value
in the design data is represented by a different color. For the example given above, one could
check whether the models built with the more accurate versions of the predictor variables perform
better by inspecting their relative distribution in the model series plot. The models placed at
the rightmost side of the plot are those with the best performance.
The argument ind provides another tool to help identify how the design, and more specifically
each variable in the design data, influences model performance. This is done by simply calculating
the mean rank of the models that were built using the updated (more accurate) version of each
predictor variable. This same mean rank is also used to rank the predictor variables and thus
identify which of them is the most important.
After arranging the design data described above using the adjusted coefficient of determination,
the following mean rank is obtained for each predictor variable:
> rank_center
     a    b    c
1 5.75 6.25 5.25
This result suggests that the best model performance is obtained when using the updated version
of the predictor variable b. In the model series plot, the predictor variable b appears in the
top row, while the predictor variable c appears in the bottom row.
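The idea behind the mean rank can be sketched in a few lines of base R. This is one plausible reading of the description above, not code taken from the package source, so the values it yields for this toy data need not coincide with those printed above; the names arranged, vars, and mean_rank are hypothetical:

```r
obj <- data.frame(
  id = 1:8,
  a = c(0, 0, 1, 0, 1, 0, 1, 1),
  b = c(0, 0, 0, 1, 0, 1, 1, 1),
  c = c(0, 1, 0, 0, 1, 1, 0, 1),
  adj_r2 = c(0.87, 0.74, 0.81, 0.85, 0.54, 0.86, 0.90, 0.89)
)

# Rank 1 is the poorest model, rank 8 the best one
arranged <- obj[order(obj$adj_r2), ]
arranged$rank <- seq_len(nrow(arranged))

# Value of the design data for which the mean rank is computed
ind <- 1

# Mean rank of the models built with the version `ind` of each predictor
vars <- c("a", "b", "c")
mean_rank <- sapply(vars, function(v) mean(arranged$rank[arranged[[v]] == ind]))

# Predictor variables ordered from most to least important
sort(mean_rank, decreasing = TRUE)
```

Under this reading, a higher mean rank means that the models using the selected version of a predictor variable tend to sit further to the right of the plot, that is, among the better performing models.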
Value
An object of class "trellis" consisting of a model series plot.
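Objects of class "trellis" are drawn when they are printed, which matters when the result is stored in a variable or created inside a script or function. The sketch below uses a plain lattice::xyplot() call as a stand-in for the output of plotModelSeries(), assuming the lattice package is installed; the data, labels, and file name are illustrative only:

```r
library(lattice)

# Stand-in trellis object (not an actual model series plot)
perf <- data.frame(
  rank = 1:8,
  adj_r2 = sort(c(0.87, 0.74, 0.81, 0.85, 0.54, 0.86, 0.90, 0.89))
)
p <- xyplot(adj_r2 ~ rank, data = perf, type = "b",
            xlab = "Model rank", ylab = "adj_r2")

# The object can be inspected without being drawn
class(p)

# Draw it explicitly on a graphics device, here a temporary PDF file
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
print(p)
dev.off()
```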
Dependencies
The grDevices package, provider of graphics devices and support for colours and fonts in R,
is required for plotModelSeries() to work.
The grid package, a rewrite of the graphics layout capabilities in R, is required for
plotModelSeries() to work.
Warning
Use the original functions lattice::xyplot() and lattice::levelplot() for higher customization.
Note
Some of the solutions used to build this function were found in the source code of the R-package mvtsplot. As such, the author of that package, Roger D. Peng rpeng@jhsph.edu, is credited as a contributor to the R-package pedometrics.
Author(s)
Alessandro Samuel-Rosa alessandrosamuelrosa@gmail.com
References
Deepayan Sarkar (2008). Lattice: Multivariate Data Visualization with R. Springer, New York. ISBN 978-0-387-75968-5.
Roger D. Peng (2008). A method for visualizing multivariate time series data. Journal of Statistical Software. v. 25 (Code Snippet), p. 1-17.
Roger D. Peng (2012). mvtsplot: Multivariate Time Series Plot. R package version 1.0-1. https://CRAN.R-project.org/package=mvtsplot.
A. Samuel-Rosa, G. B. M. Heuvelink, G. de Mattos Vasques, and L. H. C. dos Anjos (2015). Do more detailed environmental covariates deliver more accurate soil maps? Geoderma. v. 243–244, p. 214–227. doi: 10.1016/j.geoderma.2014.12.017.
See Also
lattice::xyplot()
lattice::levelplot()
Examples
if (all(require(grDevices), require(grid))) {
  # This example follows the discussion in section "Details"
  # Note that the data.frame is created manually
  id <- 1:8
  design <- data.frame(
    a = c(0, 0, 1, 0, 1, 0, 1, 1),
    b = c(0, 0, 0, 1, 0, 1, 1, 1),
    c = c(0, 1, 0, 0, 1, 1, 0, 1)
  )
  adj_r2 <- c(0.87, 0.74, 0.81, 0.85, 0.54, 0.86, 0.90, 0.89)
  obj <- cbind(id, design, adj_r2)
  # Columns 2 to 4 hold the design data; "adj_r2" is the performance statistic
  p <- plotModelSeries(
    obj, grid = c(2:4), line = "adj_r2", ind = 1,
    color = c("lightyellow", "palegreen"),
    main = "Model Series Plot"
  )
}