R: Plot method for the chemmodlab class.

plot.chemmodlab {chemmodlab}

R Documentation

Plot method for the chemmodlab class.

Description

plot.chemmodlab takes a chemmodlab object output by the ModelTrain function and creates a series of accumulation curve plots for assessing model and descriptor set performance.

Usage

## S3 method for class 'chemmodlab'
plot(
  x,
  max.select = NA,
  splits = 1:x$nsplits,
  meths = x$models,
  series = "both",
  ...
)

Arguments

`x`	an object of class `chemmodlab`.
`max.select`	the maximum number of tests to plot for the accumulation curve. If `max.select` is not specified, `floor(min(300, n/4))` is used, where `n` is the number of compounds.
`splits`	a numeric vector containing the indices of the splits to use to construct accumulation curves. Default is to use all splits. `NA` means the first series of plots is not generated. See `Details`.
`meths`	a character vector naming the statistical methods, as implemented in `chemmodlab`, to use for the second series of plots. This argument can take the same values as argument `models` in function `ModelTrain`. See `Details`.
`series`	a character vector specifying which series of plots to construct. Must be one of `"descriptors"`, `"methods"`, or `"both"`.
`...`	other parameters to be passed through to plotting functions.

Details

For a binary response, the accumulation curve plots the number of assay hits identified as a function of the number of tests conducted, where testing order is determined by the predicted probability of a response being positive obtained from k-fold cross validation. Given a particular compound collection, larger accumulations are preferable.
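The construction described above can be sketched in base R. This is an illustration only, not the code `chemmodlab` uses internally: the response vector `y`, the predicted probabilities `phat`, and all of their values are invented for the example, and the sketch does not address tied predictions.

```r
# Hypothetical data: 10 compounds with observed activity (1 = hit) and
# held-out predicted probabilities of activity. All values are invented.
y    <- c(1, 0, 0, 1, 0, 1, 0, 0, 0, 1)
phat <- c(0.90, 0.20, 0.65, 0.80, 0.10, 0.40, 0.30, 0.05, 0.55, 0.70)

# Test compounds in order of decreasing predicted probability.
test_order <- order(phat, decreasing = TRUE)

# Number of hits found after 1, 2, ..., 10 tests.
hits_found <- cumsum(y[test_order])

# Draw the accumulation curve as a step function.
plot(seq_along(hits_found), hits_found, type = "s",
     xlab = "Number of tests", ylab = "Number of hits")
```

A curve that rises quickly and then flattens, as this one does, indicates that most hits were ranked near the top of the testing order.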

The accumulation curve has also been extended to continuous responses. Assuming large positive values of a continuous response y are preferable, chemmodlab accumulates y so that \sum_{i=1}^{n} y_i is the sum of y over the first n tests. This extension includes the binary-response accumulation curve as a special case: when each y_i is 0 or 1, the sum is the number of hits among the first n tests.
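A minimal base R sketch of this extension follows. The helper `accumulate_response()` is hypothetical (it is not a `chemmodlab` function), the data are invented, and the sketch assumes that compounds are tested in order of decreasing predicted response.

```r
# Hypothetical helper: running sum of the observed response, taken in order
# of decreasing predicted score.
accumulate_response <- function(y, score) {
  cumsum(y[order(score, decreasing = TRUE)])
}

# Invented continuous responses and held-out predictions.
y_cont    <- c(2.5, 0.3, 1.8, 4.1, 0.9)
yhat_cont <- c(2.0, 0.5, 2.2, 3.9, 1.1)
acc_cont  <- accumulate_response(y_cont, yhat_cont)

# Binary special case: the running sum is the running count of hits.
y_bin    <- c(1, 0, 1, 0)
phat_bin <- c(0.9, 0.6, 0.4, 0.1)
acc_bin  <- accumulate_response(y_bin, phat_bin)
```

The same helper serves both response types, which is the sense in which the binary curve is a special case of the continuous one.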

By default, accumulation curves are displayed for at most 300 tests (see `max.select`), rather than for the entire collection, to focus on the goal of finding actives as early as possible.
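To make the default concrete, the rule stated for `max.select` can be evaluated for a few collection sizes. The wrapper `default_max_select()` is a hypothetical name used only for this illustration.

```r
# Default number of tests plotted, following the documented rule
# floor(min(300, n/4)), where n is the number of compounds.
default_max_select <- function(n) floor(min(300, n / 4))

small  <- default_max_select(50)    # e.g. the 50 rows of USArrests
medium <- default_max_select(1000)
large  <- default_max_select(5000)

print(c(small = small, medium = medium, large = large))
```

For small collections the `n/4` term governs, so the 300-test cap applies only once the collection has at least 1200 compounds.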

There are two main series of plots generated:

Methods plot series

There is one plot per CV split and descriptor set combination. The accumulation curves for each modeling method are compared.

Descriptors plot series

There is one plot per CV split and model fit. The accumulation curves for each descriptor set are compared.

Author(s)

Jacqueline Hughes-Oliver, Jeremy Ash, Atina Brooks

References

Modified from code originally written by William J. Welch 2001-2002

Examples

## Not run: 
# A data set with a binary response and multiple descriptor sets
data(aid364)

cml <- ModelTrain(aid364, ids = TRUE, xcol.lengths = c(24, 147), 
                  des.names = c("BurdenNumbers", "Pharmacophores"))
plot(cml)

## End(Not run)

# A continuous response
cml <- ModelTrain(USArrests, nsplits = 2, nfolds = 2,
                  models = c("KNN", "Lasso", "Tree"))
plot(cml)

[Package chemmodlab version 2.0.0 Index]