test_spectra {waves} | R Documentation |
Test the performance of spectral models
Description
Wrapper that trains models based on spectral data to predict reference values and reports model performance statistics.
Usage
test_spectra(
  train.data,
  num.iterations,
  test.data = NULL,
  pretreatment = 1,
  k.folds = 5,
  proportion.train = 0.7,
  tune.length = 50,
  model.method = "pls",
  best.model.metric = "RMSE",
  stratified.sampling = TRUE,
  cv.scheme = NULL,
  trial1 = NULL,
  trial2 = NULL,
  trial3 = NULL,
  split.test = FALSE,
  seed = 1,
  verbose = TRUE,
  wavelengths = deprecated(),
  preprocessing = deprecated(),
  output.summary = deprecated(),
  rf.variable.importance = deprecated()
)
Arguments
train.data |
|
num.iterations |
Number of training iterations to perform |
test.data |
|
pretreatment |
Number or list of numbers 1:13 corresponding to desired pretreatment method(s):
|
k.folds |
Number indicating the number of folds for k-fold cross-validation during model training. Default is 5. |
proportion.train |
Fraction of samples to include in the training set. Default is 0.7. |
tune.length |
Number delineating search space for tuning of the PLSR
hyperparameter |
model.method |
Model type to use for training. Valid options include:
|
best.model.metric |
Metric used to decide which model is best. Must be either "RMSE" or "Rsquared" |
stratified.sampling |
If |
cv.scheme |
A cross validation (CV) scheme from Jarquín et al., 2017.
Options for
|
trial1 |
|
trial2 |
|
trial3 |
|
split.test |
Boolean that allows for a fixed training set and a split
test set. Example: train a model on data from two breeding programs and a
stratified subset (70%) of a third, then test on the remaining samples
(30%) of the third. If |
seed |
Integer to be used internally as input for |
verbose |
If |
wavelengths |
DEPRECATED |
preprocessing |
DEPRECATED please use
|
output.summary |
DEPRECATED |
rf.variable.importance |
DEPRECATED
|
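To make the role of proportion.train concrete, the following is a minimal base R sketch of a simple random 70/30 split. It is not the package's internal implementation: the toy data frame and the object names n.train, train.rows, train.set, and test.set are hypothetical, the split is plain random sampling rather than stratified sampling, and set.seed() is called here only so that the sketch is reproducible.

```r
# Hypothetical toy data: 10 samples, each with a reference value.
toy <- data.frame(
  unique.id = paste0("s", 1:10),
  reference = seq(10, 55, by = 5)
)

proportion.train <- 0.7  # same value as the documented default

# Fix the random number generator so the sketch gives a repeatable split.
set.seed(1)

# Number of rows assigned to the training set.
n.train <- round(proportion.train * nrow(toy))

# Draw training row indices without replacement (simple random sampling,
# not the stratified sampling described for stratified.sampling).
train.rows <- sample(seq_len(nrow(toy)), size = n.train)

train.set <- toy[train.rows, ]
test.set <- toy[-train.rows, ]

nrow(train.set)
nrow(test.set)
```

Every sample ends up in exactly one of the two sets, so the test set contains only samples the model has not seen during training.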
Details
Calls the pretreat_spectra, format_cv, and train_spectra functions.
Value
list of 5 objects:
- 'model.list' is a list of trained model objects, one for each pretreatment method specified by the pretreatment argument. Each model is trained with all rows of df.
- 'summary.model.performance' is a data.frame containing summary statistics across all model training iterations and pretreatments. See below for a description of the summary statistics provided.
- 'model.performance' is a data.frame containing performance statistics for each iteration of model training separately (see below).
- 'predictions' is a data.frame containing both reference and predicted values for each test set entry in each iteration of model training.
- 'importance' is a data.frame containing variable importance results for each wavelength at each iteration of model training. If model.method is not "pls" or "rf", this list item is NULL.
The 'summary.model.performance' and 'model.performance' data.frames include the following summary statistics:
- Tuned parameters depending on the model algorithm:
  - Best.n.comp, the best number of components
  - Best.ntree, the best number of trees in an RF model
  - Best.mtry, the best number of variables to include at every decision point in an RF model
- RMSECV, the root mean squared error of cross-validation
- R2cv, the coefficient of multiple determination of cross-validation for PLSR models
- RMSEP, the root mean squared error of prediction
- R2p, the squared Pearson’s correlation between predicted and observed test set values
- RPD, the ratio of standard deviation of observed test set values to RMSEP
- RPIQ, the ratio of performance to interquartile difference
- CCC, the concordance correlation coefficient
- Bias, the average difference between the predicted and observed values
- SEP, the standard error of prediction
- R2sp, the squared Spearman’s rank correlation between predicted and observed test set values
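The test set statistics listed above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's own code: the observed and predicted vectors are made-up values, RPIQ is taken as the interquartile range of the observed values divided by RMSEP, SEP is taken as the bias-corrected standard deviation of the residuals, and CCC follows Lin's formula with 1/n moments. The function may use different conventions internally.

```r
# Hypothetical observed (reference) and predicted test set values.
observed  <- c(30, 32, 35, 38, 40, 42)
predicted <- c(31, 31, 36, 37, 42, 41)

residual <- predicted - observed
n <- length(observed)

# RMSEP: root mean squared error of prediction.
rmsep <- sqrt(mean(residual^2))

# Bias: average difference between predicted and observed values.
bias <- mean(residual)

# R2p and R2sp: squared Pearson and Spearman correlations.
r2p  <- cor(predicted, observed, method = "pearson")^2
r2sp <- cor(predicted, observed, method = "spearman")^2

# RPD: standard deviation of observed values divided by RMSEP.
rpd <- sd(observed) / rmsep

# RPIQ (assumed definition): interquartile range of observed values / RMSEP.
rpiq <- IQR(observed) / rmsep

# SEP (assumed definition): bias-corrected standard deviation of residuals.
sep <- sqrt(sum((residual - bias)^2) / (n - 1))

# CCC (assumed convention): Lin's concordance correlation with 1/n moments.
mean.obs  <- mean(observed)
mean.pred <- mean(predicted)
var.obs   <- mean((observed - mean.obs)^2)
var.pred  <- mean((predicted - mean.pred)^2)
cov.op    <- mean((observed - mean.obs) * (predicted - mean.pred))
ccc <- 2 * cov.op / (var.obs + var.pred + (mean.obs - mean.pred)^2)

round(c(RMSEP = rmsep, R2p = r2p, RPD = rpd, RPIQ = rpiq,
        CCC = ccc, Bias = bias, SEP = sep, R2sp = r2sp), 3)
```

Unlike R2p, CCC penalizes systematic offsets between predictions and reference values, so it cannot exceed the Pearson correlation in absolute value.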
Author(s)
Jenna Hershberger jmh579@cornell.edu
Examples
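library(magrittr)  # provides the %>% pipe
library(waves)     # provides test_spectra() and the ikeogu.2017 dataset

ikeogu.2017 %>%
  # Rename columns to the names used in this example
  dplyr::rename(reference = DMC.oven,
                unique.id = sample.id) %>%
  # Keep the ID, the reference values, and the spectral columns (prefixed "X")
  dplyr::select(unique.id, reference, dplyr::starts_with("X")) %>%
  na.omit() %>%
  test_spectra(
    train.data = .,
    tune.length = 3,
    num.iterations = 3,
    pretreatment = 1
  )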