| test_spectra {waves} | R Documentation | 
Test the performance of spectral models
Description
Wrapper that trains models based on spectral data to predict reference values and reports model performance statistics.
Usage
test_spectra(
  train.data,
  num.iterations,
  test.data = NULL,
  pretreatment = 1,
  k.folds = 5,
  proportion.train = 0.7,
  tune.length = 50,
  model.method = "pls",
  best.model.metric = "RMSE",
  stratified.sampling = TRUE,
  cv.scheme = NULL,
  trial1 = NULL,
  trial2 = NULL,
  trial3 = NULL,
  split.test = FALSE,
  seed = 1,
  verbose = TRUE,
  wavelengths = deprecated(),
  preprocessing = deprecated(),
  output.summary = deprecated(),
  rf.variable.importance = deprecated()
)
Arguments
train.data | 
 
  | 
num.iterations | 
 Number of training iterations to perform  | 
test.data | 
 
  | 
pretreatment | 
 Number or list of numbers 1:13 corresponding to desired pretreatment method(s): 
  | 
k.folds | 
 Number indicating the number of folds for k-fold cross-validation during model training. Default is 5.  | 
proportion.train | 
 Fraction of samples to include in the training set. Default is 0.7.  | 
tune.length | 
 Number delineating search space for tuning of the PLSR
hyperparameter   | 
model.method | 
 Model type to use for training. Valid options include: 
  | 
best.model.metric | 
 Metric used to decide which model is best. Must be either "RMSE" or "Rsquared"  | 
stratified.sampling | 
 If   | 
cv.scheme | 
 A cross validation (CV) scheme from Jarquín et al., 2017.
Options for  
  | 
trial1 | 
 
  | 
trial2 | 
 
  | 
trial3 | 
 
  | 
split.test | 
 Boolean that allows for a fixed training set and a split
test set. Example: train a model on data from two breeding programs and a
stratified subset (70%) of a third, then test on the remaining samples
(30%) of the third. If   | 
seed | 
 Integer to be used internally as input for   | 
verbose | 
 If   | 
wavelengths | 
 DEPRECATED   | 
preprocessing | 
 DEPRECATED please use
  | 
output.summary | 
 DEPRECATED   | 
rf.variable.importance | 
 DEPRECATED
  | 
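As an illustration of how proportion.train, k.folds, and seed relate to one another, the following base R sketch splits simulated reference values into training and test sets and then assigns the training rows to cross-validation folds. It is only a sketch under stated assumptions: the quartile-based stratification and all object names (reference, bins, train.idx, test.idx, fold.id) are hypothetical and are not taken from the waves source, which may partition samples differently.

```r
# Illustrative only: simulated data and a hand-rolled split, not waves internals.
set.seed(1)                               # plays the role of the seed argument
n <- 100
reference <- rnorm(n, mean = 30, sd = 4)  # simulated reference values

# Assumed stratification: bin samples by quartile of the reference value,
# then draw proportion.train of the rows within each bin.
proportion.train <- 0.7
bins <- cut(reference,
            breaks = quantile(reference, probs = seq(0, 1, by = 0.25)),
            include.lowest = TRUE)
train.idx <- unlist(
  lapply(split(seq_len(n), bins), function(idx) {
    idx[sample.int(length(idx), size = round(proportion.train * length(idx)))]
  }),
  use.names = FALSE
)
test.idx <- setdiff(seq_len(n), train.idx)

# Assign each training row to one of k.folds folds of near-equal size.
k.folds <- 5
fold.id <- sample(rep(seq_len(k.folds), length.out = length(train.idx)))

table(fold.id)
```

Sampling within bins keeps the distribution of reference values similar in the training and test sets, which is the usual motivation for a stratified split.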
Details
Calls pretreat_spectra, format_cv,
and train_spectra functions.
Value
List of 5 objects:
- 'model.list' is a list of trained model objects, one for each pretreatment method specified by the pretreatment argument. Each model is trained with all rows of df.
- 'summary.model.performance' is a data.frame containing summary statistics across all model training iterations and pretreatments. See below for a description of the summary statistics provided.
- 'model.performance' is a data.frame containing performance statistics for each iteration of model training separately (see below).
- 'predictions' is a data.frame containing both reference and predicted values for each test set entry in each iteration of model training.
- 'importance' is a data.frame containing variable importance results for each wavelength at each iteration of model training. If model.method is not "pls" or "rf", this list item is NULL.
Summary statistics in the 'summary.model.performance' and 'model.performance' data.frames include:
- Tuned parameters depending on the model algorithm:
  - Best.n.comp, the best number of components
  - Best.ntree, the best number of trees in an RF model
  - Best.mtry, the best number of variables to include at every decision point in an RF model
- RMSECV, the root mean squared error of cross-validation
- R2cv, the coefficient of multiple determination of cross-validation for PLSR models
- RMSEP, the root mean squared error of prediction
- R2p, the squared Pearson’s correlation between predicted and observed test set values
- RPD, the ratio of standard deviation of observed test set values to RMSEP
- RPIQ, the ratio of performance to interquartile difference
- CCC, the concordance correlation coefficient
- Bias, the average difference between the predicted and observed values
- SEP, the standard error of prediction
- R2sp, the squared Spearman’s rank correlation between predicted and observed test set values
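The test-set statistics listed above can be computed from paired observed and predicted values in base R. The sketch below is a minimal illustration, not the implementation used by waves: prediction_stats is a hypothetical helper, the data are made up, and the exact conventions (the n - 1 denominator in SEP, population moments in Lin's CCC, RPIQ taken as the interquartile range divided by RMSEP) are assumptions that may differ from the package.

```r
# Hypothetical helper, not part of waves: test-set statistics from paired
# observed and predicted values, using base R only.
prediction_stats <- function(observed, predicted) {
  stopifnot(length(observed) == length(predicted), length(observed) > 2)
  residual <- predicted - observed
  rmsep <- sqrt(mean(residual^2))
  bias <- mean(residual)
  # Lin's concordance correlation coefficient with population (1/n) moments.
  s.xy <- mean((observed - mean(observed)) * (predicted - mean(predicted)))
  s.x2 <- mean((observed - mean(observed))^2)
  s.y2 <- mean((predicted - mean(predicted))^2)
  ccc <- 2 * s.xy / (s.x2 + s.y2 + (mean(observed) - mean(predicted))^2)
  c(
    RMSEP = rmsep,
    R2p   = cor(observed, predicted, method = "pearson")^2,
    RPD   = sd(observed) / rmsep,
    RPIQ  = IQR(observed) / rmsep,
    CCC   = ccc,
    Bias  = bias,
    SEP   = sd(residual),  # bias-corrected error, n - 1 denominator (assumed)
    R2sp  = cor(observed, predicted, method = "spearman")^2
  )
}

# Made-up values for illustration only.
observed  <- c(30.1, 32.4, 28.7, 35.0, 31.2, 29.8)
predicted <- c(30.5, 31.9, 29.4, 34.1, 31.0, 30.6)
stats <- prediction_stats(observed, predicted)
print(round(stats, 3))
```

Under these conventions, RMSEP, Bias, and SEP are linked by RMSEP^2 = Bias^2 + SEP^2 * (n - 1) / n, which is a quick sanity check when comparing values across tools.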
 
Author(s)
Jenna Hershberger jmh579@cornell.edu
Examples
library(magrittr)
ikeogu.2017 %>%
  dplyr::rename(reference = DMC.oven,
                unique.id = sample.id) %>%
  dplyr::select(unique.id, reference, dplyr::starts_with("X")) %>%
  na.omit() %>%
  test_spectra(
    train.data = .,
    tune.length = 3,
    num.iterations = 3,
    pretreatment = 1
  )