asmbPLSDA.cv {asmbPLS}    R Documentation

Cross-validation for asmbPLS-DA to find the best combinations of quantiles for classification

Description

Finds, via cross-validation, the best combination of quantiles to use for classification. It should usually be run before asmbPLSDA.fit to obtain the quantile combinations.

Usage

asmbPLSDA.cv(
  X.matrix,
  Y.matrix,
  PLS.comp,
  X.dim,
  quantile.comb.table,
  outcome.type,
  method = NULL,
  measure = "B_accuracy",
  k = 5,
  ncv = 5,
  expected.measure.increase = 0.005,
  center = TRUE,
  scale = TRUE,
  maxiter = 100
)

Arguments

X.matrix

Predictors matrix. Samples in rows, variables in columns.

Y.matrix

Outcome matrix. Samples in rows. The matrix has one column for a binary outcome, or multiple columns of dummy variables for an outcome with more than 2 levels.

PLS.comp

Number of PLS components in asmbPLS-DA.

X.dim

A vector containing the number of predictors in each block (ordered).

quantile.comb.table

A matrix containing the user-defined quantile combinations used for CV. The number of columns equals the number of blocks.

outcome.type

The type of the outcome Y. Use "binary" for a binary outcome and "multiclass" for a categorical outcome with more than 2 levels.

method

Decision rule used for CV. For binary outcome, the methods include "fixed_cutoff", "Euclidean_distance_X" and "Mahalanobis_distance_X". For categorical outcome with more than 2 levels, the methods include "Max_Y", "Euclidean_distance_X", "Mahalanobis_distance_X", "Euclidean_distance_Y", and "PCA_Mahalanobis_distance_Y".

measure

Five measures are available: overall accuracy "accuracy", balanced accuracy "B_accuracy", precision "precision", recall "recall", and F1 score "F1". The default is "B_accuracy".

k

The number of folds in the CV procedure. The default is 5.

ncv

The number of repetitions of CV. The default is 5.

expected.measure.increase

The increase in the measure that you expect from including one more PLS component. This value affects the selection of the optimal number of PLS components. The default is 0.005.

center

A logical value indicating whether weighted mean centering should be applied to X.matrix and Y.matrix. The default is TRUE.

scale

A logical value indicating whether scaling should be applied to X.matrix. The default is TRUE.

maxiter

An integer indicating the maximum number of iterations. The default is 100.
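
The input arguments above can be assembled with base R alone. The following is a minimal sketch and is not taken from the package: the block names block1 and block2, the simulated data, and the candidate quantiles are hypothetical. It assumes 0/1 coding for the binary outcome and one indicator column per level for the multiclass outcome. The fold assignment at the end only illustrates what k and ncv mean; the package may split samples differently (for example, with stratification).

```r
set.seed(1)
n <- 12  # number of samples (hypothetical)

# Two hypothetical predictor blocks measured on the same samples
block1 <- matrix(rnorm(n * 4), nrow = n)  # 4 predictors
block2 <- matrix(rnorm(n * 3), nrow = n)  # 3 predictors

# X.matrix: blocks bound column-wise; X.dim: block sizes in the same order
X.matrix <- cbind(block1, block2)
X.dim <- c(ncol(block1), ncol(block2))

# Binary outcome: a one-column matrix (assumed 0/1 coding)
Y.matrix.binary <- matrix(rep(c(0, 1), length.out = n), ncol = 1)

# Outcome with more than 2 levels: one dummy column per level (assumed layout)
group <- factor(rep(c("a", "b", "c"), length.out = n))
Y.matrix.multiclass <- model.matrix(~ group - 1)

# quantile.comb.table: every combination of the candidate quantiles,
# one column per block
candidates <- c(0.5, 0.7, 0.9)
quantile.comb.table <- as.matrix(
  expand.grid(rep(list(candidates), length(X.dim)))
)

# k and ncv: ncv repetitions of a random, unstratified k-fold split.
# Each column of `folds` holds the fold label of every sample for one repetition.
k <- 3
ncv <- 2
folds <- replicate(ncv, sample(rep(seq_len(k), length.out = n)))
```

With three candidate quantiles and two blocks, the table has 9 rows, so the cost of CV grows quickly with the number of blocks and candidates.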

Value

asmbPLSDA.cv returns a list containing the following components:

quantile_table_CV

A matrix containing the selected quantile combination and the corresponding measures of CV for each PLS component.

optimal_nPLS

Optimal number of PLS components.
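
To make the two returned components more concrete, the sketch below first computes the five measures from binary predictions using their standard definitions, and then applies one plausible stopping rule for the number of PLS components. Both parts are assumptions for illustration: this page does not show the package's internal computation, and pick_n_comp, the label vectors, and cv_measure are hypothetical.

```r
# Hypothetical true and predicted binary labels
truth <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
pred  <- c(1, 1, 1, 0, 0, 0, 0, 0, 1, 1)

# Cells of the confusion table
TP <- sum(pred == 1 & truth == 1)
TN <- sum(pred == 0 & truth == 0)
FP <- sum(pred == 1 & truth == 0)
FN <- sum(pred == 0 & truth == 1)

# Standard definitions of the five measures
accuracy    <- (TP + TN) / length(truth)
recall      <- TP / (TP + FN)               # sensitivity
specificity <- TN / (TN + FP)
B_accuracy  <- (recall + specificity) / 2   # balanced accuracy
precision   <- TP / (TP + FP)
F1          <- 2 * precision * recall / (precision + recall)

# Assumed rule: keep adding components while the CV measure improves by
# more than `increase` over the previous component
pick_n_comp <- function(measure, increase = 0.005) {
  n_comp <- 1
  while (n_comp < length(measure) &&
         measure[n_comp + 1] > measure[n_comp] + increase) {
    n_comp <- n_comp + 1
  }
  n_comp
}

# Hypothetical CV measure for 1 to 4 PLS components
cv_measure <- c(0.80, 0.85, 0.852, 0.90)
n_comp <- pick_n_comp(cv_measure)
```

Under this assumed rule the search stops at 2 components, because the third component adds only 0.002, which is below the threshold of 0.005. A larger expected.measure.increase therefore favours fewer components.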

Examples

## Use the example dataset
data(asmbPLSDA.example)
X.matrix <- asmbPLSDA.example$X.matrix
Y.matrix.binary <- asmbPLSDA.example$Y.matrix.binary
Y.matrix.multiclass <- asmbPLSDA.example$Y.matrix.morethan2levels
X.dim <- asmbPLSDA.example$X.dim
PLS.comp <- asmbPLSDA.example$PLS.comp
quantile.comb.table.cv <- asmbPLSDA.example$quantile.comb.table.cv

## cv to find the best quantile combinations for model fitting (binary outcome)
cv.results.binary <- asmbPLSDA.cv(X.matrix = X.matrix, 
                                  Y.matrix = Y.matrix.binary, 
                                  PLS.comp = PLS.comp, 
                                  X.dim = X.dim, 
                                  quantile.comb.table = quantile.comb.table.cv, 
                                  outcome.type = "binary",
                                  k = 3,
                                  ncv = 3)
quantile.comb.binary <- cv.results.binary$quantile_table_CV[,1:length(X.dim)]
n.PLS.binary <- cv.results.binary$optimal_nPLS

## asmbPLSDA fit using the selected quantile combination (binary outcome)
asmbPLSDA.fit.binary <- asmbPLSDA.fit(X.matrix = X.matrix, 
                                      Y.matrix = Y.matrix.binary, 
                                      PLS.comp = n.PLS.binary, 
                                      X.dim = X.dim, 
                                      quantile.comb = quantile.comb.binary,
                                      outcome.type = "binary")


## cv to find the best quantile combinations for model fitting 
## (categorical outcome with more than 2 levels)
cv.results.multiclass <- asmbPLSDA.cv(X.matrix = X.matrix, 
                                      Y.matrix = Y.matrix.multiclass, 
                                      PLS.comp = PLS.comp, 
                                      X.dim = X.dim, 
                                      quantile.comb.table = quantile.comb.table.cv, 
                                      outcome.type = "multiclass",
                                      k = 3,
                                      ncv = 2)
quantile.comb.multiclass <- cv.results.multiclass$quantile_table_CV[,1:length(X.dim)]
n.PLS.multiclass <- cv.results.multiclass$optimal_nPLS

## asmbPLSDA fit (categorical outcome with more than 2 levels)
asmbPLSDA.fit.multiclass <- asmbPLSDA.fit(X.matrix = X.matrix, 
                                          Y.matrix = Y.matrix.multiclass, 
                                          PLS.comp = n.PLS.multiclass, 
                                          X.dim = X.dim, 
                                          quantile.comb = quantile.comb.multiclass,
                                          outcome.type = "multiclass")


[Package asmbPLS version 1.0.0 Index]