cv_gspcr {gspcr}    R Documentation

Cross-validation of Generalized Supervised Principal Component Regression

Description

Use K-fold cross-validation to decide on the number of principal components and the threshold value for GSPCR.
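The search can be pictured as a grid over (threshold, number of PCs) pairs, with each pair scored by K-fold cross-validation. The base R sketch below is a simplified illustration, not the gspcr implementation. It assumes a continuous outcome, all-numeric predictors, absolute bivariate correlation as the screening score, and out-of-fold mean squared error as the fit measure. The helper `cv_grid_sketch()` is hypothetical.

```r
# Hypothetical sketch (not the gspcr implementation) of a K-fold grid search
# over screening thresholds and numbers of PCs for a continuous outcome.
cv_grid_sketch <- function(dv, ivs, thresholds, npcs_range, K = 5, seed = 1) {
  set.seed(seed)                            # reproducible fold assignment
  X <- as.matrix(ivs)                       # assumes all-numeric predictors
  folds <- sample(rep(seq_len(K), length.out = nrow(X)))  # random fold labels
  # Out-of-fold MSE for every threshold x number-of-PCs x fold combination
  mse <- array(NA_real_, dim = c(length(thresholds), length(npcs_range), K))
  for (k in seq_len(K)) {
    train <- folds != k
    Xtr <- scale(X[train, , drop = FALSE])  # standardize with training moments
    Xte <- scale(X[!train, , drop = FALSE],
                 center = attr(Xtr, "scaled:center"),
                 scale = attr(Xtr, "scaled:scale"))
    # Assumed screening score: absolute correlation with the outcome
    score <- abs(drop(cor(Xtr, dv[train])))
    for (t in seq_along(thresholds)) {
      active <- score >= thresholds[t]
      if (!any(active)) next                # empty active set: leave NA
      pca <- prcomp(Xtr[, active, drop = FALSE], center = FALSE)
      for (q in seq_along(npcs_range)) {
        npcs <- npcs_range[q]
        if (npcs > ncol(pca$rotation)) next # too few active predictors
        W <- pca$rotation[, seq_len(npcs), drop = FALSE]
        Ztr <- Xtr[, active, drop = FALSE] %*% W   # training PC scores
        Zte <- Xte[, active, drop = FALSE] %*% W   # held-out PC scores
        beta <- lm.fit(cbind(1, Ztr), dv[train])$coefficients
        pred <- drop(cbind(1, Zte) %*% beta)
        mse[t, q, k] <- mean((dv[!train] - pred)^2)
      }
    }
  }
  apply(mse, c(1, 2), mean)  # average across folds (NA if a cell was skipped)
}

res <- cv_grid_sketch(mtcars$mpg, mtcars[, -1],
                      thresholds = c(0, 0.5), npcs_range = 1:2, K = 3)
which(res == min(res, na.rm = TRUE), arr.ind = TRUE)  # best (threshold, npcs) cell
```

Rows of the returned matrix index thresholds and columns index numbers of PCs; the cell with the lowest average error is the plain (non-1SE) choice under these assumptions.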

Usage

cv_gspcr(
  dv,
  ivs,
  fam = c("gaussian", "binomial", "poisson", "baseline", "cumulative")[1],
  thrs = c("LLS", "PR2", "normalized")[1],
  nthrs = 10L,
  npcs_range = 1L:3L,
  K = 5,
  fit_measure = c("F", "LRT", "AIC", "BIC", "PR2", "MSE")[1],
  max_features = ncol(ivs),
  min_features = 1,
  oneSE = TRUE,
  save_call = TRUE
)

Arguments

dv

numeric vector or factor of dependent variable values

ivs

n × p data.frame of independent variables (factors allowed)

fam

character vector of length 1 storing the description of the error distribution and link function to be used in the model

thrs

character vector of length 1 storing the type of threshold to be used (see below for available options)

nthrs

numeric vector of length 1 storing the number of threshold values to be used

npcs_range

numeric vector defining the numbers of principal components to be used

K

numeric vector of length 1 storing the number of folds for the K-fold cross-validation procedure

fit_measure

character vector of length 1 indicating the type of fit measure to be used in the cross-validation procedure

max_features

numeric vector of length 1 indicating the maximum number of features that can be selected

min_features

numeric vector of length 1 indicating the minimum number of features that should be selected

oneSE

logical value indicating whether the results obtained with the one-standard-error (1SE) rule should be saved

save_call

logical value indicating whether the call should be saved and returned in the results

Details

The variables in ivs do not need to be standardized beforehand, as the function handles scaling appropriately based on the measurement level of each variable.
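Measurement-level-aware preprocessing can be illustrated with a short base R sketch. It assumes that numeric columns are standardized and that factors are expanded into treatment-coded dummy variables; the helper `prep_ivs()` is hypothetical and is not how cv_gspcr is guaranteed to treat its input.

```r
# Hypothetical preprocessing sketch: standardize numeric columns and
# dummy-code factors so predictors on different scales become comparable.
prep_ivs <- function(ivs) {
  is_num <- vapply(ivs, is.numeric, logical(1))
  ivs[is_num] <- lapply(ivs[is_num], function(x) (x - mean(x)) / sd(x))
  # model.matrix() expands factors into indicator columns; drop the intercept
  model.matrix(~ ., data = ivs)[, -1, drop = FALSE]
}

d <- data.frame(x = c(1, 2, 3, 4), g = factor(c("a", "b", "a", "c")))
prep_ivs(d)  # columns: x (standardized), gb, gc (indicators)
```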

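The fam argument is used to define which model will be used when regressing the dependent variable on the principal components:

  "gaussian": linear regression
  "binomial": logistic regression
  "poisson": Poisson regression
  "baseline": baseline-category (multinomial) logistic regression
  "cumulative": cumulative-link (proportional odds) logistic regression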
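For the three families available in base R, the final regression step can be sketched with stats::glm(), which accepts the family as a character string. This is an illustration under assumptions (numeric predictors, PCs computed on standardized data), and the helper `fit_on_pcs()` is hypothetical. The "baseline" and "cumulative" models are not glm() families; in R they are typically fit with tools such as nnet::multinom() and MASS::polr(), which this sketch does not cover.

```r
# Hypothetical helper: regress an outcome on the first npcs principal
# components of a numeric predictor set using a base R glm family.
fit_on_pcs <- function(dv, ivs, npcs, fam = c("gaussian", "binomial", "poisson")) {
  fam <- match.arg(fam)
  pcs <- prcomp(ivs, center = TRUE, scale. = TRUE)$x[, seq_len(npcs), drop = FALSE]
  dat <- data.frame(y = dv, pcs)            # columns: y, PC1, ..., PCnpcs
  glm(y ~ ., data = dat, family = fam)      # family given by name, e.g. "poisson"
}

fit <- fit_on_pcs(mtcars$carb, mtcars[, c("disp", "hp", "wt")], npcs = 2, fam = "poisson")
coef(fit)  # intercept plus one coefficient per retained PC
```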

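The thrs argument defines the bivariate association measure used to determine the active set of predictors for an SPCR analysis. The following association measures are supported (allowed measurement levels are reported in parentheses):

  "LLS": log-likelihood of the simple GLM regressing dv on each predictor (any dv with any iv)
  "PR2": likelihood-based pseudo R-squared of the same simple GLMs (any dv with any iv)
  "normalized": normalized correlation based on Bair et al. (2006) (continuous dv with continuous ivs)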
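The normalized score of Bair et al. (2006) and the resulting active sets can be sketched in base R. For centered x_j and y the score s_j = x_j'y / ||x_j|| equals ||y|| times the absolute correlation, so it ranks predictors like |cor|. The equally spaced grid of nthrs values between the smallest and largest score, and the filtering by min_features and max_features, are assumptions made for illustration; `normalized_scores()` and `active_sets()` are hypothetical helpers.

```r
# Hypothetical sketch of threshold-based screening with the Bair et al. (2006)
# univariate score s_j = |x_j' y| / ||x_j|| (x_j and y centered).
normalized_scores <- function(dv, ivs) {
  X <- scale(as.matrix(ivs), center = TRUE, scale = FALSE)
  y <- dv - mean(dv)
  abs(drop(crossprod(X, y)) / sqrt(colSums(X^2)))
}

active_sets <- function(scores, nthrs = 10, min_features = 1,
                        max_features = length(scores)) {
  # Assumed grid: nthrs equally spaced values between the min and max score
  thr <- seq(min(scores), max(scores), length.out = nthrs)
  sets <- lapply(thr, function(t) names(scores)[scores >= t])
  size <- lengths(sets)
  # Keep only thresholds whose active set respects the feature bounds
  keep <- size >= min_features & size <= max_features
  list(thr = thr[keep], sets = sets[keep])
}

s <- normalized_scores(mtcars$mpg, mtcars[, -1])
a <- active_sets(s, nthrs = 5)
lengths(a$sets)  # active-set size shrinks as the threshold grows
```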

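The fit_measure argument defines which fit measure should be used within the cross-validation procedure. The supported measures are:

  "F": F-statistic (continuous dv)
  "LRT": likelihood-ratio test statistic (any dv)
  "AIC": Akaike information criterion (any dv)
  "BIC": Bayesian information criterion (any dv)
  "PR2": likelihood-based pseudo R-squared (any dv)
  "MSE": mean squared error (continuous dv)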
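To make the measures concrete, the sketch below computes each of them for a linear model in base R: the likelihood-based measures on a training fold and the MSE on a held-out fold. Which fold cv_gspcr evaluates each measure on is not stated here, and the Cox and Snell definition of the pseudo R-squared is an assumed example; `fit_measures_sketch()` is hypothetical.

```r
# Hypothetical sketch: candidate fit measures for a linear model fitted on a
# training fold, plus its mean squared error on the held-out fold.
fit_measures_sketch <- function(train, test, formula) {
  fit  <- lm(formula, data = train)
  null <- lm(update(formula, . ~ 1), data = train)  # intercept-only model
  n <- nrow(train)
  ll1 <- as.numeric(logLik(fit))
  ll0 <- as.numeric(logLik(null))
  y_test <- test[[all.vars(formula)[1]]]
  c(
    F   = unname(summary(fit)$fstatistic["value"]),
    LRT = 2 * (ll1 - ll0),                  # likelihood-ratio statistic vs. null
    AIC = AIC(fit),
    BIC = BIC(fit),
    PR2 = 1 - exp(2 * (ll0 - ll1) / n),     # Cox and Snell pseudo R-squared
    MSE = mean((y_test - predict(fit, newdata = test))^2)
  )
}

fit_measures_sketch(mtcars[1:20, ], mtcars[21:32, ], mpg ~ wt + hp)
```

For a Gaussian linear model the Cox and Snell pseudo R-squared coincides with the ordinary R-squared, which makes it a convenient sanity check.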

Details on the one-standard-error (1SE) rule implemented here can be found in the documentation for the function cv_choose().
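The textbook form of the 1SE rule picks the simplest model whose mean cross-validated score is within one standard error of the best mean. The base R sketch below assumes a lower-is-better score such as MSE (flip the sign for F, LRT, or PR2) and candidate models ordered from simplest to most complex; it is not a reproduction of cv_choose(), and `one_se_choice()` is hypothetical.

```r
# Hypothetical sketch of the textbook one-standard-error rule for a
# lower-is-better score. Rows of `scores` are candidate models ordered from
# simplest to most complex; columns are CV folds.
one_se_choice <- function(scores) {
  m  <- rowMeans(scores)                          # mean score per model
  se <- apply(scores, 1, sd) / sqrt(ncol(scores)) # standard error per model
  best <- unname(which.min(m))
  # Simplest model whose mean score is no worse than best mean + 1 SE
  c(best = best, one_se = min(which(m <= m[best] + se[best])))
}

scores <- rbind(c(5.0, 5.2, 4.8), c(4.6, 4.9, 4.7), c(4.0, 5.0, 4.5))
one_se_choice(scores)  # best = 3, one_se = 2
```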

Value

Object of class gspcr, which is a list containing:

Author(s)

Edoardo Costantini, 2023

References

Bair, E., Hastie, T., Paul, D., & Tibshirani, R. (2006). Prediction by supervised principal components. Journal of the American Statistical Association, 101(473), 119-137.

Examples

library(gspcr)

# Example input values
dv <- GSPCRexdata$y$cont
ivs <- GSPCRexdata$X$cont
fam <- "gaussian"
thrs <- "normalized"
nthrs <- 5
npcs_range <- 1:3
K <- 3
fit_measure <- "F"
max_features <- ncol(ivs)
min_features <- 1
oneSE <- TRUE
save_call <- TRUE

# Example usage
out_cont <- cv_gspcr(
  dv = dv,
  ivs = ivs,
  fam = fam,
  thrs = thrs,
  nthrs = nthrs,
  npcs_range = npcs_range,
  K = K,
  fit_measure = fit_measure,
  max_features = max_features,
  min_features = min_features,
  oneSE = oneSE,
  save_call = save_call
)


[Package gspcr version 0.9.5 Index]