boss {BOSSreg}R Documentation

Best Orthogonalized Subset Selection (BOSS).

Description

Usage

boss(
  x,
  y,
  maxstep = min(nrow(x) - intercept - 1, ncol(x)),
  intercept = TRUE,
  hdf.ic.boss = TRUE,
  mu = NULL,
  sigma = NULL,
  ...
)

Arguments

x

A matrix of predictors, with nrow(x)=length(y)=n observations and ncol(x)=p predictors. Intercept shall NOT be included.

y

A vector of response variable, with length(y)=n.

maxstep

Maximum number of steps performed. Default is min(n-1,p) if intercept=FALSE, and it is min(n-2, p) otherwise.

intercept

Logical, whether to include an intercept term. Default is TRUE.

hdf.ic.boss

Logical, whether to calculate the heuristic degrees of freedom (hdf) and information criteria (IC) for BOSS. IC includes AIC, BIC, AICc, BICc, GCV, Cp. Default is TRUE.

mu

True mean vector, used in the calculation of hdf. Default is NULL, and is estimated via least-squares (LS) regression of y upon x for n>p, and 10-fold CV cross-validated lasso estimate for n<=p.

sigma

True standard deviation of the error, used in the calculation of hdf. Default is NULL, and is estimated via least-squares (LS) regression of y upon x for n>p, and 10-fold cross-validated lasso for n<=p.

...

Extra parameters to allow flexibility. Currently none allows or requires, just for the convinience of call from other parent functions like cv.boss.

Details

This function computes the full solution path given by BOSS and FS on a given dataset (x,y) with n observations and p predictors. It also calculates the heuristic degrees of freedom for BOSS, and various information criteria, which can further be used to select the subset from the candidates. Please refer to the Vignette for implementation details and Tian et al. (2021) for methodology details (links are given below).

Value

Author(s)

Sen Tian

References

See Also

predict and coef methods for "boss" object, and the cv.boss function

Examples

## Generate a trivial dataset, X has mean 0 and norm 1, y has mean 0
set.seed(11)
n = 20
p = 5
x = matrix(rnorm(n*p), nrow=n, ncol=p)
x = scale(x, center = colMeans(x))
x = scale(x, scale = sqrt(colSums(x^2)))
beta = c(1, 1, 0, 0, 0)
y = x%*%beta + scale(rnorm(n, sd=0.01), center = TRUE, scale = FALSE)

## Fit the model
boss_result = boss(x, y)

## Get the coefficient vector selected by AICc-hdf (S3 method for boss)
beta_boss_aicc = coef(boss_result)
# the above is equivalent to the following
beta_boss_aicc = boss_result$beta_boss[, which.min(boss_result$IC_boss$aicc), drop=FALSE]
## Get the fitted values of BOSS-AICc-hdf (S3 method for boss)
mu_boss_aicc = predict(boss_result, newx=x)
# the above is equivalent to the following
mu_boss_aicc = cbind(1,x) %*% beta_boss_aicc

## Repeat the above process, but using Cp-hdf instead of AICc-hdf
## coefficient vector
beta_boss_cp = coef(boss_result, method.boss='cp')
beta_boss_cp = boss_result$beta_boss[, which.min(boss_result$IC_boss$cp), drop=FALSE]
## fitted values of BOSS-Cp-hdf
mu_boss_cp = predict(boss_result, newx=x, method.boss='cp')
mu_boss_cp = cbind(1,x) %*% beta_boss_cp

[Package BOSSreg version 0.2.0 Index]