cv.hfr {hfr}

R Documentation

Cross validation for a hierarchical feature regression

Description

HFR is a regularized regression estimator that decomposes a least-squares regression along a supervised hierarchical graph and shrinks the edges of the estimated graph to regularize the parameters. The algorithm leads to group shrinkage in the regression parameters and a reduction in the effective model degrees of freedom.

Usage

cv.hfr(
  x,
  y,
  weights = NULL,
  kappa = seq(0, 1, by = 0.1),
  q = NULL,
  intercept = TRUE,
  standardize = TRUE,
  nfolds = 10,
  foldid = NULL,
  partial_method = c("pairwise", "shrinkage"),
  l2_penalty = 0,
  ...
)

Arguments

`x`	Input matrix or data.frame of dimension N × p; each row is an observation vector.
`y`	Response variable.
`weights`	An optional vector of weights to be used in the fitting process. Should be NULL or a numeric vector. If non-NULL, weighted least squares is used for the level-specific regressions.
`kappa`	A vector of target effective degrees of freedom of the regression.
`q`	Thinning parameter representing the quantile cut-off (in terms of contributed variance) above which to consider levels in the hierarchy. This can be used to reduce the number of levels in high-dimensional problems. Default is no thinning.
`intercept`	Should an intercept be fitted? Default is `intercept=TRUE`.
`standardize`	Logical flag for `x` variable standardization prior to fitting the model. The coefficients are always returned on the original scale. Default is `standardize=TRUE`.
`nfolds`	The number of folds for k-fold cross validation. Default is `nfolds=10`.
`foldid`	An optional vector of values between `1` and `nfolds` identifying what fold each observation is in. If supplied, `nfolds` can be missing.
`partial_method`	Indicates whether to use pairwise partial correlations or shrinkage partial correlations.
`l2_penalty`	Optional L2 penalty for the level-specific regressions (useful in the high-dimensional case).
`...`	Additional arguments passed to `hclust`.
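As a sketch of how several of the arguments above combine in one call, the example below fits a weighted cross-validated HFR using shrinkage partial correlations and five folds. The simulated data and the weight vector `w` (down-weighting the first half of the sample) are illustrative assumptions, not recommendations from the package.

```r
library(hfr)

set.seed(7)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)

# Hypothetical observation weights: first 50 observations get half weight
w <- rep(c(0.5, 1), each = 50)

fit_w <- cv.hfr(
  x, y,
  weights = w,                   # weighted least squares in the level-specific regressions
  partial_method = "shrinkage",  # shrinkage partial correlations instead of pairwise
  nfolds = 5                     # 5-fold instead of the default 10-fold CV
)
coef(fit_w)
```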

Details

This function fits an HFR to a grid of kappa hyperparameter values. The result is a matrix of coefficients with one column for each hyperparameter value. Because all hyperparameter values are evaluated in a single call, the level-specific regressions are estimated only once, which substantially speeds up the cross-validation procedure.

When `nfolds > 1`, k-fold cross-validation is performed on randomly shuffled data. Alternatively, the test folds can be specified directly using the `foldid` argument. The cross-validated choice of kappa is stored as `best_kappa` in the output object.
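A minimal sketch of supplying custom folds: each of 100 simulated observations is assigned at random to one of five equally sized folds, and the selected kappa is then read from `best_kappa`. The simulated data and the choice of five folds are illustrative assumptions, and accessing `best_kappa` with `$` assumes the returned object is list-based.

```r
library(hfr)

set.seed(42)
n <- 100
x <- matrix(rnorm(n * 20), n, 20)
y <- rnorm(n)

# Illustrative assumption: 5 equally sized folds, assigned at random
nfolds <- 5
foldid <- sample(rep(seq_len(nfolds), length.out = n))

# foldid is supplied, so nfolds can be omitted from the call
fit <- cv.hfr(x, y, kappa = seq(0, 1, by = 0.1), foldid = foldid)

# Cross-validated choice of kappa (assumes list-style access with `$`)
fit$best_kappa
```

Fixing `foldid` this way makes the fold assignment reproducible, for example when comparing different settings of `partial_method` on identical splits.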

Value

A 'cv.hfr' regression object.

Author(s)

Johann Pfitzinger

References

Pfitzinger, Johann (2024). Cluster Regularization via a Hierarchical Feature Regression. _Econometrics and Statistics_ (in press). URL https://doi.org/10.1016/j.ecosta.2024.01.003.

Examples

library(hfr)
set.seed(123)
x = matrix(rnorm(100 * 20), 100, 20)
y = rnorm(100)
# Cross-validate kappa over a grid from 0 to 1
fit = cv.hfr(x, y, kappa = seq(0, 1, by = 0.1))
coef(fit)

[Package hfr version 0.7.1 Index]