hfr {hfr} | R Documentation |
Fit a hierarchical feature regression
Description
HFR is a regularized regression estimator that decomposes a least squares regression along a supervised hierarchical graph, and shrinks the edges of the estimated graph to regularize parameters. The algorithm leads to group shrinkage in the regression parameters and a reduction in the effective model degrees of freedom.
Usage
hfr(
x,
y,
weights = NULL,
kappa = 1,
q = NULL,
intercept = TRUE,
standardize = TRUE,
partial_method = c("pairwise", "shrinkage"),
l2_penalty = 0,
...
)
Arguments
x
Input matrix or data.frame, of dimension N x p; each row is an observation vector.
y
Response variable.
weights
An optional vector of weights to be used in the fitting process. Should be NULL or a numeric vector. If non-NULL, weighted least squares is used for the level-specific regressions.
kappa
The target effective degrees of freedom of the regression as a percentage of p.
q
Thinning parameter representing the quantile cut-off (in terms of contributed variance) above which to consider levels in the hierarchy. This can be used to reduce the number of levels in high-dimensional problems. Default is no thinning.
intercept
Should an intercept be fitted. Default is TRUE.
standardize
Logical flag for x variable standardization prior to fitting the model. The coefficients are always returned on the original scale. Default is TRUE.
partial_method
Indicate whether to use pairwise partial correlations, or shrinkage partial correlations.
l2_penalty
Optional penalty on the l2 norm for the level-specific regressions (useful in the high-dimensional case).
...
Additional arguments passed to hclust.
Details
Shrinkage can be imposed by targeting an explicit effective degrees of freedom. Setting the argument kappa to a value between 0 and 1 controls the effective degrees of freedom of the fitted object as a percentage of p. When kappa is 1, the result is equivalent to an ordinary least squares regression (no shrinkage). Conversely, kappa set to 0 represents maximum shrinkage. When p > N, kappa is a percentage of (N - 2). If no kappa is set, a linear regression with kappa = 1 is estimated.
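The role of kappa can be illustrated with a short sketch (the data below are simulated purely for illustration; package assumed to be installed as hfr):

```r
library(hfr)

# Simulated data, illustrative only
set.seed(42)
x <- matrix(rnorm(100 * 10), 100, 10)
y <- x %*% rnorm(10) + rnorm(100)

# kappa = 1: no shrinkage, equivalent to ordinary least squares
fit_ols <- hfr(x, y, kappa = 1)

# kappa = 0.25: effective degrees of freedom targeted at 25% of p
fit_shrunk <- hfr(x, y, kappa = 0.25)
```

Comparing coef(fit_ols) and coef(fit_shrunk) shows the group shrinkage induced by the lower kappa.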
Hierarchical clustering is performed using hclust. The default is set to ward.D2 clustering but can be overridden by passing a method argument to ....
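For instance, a sketch of overriding the default linkage (the data here are illustrative):

```r
library(hfr)

set.seed(1)
x <- matrix(rnorm(50 * 5), 50, 5)
y <- rnorm(50)

# The method argument is forwarded to hclust via ...,
# replacing the ward.D2 default with complete linkage
fit <- hfr(x, y, kappa = 0.5, method = "complete")
```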
For high-dimensional problems, the hierarchy becomes very large. Setting q to a value below 1 reduces the number of levels used in the hierarchy. q represents a quantile cut-off of the amount of variation contributed by the levels. The default (q = NULL) considers all levels.
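A minimal sketch of thinning in a larger problem (simulated data, with an illustrative cut-off of 0.5):

```r
library(hfr)

# High-dimensional setting: p close to N makes the full hierarchy large
set.seed(7)
x <- matrix(rnorm(100 * 80), 100, 80)
y <- rnorm(100)

# Keep only levels above the 50% quantile of contributed variance
fit_thinned <- hfr(x, y, kappa = 0.5, q = 0.5)
```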
When the data exhibit multicollinearity, it can be useful to include a penalty on the l2 norm in the level-specific regressions. This can be achieved by setting the l2_penalty parameter.
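A sketch of this option on a deliberately collinear design (data and penalty value are illustrative):

```r
library(hfr)

# Collinear design: second column nearly duplicates the first
set.seed(3)
x1 <- rnorm(100)
x <- cbind(x1, x1 + rnorm(100, sd = 0.01), rnorm(100))
y <- rnorm(100)

# A small ridge-type penalty stabilizes the level-specific regressions
fit <- hfr(x, y, kappa = 0.5, l2_penalty = 0.1)
```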
Value
An 'hfr' regression object.
Author(s)
Johann Pfitzinger
References
Pfitzinger, Johann (2024). Cluster Regularization via a Hierarchical Feature Regression. _Econometrics and Statistics_ (in press). URL https://doi.org/10.1016/j.ecosta.2024.01.003.
See Also
cv.hfr
, se.avg
, coef
, plot
and predict
methods
Examples
x = matrix(rnorm(100 * 20), 100, 20)
y = rnorm(100)
fit = hfr(x, y, kappa = 0.5)
coef(fit)