rarefit {rare} | R Documentation |
Fit the rare feature selection model
Description
Fit the rare feature selection model proposed in Yan and Bien (2018):
min_{\beta, \gamma} 0.5 * ||y - X\beta - \beta_0 1_n||_2^2 +
\lambda * (\alpha * ||\gamma_{-root}||_1 + (1-\alpha) * ||\beta||_1)
using an alternating direction method of multipliers (ADMM) algorithm
described in Algorithm 1 of the same paper.
The regularization path is computed over a two-dimensional grid of
regularization parameters: lambda and alpha. Of the two, lambda
controls the overall amount of regularization, and alpha controls the
tradeoff between sparsity and fusion of \beta (larger alpha induces
more fusion in \beta).
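To make the roles of lambda and alpha concrete, the objective above can be evaluated directly for given coefficients. This is a minimal sketch, not code from the package: the function name rare_objective and the argument root (the position of the root node's entry in gamma) are hypothetical, and the function does not check any relationship between beta and gamma.

```r
# Evaluate the penalized objective for given coefficients.
# `rare_objective` and `root` are illustrative names, not part of the rare package.
rare_objective <- function(y, X, beta, beta0, gamma, root, lambda, alpha) {
  resid <- y - drop(X %*% beta) - beta0      # beta0 is recycled over all n entries
  loss <- 0.5 * sum(resid^2)                 # 0.5 * ||y - X beta - beta0 1_n||_2^2
  fusion <- sum(abs(gamma[-root]))           # ||gamma_{-root}||_1
  sparsity <- sum(abs(beta))                 # ||beta||_1
  loss + lambda * (alpha * fusion + (1 - alpha) * sparsity)
}

# Toy inputs chosen only for illustration
y <- c(1, 2, 3)
X <- diag(3)
beta <- c(1, 0, -1)
gamma <- c(0.5, -0.5, 2)

# alpha = 1 penalizes only gamma (fusion); alpha = 0 penalizes only beta (sparsity)
obj_fusion <- rare_objective(y, X, beta, 0, gamma, root = 3, lambda = 2, alpha = 1)
obj_sparse <- rare_objective(y, X, beta, 0, gamma, root = 3, lambda = 2, alpha = 0)
obj_mixed  <- rare_objective(y, X, beta, 0, gamma, root = 3, lambda = 2, alpha = 0.5)
```

With alpha = 1 only the gamma term contributes to the penalty, and with alpha = 0 only the beta term does; intermediate values mix the two.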
Usage
rarefit(y, X, A = NULL, Q = NULL, hc, intercept = T, lambda = NULL,
alpha = NULL, nlam = 50, lam.min.ratio = 1e-04, nalpha = 10,
rho = 0.01, eps1 = 1e-06, eps2 = 1e-05, maxite = 1e+06)
Arguments
y |
Length- |
X |
|
A |
|
Q |
|
hc |
An |
intercept |
Whether an intercept should be fitted (default = TRUE) or set to zero (FALSE). |
lambda |
A user-supplied |
alpha |
A user-supplied |
nlam |
Number of |
lam.min.ratio |
Smallest value for |
nalpha |
Number of |
rho |
Penalty parameter for the quadratic penalty in the ADMM algorithm.
The default value is |
eps1 |
Convergence threshold in terms of the absolute tolerance level
for the ADMM algorithm. The default value is |
eps2 |
Convergence threshold in terms of the relative tolerance level
for the ADMM algorithm. The default value is |
maxite |
Maximum number of passes over the data for every pair of
( |
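For intuition about how an absolute tolerance (eps1) and a relative tolerance (eps2) are commonly combined in ADMM, the sketch below shows a widely used stopping test for a constraint of the form x = z with a scaled dual variable u. This is an assumption about the general technique, not a description of the package's internals: the function admm_converged and all of its arguments are hypothetical.

```r
# Generic ADMM stopping test for the constraint x = z (scaled dual variable u).
# Illustrative only; NOT taken from the rare package source.
admm_converged <- function(x, z, z_old, u, rho, eps_abs, eps_rel) {
  l2 <- function(v) sqrt(sum(v^2))
  r_norm <- l2(x - z)                  # primal residual norm
  s_norm <- rho * l2(z - z_old)        # dual residual norm
  n <- length(x)
  # Thresholds combine an absolute part and a part relative to the iterates
  eps_pri  <- sqrt(n) * eps_abs + eps_rel * max(l2(x), l2(z))
  eps_dual <- sqrt(n) * eps_abs + eps_rel * rho * l2(u)
  r_norm <= eps_pri && s_norm <= eps_dual
}

# Iterates that agree and have stopped moving pass the test
done <- admm_converged(x = c(1, 2), z = c(1, 2), z_old = c(1, 2), u = c(0, 0),
                       rho = 0.01, eps_abs = 1e-6, eps_rel = 1e-5)
# Iterates that still disagree do not
not_done <- admm_converged(x = c(1, 2), z = c(0, 0), z_old = c(0, 0), u = c(0, 0),
                           rho = 0.01, eps_abs = 1e-6, eps_rel = 1e-5)
```

Under this convention, tightening either tolerance generally requires more passes before the test is met, which is where a cap such as maxite becomes relevant.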
Details
The function splits the model fitting path by alpha. At each alpha
value, the model is fit on the entire sequence of lambda with warm
start. We recommend including an intercept (by setting
intercept = TRUE) unless the input data have been centered.
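The recommendation about centering can be checked numerically. Because the intercept is not penalized in the objective, the best intercept for any fixed beta is mean(y - X beta), which is zero once y and the columns of X have been centered. The sketch below uses simulated data and base R only; all variable names are illustrative.

```r
set.seed(1)
n <- 50
p <- 4
X <- matrix(rnorm(n * p, mean = 3), n, p)   # columns with nonzero means
y <- rnorm(n, mean = 10)                    # response with nonzero mean

# Center the response and each column of X
yc <- y - mean(y)
Xc <- scale(X, center = TRUE, scale = FALSE)

# For a fixed beta, the intercept minimizing the squared error is the
# mean residual. It is far from zero on raw data and zero on centered data.
beta <- c(1, -2, 0, 0.5)
best_intercept_raw <- mean(y - drop(X %*% beta))
best_intercept_centered <- mean(yc - drop(Xc %*% beta))
```

On the raw data the optimal intercept is clearly nonzero, so forcing it to zero would distort the fit; on the centered data it vanishes up to floating-point error.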
Value
Returns regression coefficients for beta and gamma and intercept
beta0. We use a matrix-nested-within-list structure to store the
coefficients: each list item corresponds to an alpha value; the matrix
(or vector) in that list item stores coefficients at various lambda
values by columns (or entries).
beta0 |
Length- |
beta |
Length- |
gamma |
Length- |
lambda |
Sequence of |
alpha |
Sequence of |
A |
Binary matrix encoding ancestor-descendant relationship between leaves and nodes in the tree. |
Q |
Matrix with columns forming an orthonormal basis for the null space of |
intercept |
Whether an intercept is included in model fit. |
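The matrix-nested-within-list layout described above can be indexed as follows. The object below is a hand-built mock that only mimics the documented structure; it is not produced by rarefit(), and the row dimensions (p for beta, n_nodes for gamma) are assumptions made for illustration.

```r
# Mock object with the documented layout: one list item per alpha value,
# coefficients across lambda stored by columns (beta, gamma) or entries (beta0).
nalpha <- 3
nlam <- 4
p <- 2         # assumed number of rows of each beta matrix
n_nodes <- 3   # assumed number of rows of each gamma matrix
mock_fit <- list(
  beta0 = replicate(nalpha, seq_len(nlam) / 10, simplify = FALSE),
  beta  = replicate(nalpha, matrix(seq_len(p * nlam), p, nlam), simplify = FALSE),
  gamma = replicate(nalpha, matrix(0, n_nodes, nlam), simplify = FALSE)
)

# Coefficients at the 2nd alpha value and the 3rd lambda value
ia <- 2
il <- 3
beta_hat  <- mock_fit$beta[[ia]][, il]    # column il of the ia-th matrix
gamma_hat <- mock_fit$gamma[[ia]][, il]
beta0_hat <- mock_fit$beta0[[ia]][il]     # entry il of the ia-th vector
```

The same two-step indexing, first the list item for alpha and then the column or entry for lambda, applies to each coefficient component.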
References
Yan, X. and Bien, J. (2018) Rare Feature Selection in High Dimensions, https://arxiv.org/abs/1803.06675.
Examples
## Not run:
# See vignette for more details.
set.seed(100)
ts <- sample(seq_along(data.rating), 400) # Train set indices
# Fit the model on train set
ourfit <- rarefit(y = data.rating[ts], X = data.dtm[ts, ], hc = data.hc,
                  lam.min.ratio = 1e-6, nlam = 20, nalpha = 10, rho = 0.01,
                  eps1 = 1e-5, eps2 = 1e-5, maxite = 1e4)
## End(Not run)