TFRE {TFRE} | R Documentation |
Fit a TFRE regression model with Lasso, SCAD or MCP regularization
Description
Fit a TFRE Lasso model and/or a TFRE SCAD or MCP model. The TFRE regression models are fitted via the QICD algorithm and, optionally, the Incomplete U-statistics resampling technique. The tuning parameter of the TFRE Lasso regression is estimated from the covariate matrix X. The TFRE SCAD / MCP regressions are computed over a grid of values for the tuning parameter eta. The high-dimensional BIC (HBIC) is used as the criterion for selecting the TFRE SCAD / MCP tuning parameter.
Usage
TFRE(
X,
y,
alpha0 = 0.1,
const_lambda = 1.01,
times = 500,
incomplete = TRUE,
const_incomplete = 10,
thresh = 1e-06,
maxin = 100,
maxout = 20,
second_stage = "scad",
a = 3.7,
eta_list = NULL,
const_hbic = 6
)
Arguments
X |
Input matrix, of dimension n_obs x n_vars; each row is an observation vector. |
y |
Response variable. |
alpha0 |
The level used to estimate the tuning parameter. Default value is 0.1.
See more details in the "Details" section of est_lambda. |
const_lambda |
The constant used to estimate the tuning parameter; it should be
greater than 1. Default value is 1.01. See more details in the "Details" section
of est_lambda. |
times |
The size of simulated samples to estimate the tuning parameter. Default value is 500. |
incomplete |
Logical. If |
const_incomplete |
The constant for the Incomplete U-statistics
resampling technique. If |
thresh |
Convergence threshold for the QICD algorithm. Default value is 1e-6. See more details in Peng and Wang (2015). |
maxin |
Maximum number of inner coordinate descent iterations in the QICD algorithm; default is 100. See more details in Peng and Wang (2015). |
maxout |
Maximum number of outer Majorization-Minimization (MM) step iterations in the QICD algorithm; default is 20. See more details in Peng and Wang (2015). |
second_stage |
Penalty function for the second-stage model. Character vector,
which can be "scad", "mcp" or "none". If |
a |
An unknown parameter in the SCAD and MCP penalty functions. The default value is 3.7, as suggested by Fan and Li (2001). |
eta_list |
A numerical vector for the tuning parameters to be used in the
TFRE SCAD or MCP regression. Cannot be |
const_hbic |
The constant to be used in calculating HBIC in the TFRE SCAD regression. Default value is 6. See more details in "Details". |
Details
Wang et al. (2020) proposed the TFRE Lasso estimator for high-dimensional linear regressions with heavy-tailed errors as below:
\widehat{\bm{\beta}}(\lambda^*) = \arg\min_{\bm{\beta}}\frac{1}{n(n-1)}{\sum\sum}_{i\neq j}\left|(Y_i-\bm{x}_i^T\bm{\beta})-(Y_j-\bm{x}_j^T\bm{\beta})\right| + \lambda^*\sum_{k=1}^p|\beta_k|,
where \lambda^* is the tuning parameter estimated by est_lambda.
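The objective above can be evaluated directly for a given coefficient vector. The following is a minimal base-R sketch of that formula only; `tfre_lasso_objective` is a hypothetical helper written for illustration and is not part of the TFRE package, and the brute-force pairwise computation is not how the package fits the model.

```r
# Hypothetical helper (not a TFRE package function): evaluates the
# pairwise rank-type loss plus the L1 penalty by brute force.
tfre_lasso_objective <- function(X, y, beta, lambda) {
  n <- nrow(X)
  r <- as.vector(y - X %*% beta)          # residuals Y_i - x_i^T beta
  pair_diff <- abs(outer(r, r, "-"))      # |r_i - r_j| for all (i, j); diagonal is 0
  loss <- sum(pair_diff) / (n * (n - 1))  # average over ordered pairs i != j
  loss + lambda * sum(abs(beta))
}

set.seed(1)
n <- 20; p <- 5
X <- matrix(rnorm(n * p), n)
beta0 <- c(1.5, -1.25, 1, 0, 0)
y <- as.vector(X %*% beta0 + rt(n, 4))

obj_true  <- tfre_lasso_objective(X, y, beta0, lambda = 0.1)
# Shifting y by a constant leaves the objective unchanged, because the
# shift cancels in every difference r_i - r_j.
obj_shift <- tfre_lasso_objective(X, y + 10, beta0, lambda = 0.1)
```

Because the loss depends only on differences of residuals, it carries no information about the intercept; that is why `beta` in this sketch contains slope coefficients only.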
The TFRE Lasso model is fitted by the QICD algorithm proposed in Peng and Wang (2015).
To overcome the computational barrier arising from the U-statistics structure of
the aforementioned loss function, we apply the Incomplete U-statistics
resampling technique, which was first proposed in Clémençon, Colin and Bellet (2016).
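To illustrate the idea behind the resampling step, the sketch below compares the complete pairwise loss, which uses all n(n-1) ordered pairs, with a Monte Carlo version that averages over a random subset of pairs. Both helpers are hypothetical and written in base R; the choice of 10 * n sampled pairs is an assumption made for illustration and is not taken from the package's implementation.

```r
# Hypothetical helpers (not TFRE package functions).

# Complete U-statistic: average of |r_i - r_j| over all ordered pairs i != j.
complete_pair_loss <- function(r) {
  n <- length(r)
  sum(abs(outer(r, r, "-"))) / (n * (n - 1))
}

# Incomplete U-statistic: average over n_pairs randomly drawn pairs with i != j.
incomplete_pair_loss <- function(r, n_pairs) {
  n <- length(r)
  i <- sample.int(n, n_pairs, replace = TRUE)
  # Draw j uniformly from the n - 1 indices other than i:
  # sample from 1..(n - 1), then skip over i.
  j <- sample.int(n - 1, n_pairs, replace = TRUE)
  j <- j + (j >= i)
  mean(abs(r[i] - r[j]))
}

set.seed(2)
r <- rt(200, 4)                                   # stand-in heavy-tailed residuals
full_loss   <- complete_pair_loss(r)              # 200 * 199 pairs
approx_loss <- incomplete_pair_loss(r, n_pairs = 10 * length(r))  # 2000 pairs
```

The incomplete version costs O(n_pairs) rather than O(n^2) per evaluation, at the price of some Monte Carlo noise in the loss.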
Wang et al. (2020) also proposed a second-stage enhancement by using the
TFRE Lasso estimator \widehat{\bm{\beta}}(\lambda^*) as an initial estimator.
It is defined as:
\widetilde{\bm{\beta}}^{(1)} = \arg\min_{\bm{\beta}}\frac{1}{n(n-1)}{\sum\sum}_{i\neq j}\left|(Y_i-\bm{x}_i^T\bm{\beta})-(Y_j-\bm{x}_j^T\bm{\beta})\right| + \sum_{k=1}^pp_{\eta}'(|\widehat{\beta}_k(\lambda^*)|)|\beta_k|,
where p'_{\eta}(\cdot) denotes the derivative of some nonconvex penalty
function p_{\eta}(\cdot), and \eta > 0 is a tuning parameter. This
function implements the second-stage enhancement with two popular nonconvex
penalty functions: SCAD and MCP. The modified high-dimensional BIC criterion
in Wang et al. (2020) is employed for selecting \eta. Define:
HBIC(\eta) = \log\left\{{\sum\sum}_{i\neq j}\left|(Y_i-\bm{x}_i^T\widetilde{\bm{\beta}}_{\eta})-(Y_j-\bm{x}_j^T\widetilde{\bm{\beta}}_{\eta})\right|\right\} + |A_\eta|\frac{\log\log n}{n* const\_hbic}\log p,
where \widetilde{\bm{\beta}}_{\eta} denotes the second-stage estimator with
the tuning parameter value \eta, and |A_\eta| denotes the cardinality
of the index set of the selected model. This function selects the value of \eta
that minimizes HBIC(\eta).
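As a rough illustration of the two ingredients described above, the following base-R sketch computes the second-stage weights p'_{\eta}(|\widehat{\beta}_k|) using the standard forms of the SCAD and MCP derivatives, and evaluates HBIC(\eta) exactly as written in the formula. The helper names `scad_deriv`, `mcp_deriv` and `hbic_value` are hypothetical, and the package's internal implementation may differ in its details.

```r
# Hypothetical helpers (not TFRE package functions).

# Standard SCAD derivative: eta for t <= eta, then (a*eta - t)_+ / (a - 1).
scad_deriv <- function(t, eta, a = 3.7) {
  t <- abs(t)
  ifelse(t <= eta, eta, pmax(a * eta - t, 0) / (a - 1))
}

# Standard MCP derivative: (eta - t/a)_+.
mcp_deriv <- function(t, eta, a = 3.7) {
  pmax(eta - abs(t) / a, 0)
}

# HBIC as defined above; beta holds slope coefficients only.
hbic_value <- function(X, y, beta, const_hbic = 6) {
  n <- nrow(X); p <- ncol(X)
  r <- as.vector(y - X %*% beta)
  fit <- log(sum(abs(outer(r, r, "-"))))   # log of the double sum over i != j
  df <- sum(beta != 0)                     # |A_eta|, size of the selected model
  fit + df * log(log(n)) / (n * const_hbic) * log(p)
}

# Weights for a stand-in first-stage estimate: large coefficients receive
# weight 0 (no shrinkage), zero coefficients keep the full weight eta.
beta_init <- c(2, 0.3, 0)
w_scad <- scad_deriv(beta_init, eta = 0.5)
w_mcp  <- mcp_deriv(beta_init, eta = 0.5)

set.seed(3)
n <- 30; p <- 4
X <- matrix(rnorm(n * p), n)
y <- as.vector(X %*% c(1.5, -1, 0, 0) + rt(n, 4))
h_sparse <- hbic_value(X, y, c(1.5, -1, 0, 0))
h_dense  <- hbic_value(X, y, c(1.5, -1, 1e-8, 1e-8))  # same fit, two extra nonzeros
```

Here `h_dense` exceeds `h_sparse` because the two extra nonzero coefficients add to the model-size term while leaving the fit essentially unchanged, which is the trade-off the criterion uses to choose \eta.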
Value
An object of class "TFRE", which is a list containing at least the following components:
X |
The input matrix used. |
y |
The response variable used. |
incomplete |
Logical. |
beta_TFRE_Lasso |
The estimated coefficient vector of the TFRE Lasso regression. The first element is the estimated intercept. |
tfre_lambda |
The estimated tuning parameter of the TFRE Lasso regression. |
second_stage |
Character vector, |
If second_stage = "scad", then the fitted TFRE object will also contain
an object named "TFRE_scad", which is a list containing the following components:
Beta_TFRE_scad |
The estimated coefficient matrix of the TFRE SCAD regression.
The dimension is n_eta x (p+1), with the first column containing the intercepts,
where n_eta is the length of eta_list. |
df_TFRE_scad |
The number of nonzero coefficients (intercept excluded) for
each value in eta_list. |
eta_list |
The tuning parameter vector used in the TFRE SCAD regressions. |
hbic |
A numerical vector of HBIC values for the TFRE SCAD model corresponding
to each value in eta_list. |
eta_min |
The eta value which yields the smallest HBIC value in the TFRE SCAD regression. |
Beta_TFRE_scad_min |
The estimated coefficient vector which employs |
If second_stage = "mcp", then the fitted TFRE object will also contain
an object named "TFRE_mcp", which is a list containing the following components:
Beta_TFRE_mcp |
The estimated coefficient matrix of the TFRE MCP regression.
The dimension is n_eta x (p+1), with the first column containing the intercepts,
where n_eta is the length of eta_list. |
df_TFRE_mcp |
The number of nonzero coefficients (intercept excluded) for
each value in eta_list. |
eta_list |
The tuning parameter vector used in the TFRE MCP regressions. |
hbic |
A numerical vector of HBIC values for the TFRE MCP model corresponding
to each value in eta_list. |
eta_min |
The eta value which yields the smallest HBIC value in the TFRE MCP regression. |
Beta_TFRE_mcp_min |
The estimated coefficient vector which employs |
Author(s)
Yunan Wu and Lan Wang
Maintainer:
Yunan Wu <yunan.wu@utdallas.edu>
References
Wang, L., Peng, B., Bradic, J., Li, R. and Wu, Y. (2020),
A Tuning-free Robust and Efficient Approach to High-dimensional Regression,
Journal of the American Statistical Association, 115:532, 1700-1714,
doi:10.1080/01621459.2020.1840989.
Peng, B. and Wang, L. (2015),
An Iterative Coordinate Descent Algorithm for High-Dimensional Nonconvex
Penalized Quantile Regression, Journal of Computational and Graphical Statistics,
24:3, 676-694, doi:10.1080/10618600.2014.913516.
Clémençon, S., Colin, I., and Bellet, A. (2016),
Scaling-up Empirical Risk Minimization: Optimization of Incomplete U-statistics,
The Journal of Machine Learning Research, 17(1):2682–2717.
Fan, J. and Li, R. (2001),
Variable Selection via Nonconcave Penalized Likelihood and its Oracle
Properties, Journal of the American Statistical Association, 96:456, 1348-1360,
doi:10.1198/016214501753382273.
See Also
predict.TFRE, coef.TFRE, plot.TFRE, est_lambda
Examples
# Simulate a high-dimensional design (p > n) with heavy-tailed t(4) errors
n <- 20; p <- 50
beta0 <- c(1.5,-1.25,1,-0.75,0.5,rep(0,p-5))
eta_list <- 0.1*6:15*sqrt(log(p)/n)
X <- matrix(rnorm(n*p),n)
y <- X %*% beta0 + rt(n,4)

# First stage only: TFRE Lasso
Obj_TFRE_Lasso <- TFRE(X, y, second_stage = "none", const_incomplete = 5)
Obj_TFRE_Lasso$beta_TFRE_Lasso[1:10]

# Second stage with the default SCAD penalty over the eta grid
Obj_TFRE_SCAD <- TFRE(X, y, eta_list = eta_list, const_incomplete = 5)
Obj_TFRE_SCAD$TFRE_scad$hbic
Obj_TFRE_SCAD$TFRE_scad$df_TFRE_scad
Obj_TFRE_SCAD$TFRE_scad$Beta_TFRE_scad_min[1:10]

# Second stage with the MCP penalty over the same eta grid
Obj_TFRE_MCP <- TFRE(X, y, second_stage = "mcp", eta_list = eta_list, const_incomplete = 5)
Obj_TFRE_MCP$TFRE_mcp$hbic
Obj_TFRE_MCP$TFRE_mcp$df_TFRE_mcp
Obj_TFRE_MCP$TFRE_mcp$Beta_TFRE_mcp_min[1:10]