HMTL-package {HMTL}    R Documentation
Heterogeneous Multi-task Feature Learning
Description
The HMTL package implements block-wise sparse estimation by grouping the coefficients of related predictors across multiple tasks. Each task can be a regression, Huber regression, adaptive Huber regression, or logistic regression problem, so a wide variety of data types can be integrated. The robust methods, Huber regression and adaptive Huber regression, can handle outlier contamination, following Sun, Q., Zhou, W.-X. and Fan, J. (2020) and Wang, L., Zheng, C., Zhou, W. and Zhou, W.-X. (2021). Model selection uses a modified form of the Bayesian information criterion to measure model performance, with a formulation similar to that of Gao, X. and Carroll, R. J. (2017).
Details
In the context of multi-task learning, there are K different data sets obtained from K related sources. Each data set can be modeled by a different type of learning task, depending on its data distribution. Let the candidate features be denoted as \{M_1, M_2, \dots, M_j, \dots, M_p\}. When the integrated data sets have different measurements, we assume that the predictors share some similarities. For example, the j-th predictors, collected as M_j = (X_{1j}, X_{2j}, \dots, X_{Kj}) in the table below, represent the same type of feature in all related studies. In some cases the tasks share the same set of predictors, in which case X_{1j} = X_{2j} = \cdots = X_{Kj}.
Tasks | Formula | M_1 | M_2 | \dots | M_j | \dots | M_p |
1 | y_1: g_1(\mu_1) \sim | x_{11}\theta_{11} + | x_{12}\theta_{12} + | \dots | x_{1j}\theta_{1j} + | \dots | x_{1p}\theta_{1p} |
2 | y_2: g_2(\mu_2) \sim | x_{21}\theta_{21} + | x_{22}\theta_{22} + | \dots | x_{2j}\theta_{2j} + | \dots | x_{2p}\theta_{2p} |
\vdots | \vdots | \vdots | \vdots | | \vdots | | \vdots |
K | y_K: g_K(\mu_K) \sim | x_{K1}\theta_{K1} + | x_{K2}\theta_{K2} + | \dots | x_{Kj}\theta_{Kj} + | \dots | x_{Kp}\theta_{Kp} |
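To make the per-task model in the table concrete, the following Python sketch computes the linear predictor \sum_j x_{kj}\theta_{kj} for one observation of task k and maps it to the mean \mu_k through the inverse link g_k^{-1}. It assumes the canonical choices g_k = identity for a regression task and g_k = logit for a logistic task; the function names and the row-per-task storage of \theta are hypothetical and are not part of the HMTL API.

```python
import math

# Hypothetical inverse links g_k^{-1}: identity for regression, inverse logit for logistic tasks.
INVERSE_LINKS = {
    "regression": lambda eta: eta,
    "logistic": lambda eta: 1.0 / (1.0 + math.exp(-eta)),
}

def linear_predictor(x_row, theta_k):
    """eta = sum_j x_{kj} * theta_{kj} for one observation of task k."""
    if len(x_row) != len(theta_k):
        raise ValueError("x_row and theta_k must both have length p")
    return sum(x * t for x, t in zip(x_row, theta_k))

def mean_response(x_row, theta_k, task_type):
    """mu_k = g_k^{-1}(eta) for the given task type."""
    return INVERSE_LINKS[task_type](linear_predictor(x_row, theta_k))

# Two tasks sharing p = 3 features; theta holds one row per task.
theta = [[0.5, -1.0, 0.0],   # task 1: regression
         [0.2, 0.0, 1.5]]    # task 2: logistic
x1 = [1.0, 2.0, 3.0]
x2 = [1.0, 0.0, -1.0]
print(mean_response(x1, theta[0], "regression"))  # 0.5 - 2.0 + 0.0 = -1.5
print(mean_response(x2, theta[1], "logistic"))    # inverse logit of -1.3
```

The same coefficient column \theta_{\cdot j} enters every task, which is what makes the grouping below possible even though each task uses its own link.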
The coefficients of feature M_j across the K tasks can be grouped into the vector \theta^{(j)} = (\theta_{1j}, \theta_{2j}, \dots, \theta_{Kj})^\top.
Platforms | \bold{M_j} | \bold{\theta^{(j)}} |
1 | x_{1j} | \theta_{1j} |
2 | x_{2j} | \theta_{2j} |
\vdots | \vdots | \vdots |
K | x_{Kj} | \theta_{Kj} |
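This grouping is what makes the estimation block-wise sparse: feature M_j leaves the model only when its entire group \theta^{(j)} is zero. The Python sketch below illustrates this with group soft-thresholding, the proximal operator of t\|\theta^{(j)}\|_2, which is a standard building block of proximal-gradient solvers for \ell_{2,1} penalties. It is not necessarily the algorithm HMTL uses internally; \theta is stored as a hypothetical K-by-p list of lists (one row per task, one column per feature).

```python
import math

def group(theta, j):
    """Return theta^{(j)} = (theta_{1j}, ..., theta_{Kj}): column j of the K-by-p table."""
    return [row[j] for row in theta]

def group_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2: shrinks v toward 0, all zeros when ||v||_2 <= t."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= t:
        return [0.0] * len(v)
    scale = 1.0 - t / norm
    return [scale * x for x in v]

def prox_l21(theta, t):
    """Apply group soft-thresholding to every feature group theta^{(j)}, j = 1..p."""
    K, p = len(theta), len(theta[0])
    out = [[0.0] * p for _ in range(K)]
    for j in range(p):
        shrunk = group_soft_threshold(group(theta, j), t)
        for k in range(K):
            out[k][j] = shrunk[k]
    return out

theta = [[3.0, 0.1],   # task 1: coefficients of M_1, M_2
         [4.0, -0.2]]  # task 2
# ||theta^(1)||_2 = 5 > 1, so M_1 is shrunk by the factor 1 - 1/5 = 0.8 in both tasks;
# ||theta^(2)||_2 is about 0.22 <= 1, so M_2 is set to zero in both tasks at once.
print(prox_l21(theta, 1.0))
```

Note that a small coefficient in one task (0.1 for M_2 in task 1) is not zeroed on its own; the decision is made for the whole group, which is the behavior the mixed \ell_{2,1} penalty is designed to produce.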
The heterogeneous multi-task feature learning method HMTL selects significant features by minimizing the overall objective function
Q(\theta) = \mathcal{L}(\theta) + \mathcal{R}(\theta).
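The loss function is defined as \mathcal{L}(\theta) = \sum_{k=1}^K w_k \ell_k(\theta_k), where \theta_k = (\theta_{k1}, \dots, \theta_{kp}) collects the coefficients of task k and w_k is the weight of task k. The loss can be the composite quasi-likelihood or the composite form of the (adaptive) Huber loss, which carries an additional robustification parameter \tau_k. The penalty function is the mixed \ell_{2,1} regularization
\mathcal{R}(\theta) = \lambda \sum_{j=1}^p (\sum_{k=1}^K \theta_{kj}^2)^{1/2} = \lambda \sum_{j=1}^p \|\theta^{(j)}\|_2,
with tuning parameter \lambda \ge 0. Because each group \theta^{(j)} is penalized as a whole, feature M_j tends to be kept or dropped jointly across all K tasks.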
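The objective Q(\theta) can be evaluated directly from these definitions. The Python sketch below assumes, for illustration only, that \ell_k is the average squared-error loss for a regression task, the average Huber loss with a fixed \tau_k for a Huber task, and the average negative Bernoulli log-likelihood for a logistic task. The task dictionaries and function names are hypothetical, and the data-driven choice of \tau_k used by adaptive Huber regression is not reproduced.

```python
import math

def squared_loss(r):
    return 0.5 * r * r

def huber_loss(r, tau):
    """Huber loss with robustification parameter tau: quadratic for |r| <= tau, linear beyond."""
    a = abs(r)
    return 0.5 * r * r if a <= tau else tau * a - 0.5 * tau * tau

def logistic_loss(y, eta):
    """Negative Bernoulli log-likelihood log(1 + exp(eta)) - y * eta, computed stably."""
    return max(eta, 0.0) + math.log1p(math.exp(-abs(eta))) - y * eta

def task_loss(X, y, theta_k, kind, tau=None):
    """Hypothetical ell_k(theta_k): average loss over the n_k observations of one task."""
    total = 0.0
    for x_row, y_i in zip(X, y):
        eta = sum(x * t for x, t in zip(x_row, theta_k))
        if kind == "regression":
            total += squared_loss(y_i - eta)
        elif kind == "huber":
            total += huber_loss(y_i - eta, tau)
        elif kind == "logistic":
            total += logistic_loss(y_i, eta)
        else:
            raise ValueError(f"unknown task type: {kind}")
    return total / len(y)

def l21_penalty(theta, lam):
    """R(theta) = lam * sum_j sqrt(sum_k theta_{kj}^2)."""
    K, p = len(theta), len(theta[0])
    return lam * sum(math.sqrt(sum(theta[k][j] ** 2 for k in range(K))) for j in range(p))

def objective(tasks, theta, weights, lam):
    """Q(theta) = sum_k w_k * ell_k(theta_k) + R(theta)."""
    loss = sum(w * task_loss(t["X"], t["y"], theta[k], t["kind"], t.get("tau"))
               for k, (t, w) in enumerate(zip(tasks, weights)))
    return loss + l21_penalty(theta, lam)

# Three heterogeneous tasks sharing p = 2 features.
tasks = [
    {"kind": "regression", "X": [[1.0, 0.0], [0.0, 1.0]], "y": [1.0, 2.0]},
    {"kind": "huber", "tau": 1.0, "X": [[1.0, 1.0]], "y": [5.0]},
    {"kind": "logistic", "X": [[1.0, -1.0], [0.5, 0.5]], "y": [1, 0]},
]
theta = [[1.0, 2.0], [0.5, 0.5], [0.3, 0.0]]
print(objective(tasks, theta, weights=[1.0, 1.0, 1.0], lam=0.1))
```

The Huber task illustrates the robustness argument: its residual of 4 contributes 4\tau - \tau^2/2 = 3.5 rather than 4^2/2 = 8, so a single outlier has bounded influence on the gradient.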
This package also provides functions to compute the Bayesian information criterion
BIC(s) = 2\mathcal{L}_s(\hat{\theta}) + d_s^{*} \gamma_n,
where \mathcal{L}_s(\hat{\theta}) denotes the composite quasi-likelihood or adaptive Huber loss of candidate model s evaluated at the estimate \hat{\theta}, d_s^{*} measures the model complexity, and \gamma_n is the penalty on model complexity.
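Once each candidate model s (for example, one fit per value of \lambda) has been estimated, the criterion is a simple sum, and the model with the smallest BIC is selected. The Python sketch below assumes the loss values \mathcal{L}_s(\hat{\theta}) and complexities d_s^{*} have already been computed. The exact definitions of d_s^{*} and \gamma_n used by HMTL are not given on this page, so \gamma_n = \log n below is only a placeholder, and the candidate numbers are made up for illustration.

```python
import math

def bic(loss_value, d_star, gamma_n):
    """BIC(s) = 2 * L_s(theta_hat) + d_s^* * gamma_n."""
    return 2.0 * loss_value + d_star * gamma_n

def select_model(candidates, gamma_n):
    """Return the candidate with the smallest BIC.

    Each candidate is a dict with keys 'name', 'loss' (L_s at its fitted theta_hat)
    and 'd_star' (its complexity); all values here are hypothetical.
    """
    return min(candidates, key=lambda c: bic(c["loss"], c["d_star"], gamma_n))

candidates = [
    {"name": "lambda=0.1", "loss": 40.0, "d_star": 12},
    {"name": "lambda=0.5", "loss": 46.0, "d_star": 4},
    {"name": "lambda=1.0", "loss": 60.0, "d_star": 1},
]
gamma_n = math.log(200)  # placeholder penalty for n = 200; not HMTL's definition
print(select_model(candidates, gamma_n)["name"])
```

The trade-off is visible in the numbers: the smallest \lambda fits best but pays for 12 units of complexity, while the largest \lambda is simplest but loses too much fit.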
In this package, the function MTL_reg handles regression tasks, which may be contaminated by outliers. The function MTL_class models multiple classification tasks, and the function MTL_hetero integrates different types of tasks.
Author(s)
Yuan Zhong, Wei Xu, and Xin Gao
Maintainer: Yuan Zhong <aqua.zhong@gmail.com>
References
Zhong, Y., Xu, W. and Gao, X. (2023). Heterogeneous multi-task feature learning with mixed \ell_{2,1} regularization. Submitted.
Zhong, Y., Xu, W. and Gao, X. (2023). Robust multi-task feature learning. Submitted.
Gao, X. and Carroll, R. J. (2017). Data integration with high dimensionality. Biometrika, 104(2), 251–272.
Huber, P. J. (1964). Robust estimation of a location parameter. Ann. Math. Statist., 35, 73–101.
Sun, Q., Zhou, W.-X. and Fan, J. (2020). Adaptive Huber regression. J. Amer. Statist. Assoc., 115, 254–265.
Wang, L., Zheng, C., Zhou, W. and Zhou, W.-X. (2021). A new principle for tuning-free Huber regression. Stat. Sinica, 31, 2153–2177.