RMTL-package {RMTL}
RMTL: Regularized Multi-Task Learning
Description
This package provides an efficient implementation of regularized multi-task learning (MTL) comprising 10 algorithms applicable to regression, classification, joint feature selection, task clustering, low-rank learning, sparse learning and network incorporation. All algorithms are implemented with the accelerated gradient descent method and achieve a convergence rate of O(1/k^2). Parallel computing is supported to improve efficiency. Sparse model structure is induced by solving the proximal operator.
Details
This package provides 10 multi-task learning algorithms (5 for classification and 5 for regression), which incorporate five regularization strategies for knowledge transfer among tasks. All algorithms share the same framework:
\min\limits_{W,C} \sum_{i=1}^{t} L(W_i, C_i | X_i, Y_i) + \lambda_1 \Omega(W) + \lambda_2 ||W||_F^2
where L(\circ) is the loss function (logistic loss for classification or least-squares loss for linear regression), \Omega(\circ) is the cross-task regularization for knowledge transfer, and ||W||_F^2 is used to improve generalization. X=\{X_i \in R^{n_i \times p} | i \in \{1, ..., t\}\} and Y=\{Y_i \in R^{n_i \times 1} | i \in \{1, ..., t\}\} are the predictor matrices and responses of the t tasks respectively, where each task i contains n_i subjects and p predictors. W \in R^{p \times t} is the coefficient matrix, whose i-th column W_i is the coefficient vector of task i.
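As a concrete illustration of this objective, the sketch below evaluates it for a given W and C using the least-squares loss and the Lasso penalty \Omega(W)=||W||_1. The helper name mtl_objective and all of its inputs are hypothetical and not part of the RMTL API.

# Hypothetical helper (not part of RMTL): evaluates the framework objective
# with least-squares loss and the Lasso penalty Omega(W) = ||W||_1.
# X and Y are lists of t task matrices/responses; W is p x t; C is length t.
mtl_objective <- function(W, C, X, Y, lambda1, lambda2) {
  t <- length(X)
  loss <- sum(sapply(seq_len(t), function(i)
    mean((Y[[i]] - X[[i]] %*% W[, i] - C[i])^2) / 2))  # L(W_i, C_i | X_i, Y_i)
  loss + lambda1 * sum(abs(W)) +                       # lambda1 * ||W||_1
    lambda2 * sum(W^2)                                 # lambda2 * ||W||_F^2
}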
The function \Omega(W) jointly modulates the multi-task models (\{W_1, W_2, ..., W_t\}) according to a specific prior structure of W. In this package, five common regularization methods are implemented to incorporate different priors, i.e. sparse structure (\Omega(W)=||W||_1), joint feature selection (\Omega(W)=||W||_{2,1}), low-rank structure (\Omega(W)=||W||_*), network-based relatedness across tasks (\Omega(W)=||WG||_F^2) and task clustering (\Omega(W)=tr(W^TW)-tr(F^TW^TWF)). To call a specific method correctly, the corresponding "short name" has to be given. Following the above sequence of methods, the short names are: Lasso, L21, Trace, Graph and CMTL.
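For example, a joint-feature-selection model can be trained as below. This is a minimal sketch assuming the package's exported MTL() interface (arguments type, Regularization, Lambda1 and Lambda2); the simulated data are constructed ad hoc for illustration.

# Minimal sketch: train a regression model with joint feature selection.
library(RMTL)
set.seed(1)
t <- 5; p <- 50; n <- 20
X <- lapply(1:t, function(i) matrix(rnorm(n * p), n, p))  # t predictor matrices
W0 <- matrix(0, p, t); W0[1:5, ] <- 1                     # support shared across tasks
Y <- lapply(1:t, function(i) X[[i]] %*% W0[, i] + rnorm(n) * 0.5)
# Short name "L21" selects joint feature selection, Omega(W) = ||W||_{2,1}
fit <- MTL(X, Y, type = "Regression", Regularization = "L21",
           Lambda1 = 0.1, Lambda2 = 0)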
For all algorithms, we implemented a solver based on the accelerated gradient descent method, which uses the iterates of the previous two iterations to extrapolate the point at which the current gradient is evaluated, thereby achieving an improved convergence rate. To handle the non-smooth, convex regularizers, the proximal operator is applied. Moreover, backtracking line search is used to determine the appropriate step size in each iteration. Overall, the solver achieves a convergence rate of O(\frac{1}{k^2}), which is optimal among first-order gradient descent methods.
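To make the role of the proximal operator concrete, the sketch below shows the soft-thresholding proximal map of the L1 penalty and one accelerated (FISTA-style) update. The names prox_l1, fista_step and grad_fn are hypothetical illustrations, not RMTL internals.

# Soft-thresholding: the proximal operator of thr * ||.||_1
prox_l1 <- function(V, thr) sign(V) * pmax(abs(V) - thr, 0)

# One accelerated proximal gradient step (hypothetical sketch): the momentum
# term combines the previous two iterates W_k and W_km1, and `step` would
# come from the backtracking line search.
fista_step <- function(W_k, W_km1, k, grad_fn, step, lambda1) {
  V <- W_k + ((k - 1) / (k + 2)) * (W_k - W_km1)  # extrapolation point
  prox_l1(V - step * grad_fn(V), step * lambda1)  # gradient step + proximal map
}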
For the academic references of the implemented algorithms, users are referred to the paper (doi:10.1093/bioinformatics/bty831) or the package vignettes.