tlgmm {mtlgmm}    R Documentation
Fit the binary Gaussian mixture model (GMM) on the target data set by leveraging multiple source data sets under a transfer learning (TL) setting.
Description
Fit the binary Gaussian mixture model (GMM) on the target data set by leveraging multiple source data sets under a transfer learning (TL) setting. This function implements the modified EM algorithm (Algorithm 4) proposed in Tian, Y., Weng, H., & Feng, Y. (2022).
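Concretely, the target observations are modeled by a binary GMM with a common covariance matrix,

    x ~ w * N(mu1, Sigma) + (1 - w) * N(mu2, Sigma),

with discriminant coefficient beta = Sigma^{-1} (mu1 - mu2) (up to the sign convention of the paper). Roughly speaking, the penalty terms in Algorithm 4 shrink the target estimates of w, mu1, mu2, and beta toward the center estimates carried in fitted_bar, which summarize the source tasks.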
Usage
tlgmm(
x,
fitted_bar,
step_size = c("lipschitz", "fixed"),
eta_w = 0.1,
eta_mu = 0.1,
eta_beta = 0.1,
lambda_choice = c("fixed", "cv"),
cv_nfolds = 5,
cv_upper = 2,
cv_lower = 0.01,
cv_length = 5,
C1_w = 0.05,
C1_mu = 0.2,
C1_beta = 0.2,
C2_w = 0.05,
C2_mu = 0.2,
C2_beta = 0.2,
kappa0 = 1/3,
tol = 1e-05,
initial_method = c("kmeans", "EM"),
iter_max = 1000,
iter_max_prox = 100,
ncores = 1
)
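Only x and fitted_bar are required; all other arguments have the defaults shown above. A minimal call might look like this (x_target and fit_mtl are placeholder names for the target design matrix and a fitted mtlgmm object):

fit <- tlgmm(x = x_target, fitted_bar = fit_mtl)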
Arguments
x: design matrix of the target data set. Should be a matrix.

fitted_bar: the fitted model returned by mtlgmm.
step_size: step size choice in the proximal gradient method used to solve each optimization problem in the modified EM algorithm (Algorithm 4 in Tian, Y., Weng, H., & Feng, Y. (2022)); can be either "lipschitz" or "fixed". Default = "lipschitz".

eta_w: step size in the proximal gradient method to learn w (Step 3 of Algorithm 4 in Tian, Y., Weng, H., & Feng, Y. (2022)). Default: 0.1. Only used when step_size = "fixed".

eta_mu: step size in the proximal gradient method to learn mu (Steps 4 and 5 of Algorithm 4 in Tian, Y., Weng, H., & Feng, Y. (2022)). Default: 0.1. Only used when step_size = "fixed".

eta_beta: step size in the proximal gradient method to learn beta (Step 7 of Algorithm 4 in Tian, Y., Weng, H., & Feng, Y. (2022)). Default: 0.1. Only used when step_size = "fixed".

lambda_choice: the choice of constants in the penalty parameter used in the optimization problems (see Algorithm 4 of Tian, Y., Weng, H., & Feng, Y. (2022)); can be either "fixed" or "cv". Default = "fixed". An example combining these options is sketched after this argument list.
cv_nfolds: the number of cross-validation folds. Default: 5.

cv_upper: the upper bound of the lambda values used in cross-validation. Default: 2.

cv_lower: the lower bound of the lambda values used in cross-validation. Default: 0.01.

cv_length: the number of lambda values considered in cross-validation. Default: 5.
C1_w: the initial value of C1_w. See equation (19) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.05.

C1_mu: the initial value of C1_mu. See equation (20) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.2.

C1_beta: the initial value of C1_beta. See equation (21) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.2.

C2_w: the initial value of C2_w. See equation (22) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.05.

C2_mu: the initial value of C2_mu. See equation (23) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.2.

C2_beta: the initial value of C2_beta. See equation (24) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.2.

kappa0: the decaying rate used in equations (19)-(24) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 1/3.
tol: maximum tolerance in all optimization problems. The iterations stop once the difference between the last and the current update falls below this value. Default: 1e-05.

initial_method: initialization method used to obtain the starting GMM parameter estimates for each data set; can be either "kmeans" or "EM". Default = "kmeans".

iter_max: the maximum number of iterations of the modified EM algorithm (the parameter T in Algorithm 4 of Tian, Y., Weng, H., & Feng, Y. (2022)). Default: 1000.

iter_max_prox: the maximum number of iterations of the proximal gradient method. Default: 100.

ncores: the number of cores to use. Parallel computing is strongly suggested, especially when lambda_choice = "cv". Default: 1.
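For illustration, fixed step sizes can be combined with cross-validated penalty constants as follows (x_target and fit_mtl are placeholder names, as above; the chosen values are arbitrary):

fit <- tlgmm(x = x_target, fitted_bar = fit_mtl,
             step_size = "fixed", eta_w = 0.05, eta_mu = 0.05, eta_beta = 0.05,
             lambda_choice = "cv", cv_nfolds = 5, cv_lower = 0.01, cv_upper = 2, cv_length = 5,
             ncores = 4)  # parallel computing is strongly suggested with lambda_choice = "cv"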
Value
A list with the following components; a short access example follows the list.
w: the estimate of the mixture proportion in the GMM for the target task. Will be a vector.

mu1: the estimate of the Gaussian mean of the first cluster for the target task. Will be a vector.

mu2: the estimate of the Gaussian mean of the second cluster for the target task. Will be a vector.

beta: the estimate of the discriminant coefficient for the target task. Will be a vector.

Sigma: the estimate of the common covariance matrix for the target task. Will be a matrix.
C1_w: the initial value of C1_w.

C1_mu: the initial value of C1_mu.

C1_beta: the initial value of C1_beta.

C2_w: the initial value of C2_w.

C2_mu: the initial value of C2_mu.

C2_beta: the initial value of C2_beta.
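The components are extracted by name from the returned list, for example (fit_tl being a fitted tlgmm object):

fit_tl$w      # estimated mixture proportion
fit_tl$mu1    # estimated mean of the first cluster
fit_tl$beta   # estimated discriminant coefficient
fit_tl$Sigma  # estimated common covariance matrix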
References
Tian, Y., Weng, H., & Feng, Y. (2022). Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models. arXiv preprint arXiv:2209.15224.
Parikh, N., & Boyd, S. (2014). Proximal algorithms. Foundations and Trends in Optimization, 1(3), 127-239.
See Also
mtlgmm
, predict_gmm
, data_generation
, initialize
, alignment
, alignment_swap
, estimation_error
, misclustering_error
.
Examples
set.seed(0, kind = "L'Ecuyer-CMRG")
## Consider a transfer learning problem with 3 source tasks and 1 target task in the setting "MTL-1"
data_list_source <- data_generation(K = 3, outlier_K = 0, simulation_no = "MTL-1", h_w = 0,
                                    h_mu = 0, n = 50)  # generate the source data
data_target <- data_generation(K = 1, outlier_K = 0, simulation_no = "MTL-1", h_w = 0.1,
                               h_mu = 1, n = 50)  # generate the target data
fit_mtl <- mtlgmm(x = data_list_source$data$x, C1_w = 0.05, C1_mu = 0.2, C1_beta = 0.2,
                  C2_w = 0.05, C2_mu = 0.2, C2_beta = 0.2, kappa = 1/3, initial_method = "EM",
                  trim = 0.1, lambda_choice = "fixed", step_size = "lipschitz")
fit_tl <- tlgmm(x = data_target$data$x[[1]], fitted_bar = fit_mtl, C1_w = 0.05,
                C1_mu = 0.2, C1_beta = 0.2, C2_w = 0.05, C2_mu = 0.2, C2_beta = 0.2,
                kappa0 = 1/3, initial_method = "EM", ncores = 1, lambda_choice = "fixed",
                step_size = "lipschitz")

# use cross-validation to choose the tuning parameters
# warning: can be quite slow; a large "ncores" value is strongly suggested!
fit_tl <- tlgmm(x = data_target$data$x[[1]], fitted_bar = fit_mtl, kappa0 = 1/3,
                initial_method = "EM", ncores = 2, lambda_choice = "cv", step_size = "lipschitz")
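# the fitted target parameters can then be passed to predict_gmm to cluster the
# target observations; a sketch, assuming the signature predict_gmm(w, mu1, mu2,
# beta, newx) from its help page:
y_pred <- predict_gmm(w = fit_tl$w, mu1 = fit_tl$mu1, mu2 = fit_tl$mu2,
                      beta = fit_tl$beta, newx = data_target$data$x[[1]])
table(y_pred)  # counts of the predicted cluster labels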