mtlgmm {mtlgmm}        R Documentation

Fit binary Gaussian mixture models (GMMs) on multiple data sets under a multi-task learning (MTL) setting.

Description

Fit binary Gaussian mixture models (GMMs) on multiple data sets under a multi-task learning (MTL) setting. This function implements the modified EM algorithm (Algorithm 1) proposed in Tian, Y., Weng, H., & Feng, Y. (2022).

Usage

mtlgmm(
  x,
  step_size = c("lipschitz", "fixed"),
  eta_w = 0.1,
  eta_mu = 0.1,
  eta_beta = 0.1,
  lambda_choice = c("cv", "fixed"),
  cv_nfolds = 5,
  cv_upper = 5,
  cv_lower = 0.01,
  cv_length = 5,
  C1_w = 0.05,
  C1_mu = 0.2,
  C1_beta = 0.2,
  C2_w = 0.05,
  C2_mu = 0.2,
  C2_beta = 0.2,
  kappa = 1/3,
  tol = 1e-05,
  initial_method = c("EM", "kmeans"),
  alignment_method = ifelse(length(x) <= 10, "exhaustive", "greedy"),
  trim = 0.1,
  iter_max = 1000,
  iter_max_prox = 100,
  ncores = 1
)

Arguments

x

design matrices from multiple data sets. Should be a list, where each element is a matrix or data.frame object representing the design matrix of one task.
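
For example, a valid input can be assembled as follows (a minimal sketch with illustrative dimensions; all tasks share the same number of columns p):

  set.seed(1)
  x <- list(matrix(rnorm(50 * 2), nrow = 50),   # task 1: 50 observations, p = 2
            matrix(rnorm(60 * 2), nrow = 60),   # task 2: 60 observations
            matrix(rnorm(70 * 2), nrow = 70))   # task 3: 70 observations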

step_size

step size choice in the proximal gradient method used to solve each optimization problem in the modified EM algorithm (Algorithm 1 in Tian, Y., Weng, H., & Feng, Y. (2022)). Can be either "lipschitz" or "fixed". Default: "lipschitz".

  • lipschitz: eta_w, eta_mu, and eta_beta will be chosen via the Lipschitz property of the gradient of the objective function (without the penalty part). See Section 4.2 of Parikh, N., & Boyd, S. (2014).

  • fixed: eta_w, eta_mu, and eta_beta need to be specified, as in the sketch after this list.
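
For instance, a call with user-specified step sizes might look like the following (a minimal sketch; the eta values are illustrative, not recommendations):

  fit <- mtlgmm(x = x, step_size = "fixed",
                eta_w = 0.1, eta_mu = 0.1, eta_beta = 0.1,
                lambda_choice = "fixed")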

eta_w

step size in the proximal gradient method to learn w (Step 3 of Algorithm 1 in Tian, Y., Weng, H., & Feng, Y. (2022)). Default: 0.1. Only used when step_size = "fixed".

eta_mu

step size in the proximal gradient method to learn mu (Steps 4 and 5 of Algorithm 1 in Tian, Y., Weng, H., & Feng, Y. (2022)). Default: 0.1. Only used when step_size = "fixed".

eta_beta

step size in the proximal gradient method to learn beta (Step 9 of Algorithm 1 in Tian, Y., Weng, H., & Feng, Y. (2022)). Default: 0.1. Only used when step_size = "fixed".

lambda_choice

the way to choose the constants in the penalty parameters used in the optimization problems (see Algorithm 1 of Tian, Y., Weng, H., & Feng, Y. (2022)). Can be either "cv" or "fixed". Default: "cv".

  • cv: cv_nfolds, cv_lower, cv_upper, and cv_length need to be specified. The C1 and C2 constants will then be chosen via cross-validation over all combinations of values on the grid exp(seq(log(cv_lower/10), log(cv_upper/10), length.out = cv_length)), as illustrated after this list. Note that this is a two-dimensional CV search, because we set C1_w = C2_w and C1_mu = C1_beta = C2_mu = C2_beta to reduce the computational cost.

  • fixed: C1_w, C1_mu, C1_beta, C2_w, C2_mu, and C2_beta need to be specified. See equations (7)-(12) in Tian, Y., Weng, H., & Feng, Y. (2022).
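
The candidate grid described above can be reproduced directly (a minimal sketch using the defaults cv_lower = 0.01, cv_upper = 5, cv_length = 5):

  cv_lower <- 0.01; cv_upper <- 5; cv_length <- 5
  exp(seq(log(cv_lower / 10), log(cv_upper / 10), length.out = cv_length))
  # candidate values shared by the C1 and C2 constants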

cv_nfolds

the number of cross-validation folds. Default: 5

cv_upper

the upper bound of lambda values used in cross-validation. Default: 5

cv_lower

the lower bound of lambda values used in cross-validation. Default: 0.01

cv_length

the number of lambda values considered in cross-validation. Default: 5

C1_w

the initial value of C1_w. See equation (7) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.05

C1_mu

the initial value of C1_mu. See equation (8) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.2

C1_beta

the initial value of C1_beta. See equation (9) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.2

C2_w

the initial value of C2_w. See equation (10) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.05

C2_mu

the initial value of C2_mu. See equation (11) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.2

C2_beta

the initial value of C2_beta. See equation (12) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.2

kappa

the decaying rate used in equations (7)-(12) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 1/3

tol

convergence tolerance for all optimization problems. If the difference between the last update and the current update is smaller than this value, the optimization iterations stop. Default: 1e-05

initial_method

initialization method, i.e. the method used to initialize the estimates of the GMM parameters for each data set. Can be either "EM" or "kmeans". Default: "EM".

  • EM: the initial estimates of the GMM parameters will be generated by the single-task EM algorithm, calling the Mclust function in the mclust package.

  • kmeans: the initial estimates of the GMM parameters will be generated by the single-task k-means algorithm, calling the kmeans function in the stats package. See the sketch after this list.
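
The two choices roughly correspond to the following single-task fits (a minimal sketch on one hypothetical matrix x1; the package performs the equivalent internally via its initialize function, and the exact Mclust arguments may differ):

  library(mclust)
  x1 <- matrix(rnorm(100 * 2), nrow = 100)         # one hypothetical task
  fit_em <- Mclust(x1, G = 2, modelNames = "EEE")  # "EM": two-component GMM, shared covariance
  fit_km <- kmeans(x1, centers = 2)                # "kmeans": two clusters via stats::kmeans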

alignment_method

the alignment algorithm to use. See Section 2.4 of Tian, Y., Weng, H., & Feng, Y. (2022). Can be either "exhaustive" or "greedy". Default: "exhaustive" when length(x) <= 10, otherwise "greedy". See the sketch after this list.

  • exhaustive: the exhaustive search algorithm (Algorithm 2 in Tian, Y., Weng, H., & Feng, Y. (2022)) will be used.

  • greedy: the greedy label-swapping algorithm (Algorithm 3 in Tian, Y., Weng, H., & Feng, Y. (2022)) will be used.
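
The default mirrors the rule shown in Usage, and it can always be overridden explicitly (a minimal sketch):

  # default: exhaustive search for at most 10 tasks, greedy swapping otherwise
  method <- ifelse(length(x) <= 10, "exhaustive", "greedy")
  fit <- mtlgmm(x = x, alignment_method = "greedy", lambda_choice = "fixed")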

trim

the proportion of data sets (tasks) trimmed in the cross-validation procedure for choosing the tuning parameters. Setting it to a small non-zero value helps reduce the impact of outlier tasks on the choice of tuning parameters; see the conceptual sketch below. Default: 0.1
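
Conceptually, the trimming behaves like a trimmed mean of per-task cross-validation errors (a sketch of the idea only, not the package's internal code; the error values are hypothetical):

  cv_errors <- c(0.8, 0.9, 1.0, 1.1, 5.0)  # task 5 is an outlier
  mean(cv_errors)                          # 1.76, pulled up by the outlier task
  mean(cv_errors, trim = 0.2)              # 1.0, drops one value from each end first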

iter_max

the maximum number of iterations of the modified EM algorithm (i.e., the parameter T in Algorithm 1 in Tian, Y., Weng, H., & Feng, Y. (2022)). Default: 1000

iter_max_prox

the maximum number of iterations of the proximal gradient method. Default: 100

ncores

the number of cores to use. Parallel computing is strongly suggested, especially when lambda_choice = "cv"; see the sketch below. Default: 1
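
One common way to set this value (a minimal sketch, assuming the parallel package that ships with base R):

  ncores <- max(1, parallel::detectCores() - 1)
  fit <- mtlgmm(x = x, lambda_choice = "cv", ncores = ncores)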

Value

A list with the following components.

w

the estimate of the mixture proportion of the GMM for each task. Will be a vector.

mu1

the estimate of the Gaussian mean of the first cluster of the GMM for each task. Will be a matrix, where each column represents the estimate for a task.

mu2

the estimate of the Gaussian mean of the second cluster of the GMM for each task. Will be a matrix, where each column represents the estimate for a task.

beta

the estimate of the discriminant coefficient for each task. Will be a matrix, where each column represents the estimate for a task.

Sigma

the estimate of the common covariance matrix for each task. Will be a list, where each component represents the estimate for a task.

w_bar

the center estimate of w. Numeric. See Algorithm 1 in Tian, Y., Weng, H., & Feng, Y. (2022).

mu1_bar

the center estimate of mu1. Will be a vector. See Algorithm 1 in Tian, Y., Weng, H., & Feng, Y. (2022).

mu2_bar

the center estimate of mu2. Will be a vector. See Algorithm 1 in Tian, Y., Weng, H., & Feng, Y. (2022).

beta_bar

the center estimate of beta. Will be a vector. See Algorithm 1 in Tian, Y., Weng, H., & Feng, Y. (2022).

C1_w

the initial value of C1_w.

C1_mu

the initial value of C1_mu.

C1_beta

the initial value of C1_beta.

C2_w

the initial value of C2_w.

C2_mu

the initial value of C2_mu.

C2_beta

the initial value of C2_beta.

initial_mu1

the well-aligned initial estimates of mu1 across tasks. Useful for the alignment problem in transfer learning. See Section 3.4 in Tian, Y., Weng, H., & Feng, Y. (2022).

initial_mu2

the well-aligned initial estimates of mu2 across tasks. Useful for the alignment problem in transfer learning. See Section 3.4 in Tian, Y., Weng, H., & Feng, Y. (2022).
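
After fitting, the shapes described above can be checked directly (a minimal sketch, where fit is an object returned by mtlgmm):

  length(fit$w)      # one mixture proportion per task
  dim(fit$mu1)       # p-by-K matrix: one column per task
  length(fit$Sigma)  # one covariance matrix per task
  fit$w_bar          # a single numeric center estimate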

References

Tian, Y., Weng, H., & Feng, Y. (2022). Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models. arXiv preprint arXiv:2209.15224.

Parikh, N., & Boyd, S. (2014). Proximal algorithms. Foundations and Trends in Optimization, 1(3), 127-239.

See Also

tlgmm, predict_gmm, data_generation, initialize, alignment, alignment_swap, estimation_error, misclustering_error.

Examples

set.seed(0, kind = "L'Ecuyer-CMRG")
library(mclust)
## Consider a 5-task multi-task learning problem in the setting "MTL-1"
data_list <- data_generation(K = 5, outlier_K = 1, simulation_no = "MTL-1",
                             h_w = 0.1, h_mu = 1, n = 50)  # generate the data
fit <- mtlgmm(x = data_list$data$x, C1_w = 0.05, C1_mu = 0.2, C1_beta = 0.2,
              C2_w = 0.05, C2_mu = 0.2, C2_beta = 0.2, kappa = 1/3,
              initial_method = "EM", trim = 0.1, lambda_choice = "fixed",
              step_size = "lipschitz")


## compare the performance with that of single-task estimators
# fit single-task GMMs
fitted_values <- initialize(data_list$data$x, "EM")  # initialize the estimates
L <- alignment(fitted_values$mu1, fitted_values$mu2,
               method = "exhaustive")  # call the alignment algorithm
fitted_values <- alignment_swap(L$L1, L$L2,
                                initial_value_list = fitted_values)  # obtain the well-aligned initial estimates

# fit a pooled GMM
x.comb <- Reduce("rbind", data_list$data$x)
fit_pooled <- Mclust(x.comb, G = 2, modelNames = "EEE")
fitted_values_pooled <- list(w = NULL, mu1 = NULL, mu2 = NULL, beta = NULL, Sigma = NULL)
fitted_values_pooled$w <- rep(fit_pooled$parameters$pro[1], length(data_list$data$x))
fitted_values_pooled$mu1 <- matrix(rep(fit_pooled$parameters$mean[, 1],
                                       length(data_list$data$x)),
                                   ncol = length(data_list$data$x))
fitted_values_pooled$mu2 <- matrix(rep(fit_pooled$parameters$mean[, 2],
                                       length(data_list$data$x)),
                                   ncol = length(data_list$data$x))
fitted_values_pooled$Sigma <- sapply(1:length(data_list$data$x), function(k) {
  fit_pooled$parameters$variance$Sigma
}, simplify = FALSE)
fitted_values_pooled$beta <- sapply(1:length(data_list$data$x), function(k) {
  solve(fit_pooled$parameters$variance$Sigma) %*%
    (fit_pooled$parameters$mean[, 1] - fit_pooled$parameters$mean[, 2])
})
error <- matrix(nrow = 3, ncol = 4,
                dimnames = list(c("Single-task-GMM", "Pooled-GMM", "MTL-GMM"),
                                c("w", "mu", "beta", "Sigma")))
error["Single-task-GMM", "w"] <- estimation_error(
  fitted_values$w[-data_list$data$outlier_index],
  data_list$parameter$w[-data_list$data$outlier_index], "w")
error["Pooled-GMM", "w"] <- estimation_error(
  fitted_values_pooled$w[-data_list$data$outlier_index],
  data_list$parameter$w[-data_list$data$outlier_index], "w")
error["MTL-GMM", "w"] <- estimation_error(
  fit$w[-data_list$data$outlier_index],
  data_list$parameter$w[-data_list$data$outlier_index], "w")

error["Single-task-GMM", "mu"] <- estimation_error(
  list(fitted_values$mu1[, -data_list$data$outlier_index],
       fitted_values$mu2[, -data_list$data$outlier_index]),
  list(data_list$parameter$mu1[, -data_list$data$outlier_index],
       data_list$parameter$mu2[, -data_list$data$outlier_index]), "mu")
error["Pooled-GMM", "mu"] <- estimation_error(
  list(fitted_values_pooled$mu1[, -data_list$data$outlier_index],
       fitted_values_pooled$mu2[, -data_list$data$outlier_index]),
  list(data_list$parameter$mu1[, -data_list$data$outlier_index],
       data_list$parameter$mu2[, -data_list$data$outlier_index]), "mu")
error["MTL-GMM", "mu"] <- estimation_error(
  list(fit$mu1[, -data_list$data$outlier_index],
       fit$mu2[, -data_list$data$outlier_index]),
  list(data_list$parameter$mu1[, -data_list$data$outlier_index],
       data_list$parameter$mu2[, -data_list$data$outlier_index]), "mu")

error["Single-task-GMM", "beta"] <- estimation_error(
  fitted_values$beta[, -data_list$data$outlier_index],
  data_list$parameter$beta[, -data_list$data$outlier_index], "beta")
error["Pooled-GMM", "beta"] <- estimation_error(
  fitted_values_pooled$beta[, -data_list$data$outlier_index],
  data_list$parameter$beta[, -data_list$data$outlier_index], "beta")
error["MTL-GMM", "beta"] <- estimation_error(
  fit$beta[, -data_list$data$outlier_index],
  data_list$parameter$beta[, -data_list$data$outlier_index], "beta")

error["Single-task-GMM", "Sigma"] <- estimation_error(
  fitted_values$Sigma[-data_list$data$outlier_index],
  data_list$parameter$Sigma[-data_list$data$outlier_index], "Sigma")
error["Pooled-GMM", "Sigma"] <- estimation_error(
  fitted_values_pooled$Sigma[-data_list$data$outlier_index],
  data_list$parameter$Sigma[-data_list$data$outlier_index], "Sigma")
error["MTL-GMM", "Sigma"] <- estimation_error(
  fit$Sigma[-data_list$data$outlier_index],
  data_list$parameter$Sigma[-data_list$data$outlier_index], "Sigma")

error


# use cross-validation to choose the tuning parameters
# warning: can be quite slow; a large "ncores" value is suggested!
fit <- mtlgmm(x = data_list$data$x, kappa = 1/3, initial_method = "EM",
              ncores = 2, cv_length = 5, trim = 0.1, cv_upper = 2,
              cv_lower = 0.01, lambda_choice = "cv", step_size = "lipschitz")



[Package mtlgmm version 0.1.0 Index]