mtlgmm {mtlgmm}        R Documentation

Fit binary Gaussian mixture models (GMMs) on multiple data sets under a multi-task learning (MTL) setting.

Description

Fit binary Gaussian mixture models (GMMs) on multiple data sets under a multi-task learning (MTL) setting. This function implements the modified EM algorithm (Algorithm 1) proposed in Tian, Y., Weng, H., & Feng, Y. (2022).

Usage

mtlgmm(
  x,
  step_size = c("lipschitz", "fixed"),
  eta_w = 0.1,
  eta_mu = 0.1,
  eta_beta = 0.1,
  lambda_choice = c("cv", "fixed"),
  cv_nfolds = 5,
  cv_upper = 5,
  cv_lower = 0.01,
  cv_length = 5,
  C1_w = 0.05,
  C1_mu = 0.2,
  C1_beta = 0.2,
  C2_w = 0.05,
  C2_mu = 0.2,
  C2_beta = 0.2,
  kappa = 1/3,
  tol = 1e-05,
  initial_method = c("EM", "kmeans"),
  alignment_method = ifelse(length(x) <= 10, "exhaustive", "greedy"),
  trim = 0.1,
  iter_max = 1000,
  iter_max_prox = 100,
  ncores = 1
)

Arguments

x

design matrices from multiple data sets. Should be a list, where each element is a matrix or data.frame object representing the design matrix of one task.
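
For example, a valid input can be assembled as follows (a minimal sketch with illustrative dimensions; all tasks share the same number of columns p):

  set.seed(1)
  x <- list(matrix(rnorm(50 * 2), nrow = 50),   # task 1: 50 observations, p = 2
            matrix(rnorm(60 * 2), nrow = 60),   # task 2: 60 observations
            matrix(rnorm(70 * 2), nrow = 70))   # task 3: 70 observations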

step_size

step size choice in the proximal gradient method used to solve each optimization problem in the modified EM algorithm (Algorithm 1 in Tian, Y., Weng, H., & Feng, Y. (2022)). Can be either "lipschitz" or "fixed". Default: "lipschitz".

  • lipschitz: eta_w, eta_mu, and eta_beta will be chosen via the Lipschitz property of the gradient of the objective function (without the penalty part). See Section 4.2 of Parikh, N., & Boyd, S. (2014).

  • fixed: eta_w, eta_mu, and eta_beta need to be specified, as in the sketch after this list.
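
For instance, a call with user-specified step sizes might look like the following (a minimal sketch; the eta values are illustrative, not recommendations):

  fit <- mtlgmm(x = x, step_size = "fixed",
                eta_w = 0.1, eta_mu = 0.1, eta_beta = 0.1,
                lambda_choice = "fixed")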

eta_w

step size in the proximal gradient method to learn w (Step 3 of Algorithm 1 in Tian, Y., Weng, H., & Feng, Y. (2022)). Default: 0.1. Only used when step_size = "fixed".

eta_mu

step size in the proximal gradient method to learn mu (Steps 4 and 5 of Algorithm 1 in Tian, Y., Weng, H., & Feng, Y. (2022)). Default: 0.1. Only used when step_size = "fixed".

eta_beta

step size in the proximal gradient method to learn beta (Step 9 of Algorithm 1 in Tian, Y., Weng, H., & Feng, Y. (2022)). Default: 0.1. Only used when step_size = "fixed".

lambda_choice

the way to choose the constants in the penalty parameters used in the optimization problems (see Algorithm 1 of Tian, Y., Weng, H., & Feng, Y. (2022)). Can be either "cv" or "fixed". Default: "cv".

  • cv: cv_nfolds, cv_lower, cv_upper, and cv_length need to be specified. The C1 and C2 constants will then be chosen via cross-validation over all combinations of values on the grid exp(seq(log(cv_lower/10), log(cv_upper/10), length.out = cv_length)), as illustrated after this list. Note that this is a two-dimensional CV search, because we set C1_w = C2_w and C1_mu = C1_beta = C2_mu = C2_beta to reduce the computational cost.

  • fixed: C1_w, C1_mu, C1_beta, C2_w, C2_mu, and C2_beta need to be specified. See equations (7)-(12) in Tian, Y., Weng, H., & Feng, Y. (2022).
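
The candidate grid described above can be reproduced directly (a minimal sketch using the defaults cv_lower = 0.01, cv_upper = 5, cv_length = 5):

  cv_lower <- 0.01; cv_upper <- 5; cv_length <- 5
  exp(seq(log(cv_lower / 10), log(cv_upper / 10), length.out = cv_length))
  # candidate values shared by the C1 and C2 constants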

cv_nfolds

the number of cross-validation folds. Default: 5

cv_upper

the upper bound of lambda values used in cross-validation. Default: 5

cv_lower

the lower bound of lambda values used in cross-validation. Default: 0.01

cv_length

the number of lambda values considered in cross-validation. Default: 5

C1_w

the initial value of C1_w. See equation (7) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.05

C1_mu

the initial value of C1_mu. See equation (8) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.2

C1_beta

the initial value of C1_beta. See equation (9) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.2

C2_w

the initial value of C2_w. See equation (10) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.05

C2_mu

the initial value of C2_mu. See equation (11) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.2

C2_beta

the initial value of C2_beta. See equation (12) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.2

kappa

the decaying rate used in equations (7)-(12) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 1/3

tol

convergence tolerance for all optimization problems. If the difference between the last update and the current update is smaller than this value, the optimization iterations stop. Default: 1e-05

initial_method

initialization method, i.e. the method used to initialize the estimates of the GMM parameters for each data set. Can be either "EM" or "kmeans". Default: "EM".

  • EM: the initial estimates of the GMM parameters will be generated by the single-task EM algorithm, calling the Mclust function in the mclust package.

  • kmeans: the initial estimates of the GMM parameters will be generated by the single-task k-means algorithm, calling the kmeans function in the stats package. See the sketch after this list.
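
The two choices roughly correspond to the following single-task fits (a minimal sketch on one hypothetical matrix x1; the package performs the equivalent internally via its initialize function, and the exact Mclust arguments may differ):

  library(mclust)
  x1 <- matrix(rnorm(100 * 2), nrow = 100)         # one hypothetical task
  fit_em <- Mclust(x1, G = 2, modelNames = "EEE")  # "EM": two-component GMM, shared covariance
  fit_km <- kmeans(x1, centers = 2)                # "kmeans": two clusters via stats::kmeans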

alignment_method

the alignment algorithm to use. See Section 2.4 of Tian, Y., Weng, H., & Feng, Y. (2022). Can be either "exhaustive" or "greedy". Default: "exhaustive" when length(x) <= 10, otherwise "greedy". See the sketch after this list.

  • exhaustive: the exhaustive search algorithm (Algorithm 2 in Tian, Y., Weng, H., & Feng, Y. (2022)) will be used.

  • greedy: the greedy label-swapping algorithm (Algorithm 3 in Tian, Y., Weng, H., & Feng, Y. (2022)) will be used.
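
The default mirrors the rule shown in Usage, and it can always be overridden explicitly (a minimal sketch):

  # default: exhaustive search for at most 10 tasks, greedy swapping otherwise
  method <- ifelse(length(x) <= 10, "exhaustive", "greedy")
  fit <- mtlgmm(x = x, alignment_method = "greedy", lambda_choice = "fixed")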

trim

the proportion of data sets (tasks) trimmed in the cross-validation procedure for choosing the tuning parameters. Setting it to a small non-zero value helps reduce the impact of outlier tasks on the choice of tuning parameters; see the conceptual sketch below. Default: 0.1
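
Conceptually, the trimming behaves like a trimmed mean of per-task cross-validation errors (a sketch of the idea only, not the package's internal code; the error values are hypothetical):

  cv_errors <- c(0.8, 0.9, 1.0, 1.1, 5.0)  # task 5 is an outlier
  mean(cv_errors)                          # 1.76, pulled up by the outlier task
  mean(cv_errors, trim = 0.2)              # 1.0, drops one value from each end first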

iter_max

the maximum number of iterations of the modified EM algorithm (i.e., the parameter T in Algorithm 1 in Tian, Y., Weng, H., & Feng, Y. (2022)). Default: 1000

iter_max_prox

the maximum number of iterations of the proximal gradient method. Default: 100

ncores

the number of cores to use. Parallel computing is strongly suggested, especially when lambda_choice = "cv"; see the sketch below. Default: 1
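
One common way to set this value (a minimal sketch, assuming the parallel package that ships with base R):

  ncores <- max(1, parallel::detectCores() - 1)
  fit <- mtlgmm(x = x, lambda_choice = "cv", ncores = ncores)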

Value

A list with the following components.

w

the estimate of the mixture proportion of the GMM for each task. Will be a vector.

mu1

the estimate of the Gaussian mean of the first cluster of the GMM for each task. Will be a matrix, where each column represents the estimate for a task.

mu2

the estimate of the Gaussian mean of the second cluster of the GMM for each task. Will be a matrix, where each column represents the estimate for a task.

beta

the estimate of the discriminant coefficient for each task. Will be a matrix, where each column represents the estimate for a task.

Sigma

the estimate of the common covariance matrix for each task. Will be a list, where each component represents the estimate for a task.

w_bar

the center estimate of w. Numeric. See Algorithm 1 in Tian, Y., Weng, H., & Feng, Y. (2022).

mu1_bar

the center estimate of mu1. Will be a vector. See Algorithm 1 in Tian, Y., Weng, H., & Feng, Y. (2022).

mu2_bar

the center estimate of mu2. Will be a vector. See Algorithm 1 in Tian, Y., Weng, H., & Feng, Y. (2022).

beta_bar

the center estimate of beta. Will be a vector. See Algorithm 1 in Tian, Y., Weng, H., & Feng, Y. (2022).

C1_w

the initial value of C1_w.

C1_mu

the initial value of C1_mu.

C1_beta

the initial value of C1_beta.

C2_w

the initial value of C2_w.

C2_mu

the initial value of C2_mu.

C2_beta

the initial value of C2_beta.

initial_mu1

the well-aligned initial estimates of mu1 across tasks. Useful for the alignment problem in transfer learning. See Section 3.4 in Tian, Y., Weng, H., & Feng, Y. (2022).

initial_mu2

the well-aligned initial estimates of mu2 across tasks. Useful for the alignment problem in transfer learning. See Section 3.4 in Tian, Y., Weng, H., & Feng, Y. (2022).
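
After fitting, the shapes described above can be checked directly (a minimal sketch, where fit is an object returned by mtlgmm):

  length(fit$w)      # one mixture proportion per task
  dim(fit$mu1)       # p-by-K matrix: one column per task
  length(fit$Sigma)  # one covariance matrix per task
  fit$w_bar          # a single numeric center estimate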

References

Tian, Y., Weng, H., & Feng, Y. (2022). Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models. arXiv preprint arXiv:2209.15224.

Parikh, N., & Boyd, S. (2014). Proximal algorithms. Foundations and Trends in Optimization, 1(3), 127-239.

See Also

tlgmm, predict_gmm, data_generation, initialize, alignment, alignment_swap, estimation_error, misclustering_error.

Examples

set.seed(0, kind = "L'Ecuyer-CMRG")
library(mclust)
## Consider a 5-task multi-task learning problem in the setting "MTL-1"
data_list <- data_generation(K = 5, outlier_K = 1, simulation_no = "MTL-1",
                             h_w = 0.1, h_mu = 1, n = 50)  # generate the data
fit <- mtlgmm(x = data_list$data$x, C1_w = 0.05, C1_mu = 0.2, C1_beta = 0.2,
              C2_w = 0.05, C2_mu = 0.2, C2_beta = 0.2, kappa = 1/3,
              initial_method = "EM", trim = 0.1, lambda_choice = "fixed",
              step_size = "lipschitz")


## compare the performance with that of single-task estimators
# fit single-task GMMs
fitted_values <- initialize(data_list$data$x, "EM")  # initialize the estimates
L <- alignment(fitted_values$mu1, fitted_values$mu2,
               method = "exhaustive")  # call the alignment algorithm
fitted_values <- alignment_swap(L$L1, L$L2,
                                initial_value_list = fitted_values)  # obtain the well-aligned initial estimates

# fit a pooled GMM
x.comb <- Reduce("rbind", data_list$data$x)
fit_pooled <- Mclust(x.comb, G = 2, modelNames = "EEE")
fitted_values_pooled <- list(w = NULL, mu1 = NULL, mu2 = NULL, beta = NULL, Sigma = NULL)
fitted_values_pooled$w <- rep(fit_pooled$parameters$pro[1], length(data_list$data$x))
fitted_values_pooled$mu1 <- matrix(rep(fit_pooled$parameters$mean[, 1],
                                       length(data_list$data$x)),
                                   ncol = length(data_list$data$x))
fitted_values_pooled$mu2 <- matrix(rep(fit_pooled$parameters$mean[, 2],
                                       length(data_list$data$x)),
                                   ncol = length(data_list$data$x))
fitted_values_pooled$Sigma <- sapply(1:length(data_list$data$x), function(k) {
  fit_pooled$parameters$variance$Sigma
}, simplify = FALSE)
fitted_values_pooled$beta <- sapply(1:length(data_list$data$x), function(k) {
  solve(fit_pooled$parameters$variance$Sigma) %*%
    (fit_pooled$parameters$mean[, 1] - fit_pooled$parameters$mean[, 2])
})
error <- matrix(nrow = 3, ncol = 4,
                dimnames = list(c("Single-task-GMM", "Pooled-GMM", "MTL-GMM"),
                                c("w", "mu", "beta", "Sigma")))
error["Single-task-GMM", "w"] <- estimation_error(
  fitted_values$w[-data_list$data$outlier_index],
  data_list$parameter$w[-data_list$data$outlier_index], "w")
error["Pooled-GMM", "w"] <- estimation_error(
  fitted_values_pooled$w[-data_list$data$outlier_index],
  data_list$parameter$w[-data_list$data$outlier_index], "w")
error["MTL-GMM", "w"] <- estimation_error(
  fit$w[-data_list$data$outlier_index],
  data_list$parameter$w[-data_list$data$outlier_index], "w")

error["Single-task-GMM", "mu"] <- estimation_error(
  list(fitted_values$mu1[, -data_list$data$outlier_index],
       fitted_values$mu2[, -data_list$data$outlier_index]),
  list(data_list$parameter$mu1[, -data_list$data$outlier_index],
       data_list$parameter$mu2[, -data_list$data$outlier_index]), "mu")
error["Pooled-GMM", "mu"] <- estimation_error(
  list(fitted_values_pooled$mu1[, -data_list$data$outlier_index],
       fitted_values_pooled$mu2[, -data_list$data$outlier_index]),
  list(data_list$parameter$mu1[, -data_list$data$outlier_index],
       data_list$parameter$mu2[, -data_list$data$outlier_index]), "mu")
error["MTL-GMM", "mu"] <- estimation_error(
  list(fit$mu1[, -data_list$data$outlier_index],
       fit$mu2[, -data_list$data$outlier_index]),
  list(data_list$parameter$mu1[, -data_list$data$outlier_index],
       data_list$parameter$mu2[, -data_list$data$outlier_index]), "mu")

error["Single-task-GMM", "beta"] <- estimation_error(
  fitted_values$beta[, -data_list$data$outlier_index],
  data_list$parameter$beta[, -data_list$data$outlier_index], "beta")
error["Pooled-GMM", "beta"] <- estimation_error(
  fitted_values_pooled$beta[, -data_list$data$outlier_index],
  data_list$parameter$beta[, -data_list$data$outlier_index], "beta")
error["MTL-GMM", "beta"] <- estimation_error(
  fit$beta[, -data_list$data$outlier_index],
  data_list$parameter$beta[, -data_list$data$outlier_index], "beta")

error["Single-task-GMM", "Sigma"] <- estimation_error(
  fitted_values$Sigma[-data_list$data$outlier_index],
  data_list$parameter$Sigma[-data_list$data$outlier_index], "Sigma")
error["Pooled-GMM", "Sigma"] <- estimation_error(
  fitted_values_pooled$Sigma[-data_list$data$outlier_index],
  data_list$parameter$Sigma[-data_list$data$outlier_index], "Sigma")
error["MTL-GMM", "Sigma"] <- estimation_error(
  fit$Sigma[-data_list$data$outlier_index],
  data_list$parameter$Sigma[-data_list$data$outlier_index], "Sigma")

error


# use cross-validation to choose the tuning parameters
# warning: can be quite slow; a large "ncores" value is suggested!
fit <- mtlgmm(x = data_list$data$x, kappa = 1/3, initial_method = "EM",
              ncores = 2, cv_length = 5, trim = 0.1, cv_upper = 2,
              cv_lower = 0.01, lambda_choice = "cv", step_size = "lipschitz")



[Package mtlgmm version 0.1.0 Index]