data_generation {mtlgmm}    R Documentation

Generate data for simulations.

Description

Generate data for simulations. All models used in Tian, Y., Weng, H., & Feng, Y. (2022) are implemented.

Usage

data_generation(
  K = 10,
  outlier_K = 1,
  simulation_no = c("MTL-1", "MTL-2"),
  h_w = 0.1,
  h_mu = 1,
  n = 50
)

Arguments

K

the number of tasks (data sets). Default: 10

outlier_K

the number of outlier tasks. Default: 1

simulation_no

simulation number in Tian, Y., Weng, H., & Feng, Y. (2022). Can be "MTL-1" or "MTL-2". Default: "MTL-1".

h_w

the value of the parameter h_w in the simulation models of Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.1

h_mu

the value of the parameter h_mu in the simulation models of Tian, Y., Weng, H., & Feng, Y. (2022). Default: 1

n

the sample size of each task. Can be either a positive integer or a vector of length K. If it is an integer, all tasks have the same sample size n. If it is a vector, its k-th entry is the sample size of the k-th task. Default: 50.
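
The scalar-or-vector rule for n can be sketched in a few lines of R. The helper expand_n below is hypothetical and is not part of mtlgmm; it only illustrates the behavior described above, under the assumption that a scalar n is repeated once per task, and it may differ from what the package does internally.

```r
# Hypothetical helper (NOT part of mtlgmm): expand n to one sample size per task.
expand_n <- function(n, K) {
  if (length(n) == 1) {
    # A single integer: every task gets the same sample size.
    rep(n, K)
  } else if (length(n) == K) {
    # A vector of length K: the k-th entry is the sample size of task k.
    n
  } else {
    stop("n must be a single positive integer or a vector of length K")
  }
}

expand_n(50, K = 3)             # same sample size for all three tasks
expand_n(c(30, 50, 80), K = 3)  # task-specific sample sizes
```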

Value

A list of two sub-lists, "data" and "parameter". The list "data" contains a list x of design matrices, a list y of hidden labels, and a vector outlier_index of outlier task indices. The list "parameter" contains a vector w of mixture proportions, a matrix mu1 whose k-th column is the GMM mean of the first cluster in the k-th task, a matrix mu2 whose k-th column is the GMM mean of the second cluster in the k-th task, a matrix beta whose k-th column is the discriminant coefficient of the k-th task, and a list Sigma of covariance matrices, one per task.

References

Tian, Y., Weng, H., & Feng, Y. (2022). Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models. arXiv preprint arXiv:2209.15224.

See Also

mtlgmm, tlgmm, predict_gmm, initialize, alignment, alignment_swap, estimation_error, misclustering_error.

Examples

data_list <- data_generation(K = 5, outlier_K = 1, simulation_no = "MTL-1",
                             h_w = 0.1, h_mu = 1, n = 50)
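
A possible way to inspect the result is sketched below. It uses only the component names listed under Value (x, y, outlier_index, and the elements of "parameter"). It assumes that the mtlgmm package is installed and that the generator draws from R's standard random number generator, so that set.seed makes the output reproducible. No expected output is shown because the dimensions depend on the simulation setting.

```r
library(mtlgmm)

set.seed(1)  # assumption: data_generation() uses R's RNG, so this fixes the draw
data_list <- data_generation(K = 5, outlier_K = 1, simulation_no = "MTL-1",
                             h_w = 0.1, h_mu = 1, n = 50)

names(data_list)                   # the two documented sub-lists
length(data_list$data$x)           # number of design matrices in the list x
dim(data_list$data$x[[1]])         # dimensions of the first task's design matrix
table(data_list$data$y[[1]])       # hidden label counts in the first task
data_list$data$outlier_index       # indices of the outlier tasks
str(data_list$parameter, max.level = 1)  # w, mu1, mu2, beta, Sigma
```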

[Package mtlgmm version 0.1.0 Index]