R: Generate data for simulations.

data_generation {mtlgmm}

R Documentation

Generate data for simulations.

Description

Generate data for simulations. All models used in Tian, Y., Weng, H., & Feng, Y. (2022) are implemented.

Usage

data_generation(
  K = 10,
  outlier_K = 1,
  simulation_no = c("MTL-1", "MTL-2"),
  h_w = 0.1,
  h_mu = 1,
  n = 50
)

Arguments

`K`	the number of tasks (data sets). Default: 10
`outlier_K`	the number of outlier tasks. Default: 1
`simulation_no`	simulation number in Tian, Y., Weng, H., & Feng, Y. (2022). Can be "MTL-1" or "MTL-2". Default: "MTL-1"
`h_w`	the value of h_w. Default: 0.1
`h_mu`	the value of h_mu. Default: 1
`n`	the sample size of each task. Can be either a positive integer or a vector of length `K`. If it is an integer, every task has the same sample size `n`. If it is a vector, its k-th entry is the sample size of the k-th task. Default: 50
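
The two accepted forms of `n` can be illustrated with a small sketch. The helper `expand_n` below is hypothetical and is not part of mtlgmm; it only mirrors the behaviour described for the `n` argument. The final call runs only if the package is installed, and it assumes each design matrix has one row per observation.

```r
# Hypothetical helper (not part of mtlgmm): expand `n` to one sample size
# per task, following the description of the `n` argument.
expand_n <- function(n, K) {
  if (length(n) == 1) {
    rep(n, K)                        # same sample size for every task
  } else if (length(n) == K) {
    n                                # k-th entry is the size of task k
  } else {
    stop("`n` must be a single positive integer or a vector of length K")
  }
}

K <- 5
n_vec <- c(50, 60, 70, 80, 90)
expand_n(50, K)
expand_n(n_vec, K)

# Runs only when mtlgmm is installed.
if (requireNamespace("mtlgmm", quietly = TRUE)) {
  data_list <- mtlgmm::data_generation(K = K, outlier_K = 1,
                                       simulation_no = "MTL-1", n = n_vec)
  # Assumption: rows of each design matrix are observations, so these
  # row counts should correspond to n_vec.
  print(sapply(data_list$data$x, nrow))
}
```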

Value

A list of two sub-lists, "data" and "parameter". List "data" contains a list of design matrices x, a list of hidden labels y, and a vector of outlier task indices outlier_index. List "parameter" contains a vector w of mixture proportions, a matrix mu1 in which each column is the GMM mean of the first cluster of a task, a matrix mu2 in which each column is the GMM mean of the second cluster of a task, a matrix beta in which each column is the discriminant coefficient of a task, and a list Sigma of covariance matrices, one for each task.
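
As a rough illustration of how the returned list might be navigated, the sketch below uses only the component names listed above. It runs only if mtlgmm is installed. Variable names such as `non_outliers` are chosen here for illustration, and the code assumes that `outlier_index` holds task positions between 1 and `K`.

```r
# Runs only when mtlgmm is installed.
if (requireNamespace("mtlgmm", quietly = TRUE)) {
  data_list <- mtlgmm::data_generation(K = 5, outlier_K = 1,
                                       simulation_no = "MTL-1",
                                       h_w = 0.1, h_mu = 1, n = 50)

  # Overview of the two sub-lists, "data" and "parameter".
  str(data_list, max.level = 2, give.attr = FALSE)

  x <- data_list$data$x                          # list of design matrices
  y <- data_list$data$y                          # list of hidden labels
  outlier_index <- data_list$data$outlier_index  # outlier task indices

  # Assumption: outlier_index refers to positions in the task lists.
  non_outliers <- setdiff(seq_along(x), outlier_index)

  # Parameters of the first task: columns of mu1, mu2 and beta are tasks.
  w_task1    <- data_list$parameter$w[1]
  mu1_task1  <- data_list$parameter$mu1[, 1]
  mu2_task1  <- data_list$parameter$mu2[, 1]
  beta_task1 <- data_list$parameter$beta[, 1]
  Sigma_task1 <- data_list$parameter$Sigma[[1]]
}
```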

References

Tian, Y., Weng, H., & Feng, Y. (2022). Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models. arXiv preprint arXiv:2209.15224.

Examples

data_list <- data_generation(K = 5, outlier_K = 1, simulation_no = "MTL-1",
                             h_w = 0.1, h_mu = 1, n = 50)

[Package mtlgmm version 0.1.0 Index]