simdata.gen {matrans}

R Documentation

Generate multi-source data from partially linear models.

Description

Generate simulation datasets containing training data and testing data from partially linear models under various settings.
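
For orientation, a partially linear model combines a linear part in the parametric predictors with an unspecified smooth function of the nonparametric predictors. The base R sketch below draws one such dataset by hand. It is not the code used by simdata.gen: the equicorrelation covariance structure, the function g, and all numeric values are assumptions chosen only for illustration.

```r
set.seed(1)

n     <- 200  # sample size (illustrative)
px    <- 3    # dimension of the parametric component
rho   <- 0.5  # assumed common correlation between parametric predictors
sigma <- 0.5  # standard deviation of the normal errors

# Assumed equicorrelation covariance: 1 on the diagonal, rho elsewhere.
Sigma <- matrix(rho, px, px)
diag(Sigma) <- 1

# Correlated normal predictors via the upper-triangular Cholesky factor of Sigma.
x <- matrix(rnorm(n * px), n, px) %*% chol(Sigma)

# One nonparametric predictor and a hypothetical smooth function g.
z <- runif(n)
g <- function(z) sin(2 * pi * z)

beta <- c(1.4, -1.2, 1)

# Partially linear model: linear in x, smooth in z, plus normal noise.
y <- drop(x %*% beta) + g(z) + rnorm(n, sd = sigma)
```

Here x plays the role of the parametric predictors, z the nonparametric predictor, and g(z) the nonparametric values.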

Usage

simdata.gen(
  px,
  num.source = 4,
  size,
  coeff0,
  coeff.mis,
  err.sigma,
  rho,
  size.test,
  sim.set = c("heter", "homo"),
  tar.spec = c("cor", "mis"),
  if.heter = FALSE
)

Arguments

`px`	the dimension of the shared parametric component for all models. Should be an integer smaller than the sample size.
`num.source`	the number of datasets. Should be 4 or 7. Default is 4.
`size`	the sample sizes of the different datasets. Should be a vector of length `num.source`.
`coeff0`	a px * num.source matrix holding the coefficient vectors of the shared parametric component, one column per model.
`coeff.mis`	the shared coefficient vector for the misspecified model. If tar.spec = 'cor', it should be a parameter vector of length px + 1 for the second misspecified source model. If tar.spec = 'mis', it should be a (px+1) * 2 matrix, in which the first column is the parameter vector for the misspecified target model and the second column is for the second misspecified source model. The last component of predictors for the misspecified model will be omitted in the estimation.
`err.sigma`	the standard deviations of the normal random errors in regression models.
`rho`	the correlation coefficient in the multivariate normal distribution of the parametric variables.
`size.test`	the sample size of the testing target data.
`sim.set`	the type of the nonparametric settings. Can be "heter" or "homo", which represent the heterogeneous and homogeneous dimension settings, respectively.
`tar.spec`	the type of the target model specification. Can be "cor" or "mis", which represent the correctly specified and misspecified target model, respectively.
`if.heter`	a logical value indicating whether to allow a heteroscedastic setup. Default is FALSE.
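
The shape requirements above can be checked before calling the generator. The following is a minimal sketch, assuming the constraints are exactly as documented; check.simdata.args is a hypothetical helper that is not part of matrans, and comparing px against the smallest entry of size is an assumption about what "the sample size" refers to.

```r
# Hypothetical helper (not part of matrans): checks argument shapes as documented.
check.simdata.args <- function(px, num.source, size, coeff0, coeff.mis,
                               tar.spec = c("cor", "mis")) {
  tar.spec <- match.arg(tar.spec)
  stopifnot(
    num.source %in% c(4, 7),
    length(size) == num.source,
    px < min(size),            # assumption: px must be below every sample size
    is.matrix(coeff0),
    nrow(coeff0) == px,
    ncol(coeff0) == num.source
  )
  if (tar.spec == "cor") {
    # A vector (or one-column matrix) of length px + 1.
    stopifnot(length(coeff.mis) == px + 1)
  } else {
    # A (px + 1) x 2 matrix: target column first, source column second.
    stopifnot(is.matrix(coeff.mis),
              nrow(coeff.mis) == px + 1,
              ncol(coeff.mis) == 2)
  }
  invisible(TRUE)
}

coeff0 <- matrix(rep(c(1.4, -1.2, 1, -0.8, 0.65, 0.3), 4), ncol = 4)
ok <- check.simdata.args(
  px = 6, num.source = 4, size = c(150, 200, 200, 150), coeff0 = coeff0,
  coeff.mis = c(coeff0[, 2], 1.8), tar.spec = "cor"
)
```

A call with mismatched shapes, for example a size vector of length 3, stops with an error that names the failed condition.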

Value

a list containing the training data and testing data, including the response, parametric predictors, nonparametric predictors, nonparametric values, and coefficient vector.

References

Hu, X., & Zhang, X. (2023). Optimal Parameter-Transfer Learning by Semiparametric Model Averaging. Journal of Machine Learning Research, 24(358), 1-53.

Examples

coeff0 <- cbind(
  as.matrix(c(1.4, -1.2, 1, -0.8, 0.65, 0.3)),
  as.matrix(c(1.4, -1.2, 1, -0.8, 0.65, 0.3) + 0.02),
  as.matrix(c(1.4, -1.2, 1, -0.8, 0.65, 0.3) + 0.3),
  as.matrix(c(1.4, -1.2, 1, -0.8, 0.65, 0.3))
)
# correct target model setting
whole.data <- simdata.gen(
  px = 6, num.source = 4, size = c(150, 200, 200, 150), coeff0 = coeff0,
  coeff.mis = as.matrix(c(coeff0[, 2], 1.8)), err.sigma = 0.5, rho = 0.5, size.test = 500,
  sim.set = "homo", tar.spec = "cor", if.heter = FALSE
)

# misspecified target model setting
coeff.mis <- matrix(c(c(coeff0[, 1], 0.1), c(coeff0[, 2], 1.8)), ncol = 2)
whole.data <- simdata.gen(
  px = 6, num.source = 4, size = c(150, 200, 200, 150), coeff0 = coeff0,
  coeff.mis = coeff.mis, err.sigma = 0.5, rho = 0.5, size.test = 500,
  sim.set = "homo", tar.spec = "mis", if.heter = FALSE
)
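
Since the generated data are random, it can help to fix the seed and look at the top-level structure of the result. The sketch below assumes that matrans is installed and that the generator draws from R's standard random number stream, so that set.seed makes the output repeatable; neither is confirmed here. The call is skipped when the package is unavailable, and the coefficient matrix is a simplified stand-in with identical columns.

```r
set.seed(2023)  # fixes R's random number stream for this session

# Simplified coefficient matrix: the same vector in all four columns.
coeff0 <- matrix(rep(c(1.4, -1.2, 1, -0.8, 0.65, 0.3), 4), ncol = 4)

whole.data <- NULL
if (requireNamespace("matrans", quietly = TRUE)) {
  whole.data <- matrans::simdata.gen(
    px = 6, num.source = 4, size = c(150, 200, 200, 150), coeff0 = coeff0,
    coeff.mis = as.matrix(c(coeff0[, 2], 1.8)), err.sigma = 0.5, rho = 0.5,
    size.test = 500, sim.set = "homo", tar.spec = "cor", if.heter = FALSE
  )
  # Show only the top-level components; names are whatever the package returns.
  str(whole.data, max.level = 1)
}
```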

[Package matrans version 0.1.0 Index]