simdata.gen {matrans}R Documentation

Generate multi-source data from partially linear models.

Description

Generate simulation datasets containing training data and testing data from partially linear models under various settings.

Usage

simdata.gen(
  px,
  num.source = 4,
  size,
  coeff0,
  coeff.mis,
  err.sigma,
  rho,
  size.test,
  sim.set = c("heter", "homo"),
  tar.spec = c("cor", "mis"),
  if.heter = FALSE
)

Arguments

px

the dimension of the shared parametric component for all models. Should be an integer smaller than sample size.

num.source

the number of datasets. Should be the value 4 or 7.

size

the sample size of different datasets. Should be a vector of num.source.

coeff0

a px * num.source matrix of the shared coefficient vector for all models.

coeff.mis

the shared coefficient vector for the misspecified model. If tar.spec = 'cor', it should be a parameter vector of length px + 1 for the second misspecified source model. If tar.spec = 'mis', it should be a (px+1) * 2 matrix, in which the first column is the parameter vector for the misspecified target model and the second column is for the second misspecified source model. The last component of predictors for the misspecified model will be omitted in the estimation.

err.sigma

the standard deviations of the normal random errors in regression models.

rho

the correlation coefficient in the multivariate normal distribution of the parametric variables.

size.test

the sample size of the testing target data.

sim.set

the type of the nonparametric settings. Can be "heter" or "homo", which represents the heterogeneous and homogeneous dimension settings, respectively.

tar.spec

the type of the target model specification. Can be "cor" or "mis", which represents the corrected and misspecified target model, respectively.

if.heter

the logical variable, whether to allow a heteroscedastic setup. Default is False.

Value

a list of the training data and testing data, including the response, parametric predictors, nonparametric predictors, nonparametric values, coefficient vector.

References

Hu, X., & Zhang, X. (2023). Optimal Parameter-Transfer Learning by Semiparametric Model Averaging. Journal of Machine Learning Research, 24(358), 1-53.

Examples

coeff0 <- cbind(
  as.matrix(c(1.4, -1.2, 1, -0.8, 0.65, 0.3)),
  as.matrix(c(1.4, -1.2, 1, -0.8, 0.65, 0.3) + 0.02),
  as.matrix(c(1.4, -1.2, 1, -0.8, 0.65, 0.3) + 0.3),
  as.matrix(c(1.4, -1.2, 1, -0.8, 0.65, 0.3))
)
# correct target model setting
whole.data <- simdata.gen(
  px = 6, num.source = 4, size = c(150, 200, 200, 150), coeff0 = coeff0,
  coeff.mis = as.matrix(c(coeff0[, 2], 1.8)), err.sigma = 0.5, rho = 0.5, size.test = 500,
  sim.set = "homo", tar.spec = "cor", if.heter = FALSE
)

# misspecified target model setting
coeff.mis <- matrix(c(c(coeff0[, 1], 0.1), c(coeff0[, 2], 1.8)), ncol = 2)
whole.data <- simdata.gen(
  px = 6, num.source = 4, size = c(150, 200, 200, 150), coeff0 = coeff0,
  coeff.mis = coeff.mis, err.sigma = 0.5, rho = 0.5, size.test = 500,
  sim.set = "homo", tar.spec = "mis", if.heter = FALSE
)

[Package matrans version 0.1.0 Index]