buildData {highDmean}    R Documentation

Two-sample datasets generator

Description

This function generates simulated high-dimensional two-sample data from user-specified populations with given mean vectors, covariance structure, sample sizes, and dimension of each observation. In addition to several processes available through stats::arima.sim(), it can generate the long-range dependent process proposed by Hall et al. (1998).

Usage

buildData(
  n,
  m,
  p,
  muX,
  muY,
  dep,
  commoncov = TRUE,
  VarScaleY = 1,
  S = 1,
  innov = function(n, ...) stats::rnorm(n, 0, 1),
  heteroscedastic = FALSE,
  het.diag
)

Arguments

n

number of observations in the 1st sample.

m

number of observations in the 2nd sample.

p

the dimensionality of each observation. The samples from both populations must have the same dimension.

muX

p by 1 vector of component means for the 1st population.

muY

p by 1 vector of component means for the 2nd population.
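
The two mean vectors are what switch a simulation between the null hypothesis and an alternative. The following is a minimal sketch of building both; the shift size 0.5 and the 15 shifted components are illustrative assumptions, not values taken from the package.

```r
# Illustrative settings: p = 300 matches the Examples section; the shift size
# and the number of shifted components are arbitrary choices.
p <- 300
muX <- rep(0, p)

# Null hypothesis: both populations share the same mean vector
muY_null <- rep(0, p)

# A sparse alternative: only the first k components of muY are shifted
k <- 15
muY_alt <- c(rep(0.5, k), rep(0, p - k))

# Both vectors have length p
c(length(muX), length(muY_alt))
sum(muY_alt != muX)   # number of components whose means differ
```

Either muY_null or muY_alt could then be passed as muY, e.g. buildData(n = 45, m = 60, p = p, muX = muX, muY = muY_alt, dep = 'IND', S = 3).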

dep

dependence structure among the p components for both populations. Possible options are:

'IND' for independence;

'SD' for strong dependence, an AR(1) process with parameter 0.9;

'WD' for weak dependence, an ARMA(2, 2) process with AR parameters 0.4 and -0.1 and MA parameters 0.2 and 0.3;

'LR' for long-range dependence (Hall et al., 1998) with parameter 0.7.

For more details about the configurations, please refer to Zhang and Wang (2020).
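
The dependence options describe correlation across the p components of a single observation. A minimal sketch, assuming each observation is an independent length-p realization of the listed process added to the mean vector; sim_components() is a hypothetical helper, not part of highDmean. Only 'IND', 'SD' and 'WD' are covered (via stats::arima.sim()), because the long-range construction of Hall et al. (1998) is not reproduced here.

```r
# Hypothetical helper (not exported by highDmean): simulate the zero-mean
# p components of one observation under the chosen dependence structure.
sim_components <- function(p, dep, innov = function(n, ...) stats::rnorm(n, 0, 1)) {
  switch(dep,
    IND = innov(p),                                      # independent components
    SD  = as.numeric(stats::arima.sim(list(ar = 0.9), n = p, rand.gen = innov)),
    WD  = as.numeric(stats::arima.sim(list(ar = c(0.4, -0.1), ma = c(0.2, 0.3)),
                                      n = p, rand.gen = innov)),
    stop("only 'IND', 'SD' and 'WD' are sketched here; 'LR' is not covered")
  )
}

set.seed(1)
n <- 200; p <- 50
muX <- rep(0, p)
# Each row is one observation: mean vector plus dependent components
X <- t(replicate(n, muX + sim_components(p, "SD")))
dim(X)

# Average correlation between neighbouring components; near 0.9 for 'SD'
lag1 <- mean(sapply(seq_len(p - 1), function(j) cor(X[, j], X[, j + 1])))
lag1
```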

commoncov

a logical indicating whether the two populations have equal covariance matrices. If FALSE, the innovations used in generating data for the 2nd population will be scaled by the square root of the value specified in VarScaleY.

VarScaleY

a constant whose square root multiplies the innovations used to generate the 2nd sample when commoncov = FALSE, so that the innovation variance of the 2nd population is scaled by VarScaleY.
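
Scaling by the square root means VarScaleY acts on variances, not standard deviations. A minimal sketch, assuming the 2nd-sample innovations are simply the base innovations multiplied by sqrt(VarScaleY); innovY is a hypothetical name used only for illustration.

```r
# Assumption: with commoncov = FALSE, 2nd-sample innovations are the base
# innovations multiplied by sqrt(VarScaleY).
VarScaleY <- 2
innov  <- function(n, ...) stats::rnorm(n, 0, 1)
innovY <- function(n, ...) sqrt(VarScaleY) * innov(n, ...)

set.seed(2)
e  <- innov(1e5)
eY <- innovY(1e5)
# The variance ratio is close to VarScaleY, not sqrt(VarScaleY)
var(eY) / var(e)
```

For the AR and ARMA options the components are linear filters of the innovations, so under this assumption the whole covariance matrix of the 2nd population's components is multiplied by VarScaleY.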

S

the number of data sets to simulate.

innov

a function used to generate the innovations; it takes the number of draws n as its first argument, e.g. innov = function(n, ...) rnorm(n, 0, 1).
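
Any function with this shape can replace the Gaussian default. As a sketch, heavier-tailed innovations could come from a Student-t distribution rescaled to unit variance; t5_innov is a hypothetical name, and df = 5 is an illustrative choice.

```r
# Standardized Student-t innovations with 5 degrees of freedom:
# a t_5 variable has variance 5/3, so dividing by sqrt(5/3) gives unit variance.
t5_innov <- function(n, ...) stats::rt(n, df = 5) / sqrt(5 / 3)

set.seed(3)
z <- t5_innov(2e5)
c(mean = mean(z), var = var(z))   # both near 0 and 1
```

It could then be supplied as buildData(..., innov = t5_innov). Keeping the `...` in the signature matches the default shown under Usage.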

heteroscedastic

a logical indicating whether the component standard deviations are scaled by the diagonal entries of het.diag.

het.diag

a p by p diagonal matrix whose diagonal entries scale the component standard deviations. Used only when heteroscedastic = TRUE.
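
Right-multiplying an observation matrix by a diagonal matrix multiplies column j by the j-th diagonal entry, which rescales that component's standard deviation. A minimal sketch, assuming the scaling is applied this way to the centred components before the mean vector is added; this is not confirmed as the package's internal code.

```r
p <- 4
het.diag <- diag(c(1, 2, 0.5, 3))             # p x p diagonal scaling matrix

set.seed(4)
Z <- matrix(stats::rnorm(1e4 * p), ncol = p)  # unit-variance components
X <- Z %*% het.diag                           # column j multiplied by het.diag[j, j]

apply(X, 2, stats::sd)                        # roughly 1, 2, 0.5, 3
```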

Value

A list of S lists, one per simulated data set, each containing the n by p matrix X (1st sample), the m by p matrix Y (2nd sample), the sample sizes n and m, and the dimensionality p.
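
A typical simulation study loops over the S data sets and computes a statistic on each. The sketch below builds a mock object with the shape described above so it runs without the package; the element names X, Y, n, m and p are assumptions for illustration.

```r
# Mock object with the described shape (element names assumed)
set.seed(5)
S <- 3; n <- 45; m <- 60; p <- 300
sims <- lapply(seq_len(S), function(s) list(
  X = matrix(stats::rnorm(n * p), n, p),
  Y = matrix(stats::rnorm(m * p), m, p),
  n = n, m = m, p = p))

# For each data set, the p-vector of differences in sample means
mean_diffs <- lapply(sims, function(d) colMeans(d$X) - colMeans(d$Y))
length(mean_diffs)        # 3
length(mean_diffs[[1]])   # 300
```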

References

Hall, P., Jing, B.-Y., and Lahiri, S. N. (1998). On the sampling window method for long-range dependent data. Statistica Sinica, 8(4):1189-1204.

Examples

# Generate 3 two-sample datasets of dimensionality 300
# with sample size 45 for the 1st sample and 60 for the 2nd.
buildData(n = 45, m = 60, p = 300,
          muX = rep(0, 300), muY = rep(0, 300),
          dep = 'IND', S = 3, innov = rnorm)

[Package highDmean version 0.1.0 Index]