buildData {highDmean} | R Documentation |
Two-sample datasets generator
Description
This function generates simulated high-dimensional two-sample data from user-specified populations with given mean vectors, covariance structure, sample sizes, and dimension of each observation. It can generate the long-range dependent process proposed by Hall et al. (1998) in addition to some of the processes provided by arima.sim().
Usage
buildData(
n,
m,
p,
muX,
muY,
dep,
commoncov = TRUE,
VarScaleY = 1,
S = 1,
innov = function(n, ...) stats::rnorm(n, 0, 1),
heteroscedastic = FALSE,
het.diag
)
Arguments
n |
number of observations in the 1st sample. |
m |
number of observations in the 2nd sample. |
p |
the dimensionality of each observation. The samples from both populations should have the same dimension. |
muX |
mean vector of length p for the 1st population. |
muY |
mean vector of length p for the 2nd population. |
dep |
dependence structure among the p components of each observation: 'IND' for independence; 'SD' for strong dependency, AR(1) with parameter 0.9; 'WD' for weak dependency, ARMA(2, 2) with AR parameters 0.4 and -0.1, and MA parameters 0.2 and 0.3; 'LR' for long-range dependency with parameter 0.7. For more details about the configurations, please refer to Zhang and Wang (2020). |
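The 'SD' and 'WD' settings described above can be imitated directly with stats::arima.sim(). The following is a minimal sketch of those two configurations for a single observation, not the package's internal code. The value of p and the helper lag1() are illustrative assumptions, and the 'LR' process of Hall et al. (1998) is not reproduced here.

```r
# Sketch only: imitates the documented 'IND', 'SD' and 'WD' settings.
set.seed(1)
p <- 300  # number of components in one observation (illustrative)

# 'IND': independent standard normal components
x_ind <- stats::rnorm(p)

# 'SD': AR(1) with parameter 0.9
x_sd <- stats::arima.sim(model = list(ar = 0.9), n = p)

# 'WD': ARMA(2, 2) with AR parameters 0.4, -0.1 and MA parameters 0.2, 0.3
x_wd <- stats::arima.sim(
  model = list(ar = c(0.4, -0.1), ma = c(0.2, 0.3)),
  n = p
)

# Hypothetical helper: lag-1 sample autocorrelation of a series
lag1 <- function(x) stats::acf(x, lag.max = 1, plot = FALSE)$acf[2]

# Compare the strength of dependence between neighbouring components
round(c(IND = lag1(x_ind), WD = lag1(x_wd), SD = lag1(x_sd)), 2)
```

Neighbouring components should be nearly uncorrelated under 'IND' and strongly correlated under 'SD'.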
commoncov |
a logical indicating whether the two populations have equal covariance matrices. If FALSE, the innovations used in generating data for the 2nd population will be scaled by the square root of the value specified in VarScaleY. |
VarScaleY |
constant whose square root scales the innovations used to generate observations for the 2nd sample when commoncov = FALSE. |
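The effect of scaling innovations by the square root of VarScaleY can be checked numerically. This is a sketch under the assumption of standard normal innovations; the names n_innov, e_x and e_y are illustrative and do not come from the package.

```r
# Sketch only: multiplying innovations by sqrt(VarScaleY)
# multiplies their variance by VarScaleY.
set.seed(2)
VarScaleY <- 4
n_innov <- 10000  # number of innovations drawn (illustrative)

e_x <- stats::rnorm(n_innov)                    # innovations for the 1st sample
e_y <- sqrt(VarScaleY) * stats::rnorm(n_innov)  # scaled innovations for the 2nd sample

# The ratio of sample variances should be close to VarScaleY
stats::var(e_y) / stats::var(e_x)
```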
S |
the number of data sets to simulate. |
innov |
a function used to generate the innovations, such as the default, function(n, ...) stats::rnorm(n, 0, 1). |
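A custom innovation generator only needs to follow the function(n, ...) form of the default shown in Usage. As an illustration, the hypothetical function innov_t5 below draws heavy-tailed innovations from a t distribution with 5 degrees of freedom, rescaled to unit variance. It is not part of the package.

```r
# Hypothetical innovation generator: t distribution with 5 degrees of freedom,
# divided by its standard deviation sqrt(5 / 3) so the variance is 1.
innov_t5 <- function(n, ...) stats::rt(n, df = 5) / sqrt(5 / 3)

set.seed(3)
z <- innov_t5(1e5)
round(c(mean = mean(z), var = stats::var(z)), 2)

# With highDmean installed, it could be supplied in place of the default, e.g.
# buildData(n = 45, m = 60, p = 300,
#           muX = rep(0, 300), muY = rep(0, 300),
#           dep = 'SD', S = 1, innov = innov_t5)
```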
heteroscedastic |
a logical indicating whether the components will be scaled by the entries in the diagonal matrix specified by het.diag. |
het.diag |
a diagonal matrix whose entries are used to scale the components when heteroscedastic = TRUE. |
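One plausible reading of this scaling is that each component (column) of the data matrix is multiplied by the corresponding diagonal entry. The sketch below illustrates that reading only. How buildData() applies het.diag internally is an assumption here, and Z, D and X_scaled are illustrative names.

```r
# Sketch only: scale column j of a data matrix by the j-th diagonal entry.
set.seed(4)
n <- 5
p <- 4
Z <- matrix(stats::rnorm(n * p), nrow = n, ncol = p)  # homoscedastic data
D <- diag(c(1, 2, 3, 4))                              # p by p diagonal matrix

X_scaled <- Z %*% D  # column j of Z is multiplied by D[j, j]

# Column 3 has been multiplied by 3
all.equal(X_scaled[, 3], 3 * Z[, 3])
```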
Value
A list of S lists, each consisting of an n by p matrix X, an m by p matrix Y, the sample sizes, n and m, for each population, and the dimensionality p.
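A quick consistency check on one simulated data set might look like the sketch below. The helper check_dataset is hypothetical, and the element names X, Y, n, m and p are assumed to match the description above. A mock list stands in for real output so the example runs without the package.

```r
# Hypothetical helper: verify that one data set has the documented shape.
check_dataset <- function(d) {
  is.matrix(d$X) && is.matrix(d$Y) &&
    nrow(d$X) == d$n && nrow(d$Y) == d$m &&
    ncol(d$X) == d$p && ncol(d$Y) == d$p
}

# Mock data set with the documented structure (element names are assumed)
mock <- list(
  X = matrix(0, nrow = 45, ncol = 300),
  Y = matrix(0, nrow = 60, ncol = 300),
  n = 45, m = 60, p = 300
)
check_dataset(mock)

# With highDmean installed, every simulated data set could be checked with
# sims <- buildData(n = 45, m = 60, p = 300,
#                   muX = rep(0, 300), muY = rep(0, 300),
#                   dep = 'IND', S = 3)
# all(vapply(sims, check_dataset, logical(1)))
```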
References
Hall, P., Jing, B.-Y., and Lahiri, S. N. (1998). On the sampling window method for long-range dependent data. Statistica Sinica, 8(4):1189-1204.
Examples
# Generate 3 two-sample datasets of dimensionality 300
# with sample sizes 45 for one sample & 60 for the other.
buildData(n = 45, m = 60, p = 300,
          muX = rep(0, 300), muY = rep(0, 300),
          dep = 'IND', S = 3, innov = rnorm)