simulateLD {VariableScreening} | R Documentation |
Simulate a dataset for testing the performance of screenlong
Description
Simulates a dataset that can be used to test the screenlong function and to assess the performance of the proposed method under different scenarios. The simulated dataset has two z-covariates and p x-covariates, only a few of which have nonzero effects. There are n subjects in the simulated dataset, each with J observations that are not necessarily evenly timed; a random subset is then drawn to create an unbalanced dataset. The within-subject correlation is assumed to be AR-1.
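The AR-1 correlation and the unbalanced design described above can be sketched in base R. This is an illustrative sketch only, not the package's internal code: the sample sizes, the Cholesky-based sampling, and the 0.8 keep probability are all assumptions made for the example.

```r
# Illustrative sketch only; not the package's internal implementation.
set.seed(1)

n   <- 5     # subjects
J   <- 10    # scheduled observations per subject
rho <- 0.6   # AR-1 correlation parameter

# AR-1 correlation matrix: entry (j, k) equals rho^|j - k|.
ar1_cor <- rho^abs(outer(seq_len(J), seq_len(J), "-"))

# Draw correlated errors for each subject via the Cholesky factor.
# chol() returns an upper-triangular R with t(R) %*% R equal to ar1_cor,
# so each row of Z %*% R has correlation matrix ar1_cor.
R <- chol(ar1_cor)
errors <- matrix(rnorm(n * J), nrow = n) %*% R   # n x J, one row per subject

# Make the design unbalanced by keeping a random subset of observations.
# The keep probability (0.8) is a hypothetical choice for this sketch.
keep <- matrix(runif(n * J) < 0.8, nrow = n)
obs_per_subject <- rowSums(keep)
```

Under AR-1, the correlation between two observations on the same subject decays geometrically with the number of steps between them, so rho alone controls the whole within-subject correlation matrix.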
Usage
simulateLD(
n = 100,
J = 10,
rho = 0.6,
p = 500,
trueIdx = c(5, 100, 200, 400),
beta0Fun = NULL,
betaFun = NULL,
gammaFun = NULL,
varFun = NULL
)
Arguments
n |
Number of subjects in the simulated dataset |
J |
Number of observations per subject |
rho |
The correlation parameter for the AR-1 correlation structure. |
p |
The total number of features to be screened |
trueIdx |
The indices of the active features in the simulated x matrix. This should be a vector whose values are a subset of 1:p. |
beta0Fun |
The time-varying intercept for the data-generating model, as a function of
time. If left as null, it will default to |
betaFun |
The time-varying coefficients for z in the data-generating model, as a
function of time. If left as null, it will be specified as two functions. The first is
|
gammaFun |
A list of functions of time, one function for each entry in trueIdx,
giving the time-varying effects of each active feature in the simulated x matrix.
If left as null, it will be specified as four functions. The first is a step function
|
varFun |
A function of time giving the marginal variance of the error term at a
given time. If left as NULL, it will be specified as |
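As a rough illustration of the function-valued arguments, the sketch below supplies one effect function per entry of trueIdx, plus a variance function. The particular functions are hypothetical and are not the package defaults; the sketch also assumes that the functions may be called with a numeric vector of times and that times are nonnegative, neither of which is stated on this page.

```r
# Hypothetical effect functions for illustration; not the package defaults.
my_trueIdx <- c(5, 100, 200, 400)

my_gammaFun <- list(
  function(t) 2 * (t > 0.5),       # step function
  function(t) sin(2 * pi * t),     # smooth periodic effect
  function(t) 1.5 * t,             # linear trend
  function(t) rep(1, length(t))    # constant effect
)

# Error variance that grows with time (positive for nonnegative t).
my_varFun <- function(t) 1 + 0.5 * t

# gammaFun needs one function per active feature.
stopifnot(length(my_gammaFun) == length(my_trueIdx))

# Only call the package if it is installed.
if (requireNamespace("VariableScreening", quietly = TRUE)) {
  sim <- VariableScreening::simulateLD(
    n = 50, J = 8, p = 500,
    trueIdx  = my_trueIdx,
    gammaFun = my_gammaFun,
    varFun   = my_varFun
  )
}
```

Writing each function so that it is vectorized over t keeps it usable whether it is evaluated one time point at a time or on a whole vector of times.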
Value
A list with the following components:
x |
Matrix of features to be screened. It will have n*J rows and p columns. |
y |
Vector of responses. It will have length n*J. |
z |
A matrix representing covariates to be included in each of the screening models. The first column will be all ones, representing the intercept. The second will consist of random ones and zeros, representing simulated genders. |
id |
Vector of integers identifying the subject to which each observation belongs. |
time |
Vector of real numbers identifying observation times. It should have the same length as the number of rows of x. |
Examples
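library(VariableScreening)
# Fix the random seed so the simulated dataset is reproducible.
set.seed(12345678)
# Simulate with 1000 candidate features; other arguments keep their defaults.
results <- simulateLD(p=1000)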
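A quick way to sanity-check the returned list against the Value section is sketched below. The helper check_sim is hypothetical (it is not part of the package), and the mock object exists only so that the sketch runs without the package installed.

```r
# Hypothetical helper: checks that the components line up row by row,
# following the layout described in the Value section.
check_sim <- function(res) {
  n_obs <- nrow(res$x)
  stopifnot(
    length(res$y) == n_obs,
    nrow(res$z) == n_obs,
    length(res$id) == n_obs,
    length(res$time) == n_obs,
    all(res$z[, 1] == 1),          # intercept column
    all(res$z[, 2] %in% c(0, 1))   # simulated gender indicator
  )
  invisible(TRUE)
}

# Small mock object with the documented layout, for demonstration only.
mock <- list(
  x    = matrix(rnorm(6 * 3), nrow = 6),
  y    = rnorm(6),
  z    = cbind(1, rbinom(6, 1, 0.5)),
  id   = rep(1:2, each = 3),
  time = runif(6)
)
check_sim(mock)

# Apply the same check to real output if the package is installed.
if (requireNamespace("VariableScreening", quietly = TRUE)) {
  results <- VariableScreening::simulateLD(p = 1000)
  check_sim(results)
  table(table(results$id))   # distribution of observations per subject
}
```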