| dat {marble} | R Documentation |
Simulated data for demonstrating the features of marble.
Description
Simulated gene expression data for demonstrating the features of marble.
Usage
data("dat")
Format
dat consists of four components: X, Y, E, clin.
Details
The data model for generating Y
Use subscript i to denote the ith subject. Let (Y_{i}, X_{i}, E_{i}, clin_{i}) (i=1,\ldots,n) be
independent and identically distributed random vectors. Y_{i} is a continuous response variable representing the
phenotype. X_{i} is the p-dimensional vector of genetic factors. The environmental factors and clinical factors
are denoted as the q-dimensional vector E_{i} and the m-dimensional vector clin_{i}, respectively.
The random error \epsilon_{i} follows a heavy-tailed distribution. For X_{ij} (j = 1,\ldots,p), the measurement of the jth genetic factor on the ith subject,
consider the following model:
Y_{i} = \alpha_{0} + \sum_{k=1}^{q}\alpha_{k}E_{ik}+\sum_{t=1}^{m}\gamma_{t}clin_{it}+\beta_{j}X_{ij}+\sum_{k=1}^{q}\eta_{jk}X_{ij}E_{ik}+\epsilon_{i},
where \alpha_{0} is the intercept, and the \alpha_{k}'s and \gamma_{t}'s are the regression coefficients corresponding to the effects of environmental and clinical factors, respectively.
The \beta_{j}'s and \eta_{jk}'s are the regression coefficients of the genetic variants and the G\times E interaction effects, respectively.
The G\times E interaction terms are defined as W_{j} = (X_{j}E_{1},\ldots,X_{j}E_{q}). With a slight abuse of notation, denote \tilde{W} = W_{j}.
Denote \alpha=(\alpha_{1}, \ldots, \alpha_{q})^{T}, \gamma=(\gamma_{1}, \ldots, \gamma_{m})^{T}, \beta=(\beta_{1}, \ldots, \beta_{p})^{T}, \eta=(\eta_{1}^{T}, \ldots, \eta_{p}^{T})^{T}, \tilde{W} = (\tilde{W}_{1}, \ldots, \tilde{W}_{p}).
Then the model can be written as
Y_{i} = \alpha_{0} + E_{i}\alpha + clin_{i}\gamma + X_{ij}\beta_{j} + \tilde{W}_{i}\eta_{j} + \epsilon_{i}.
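The marginal model for a single genetic factor can be illustrated with a short base R sketch. It is not the code used to generate dat: the sample sizes, coefficient values, the normal covariates, and the choice of a t distribution with 2 degrees of freedom for the heavy-tailed error are all assumptions made for illustration only.

```r
# Illustrative sketch only: all sizes, coefficients, and distributions below
# are hypothetical and are not taken from the marble package.
set.seed(1)
n <- 100; p <- 5; q <- 2; m <- 3

X    <- matrix(rnorm(n * p), n, p)   # genetic factors (n x p)
E    <- matrix(rnorm(n * q), n, q)   # environmental factors (n x q)
clin <- matrix(rnorm(n * m), n, m)   # clinical factors (n x m)

j   <- 1                             # genetic factor used in the marginal model
W_j <- X[, j] * E                    # n x q interactions: column k is X_j * E_k

alpha0 <- 1                          # intercept
alpha  <- c(0.5, -0.5)               # environmental effects
gamma  <- c(0.3, 0.2, -0.1)          # clinical effects
beta_j <- 0.8                        # main effect of the jth genetic factor
eta_j  <- c(0.6, 0)                  # G x E interaction effects

eps <- rt(n, df = 2)                 # heavy-tailed error (assumed t with 2 df)

# Y_i = alpha_0 + E_i alpha + clin_i gamma + X_ij beta_j + W_ij eta_j + eps_i
Y <- drop(alpha0 + E %*% alpha + clin %*% gamma +
          X[, j] * beta_j + W_j %*% eta_j + eps)
```

Here W_j plays the role of W_{j} above: multiplying the vector X[, j] by the matrix E scales every column of E elementwise, which gives the q interaction columns for the jth genetic factor.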
See Also
Examples
library(marble)
data(dat)
dim(X)  # dimensions of the genetic factor matrix