dat {springer}    R Documentation

simulated data for demonstrating the usage of springer

Description

Simulated gene expression data for demonstrating the usage of springer.

Usage

data("dat")

Format

The dat file consists of five components: e, g, y, clin and coeff. The coeff component holds the true values of the parameters used for generating Y.
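To check what the data file provides, it can be loaded into a scratch environment and each object summarized. The sketch below makes no assumption about whether the five components arrive as separate objects or as elements of a single list. The helper name `load_dataset_env` is hypothetical and not part of springer, and the code loads nothing if the package is not installed.

```r
# Hypothetical helper (not part of springer): load a packaged dataset into
# a fresh environment so the global workspace is left untouched.
load_dataset_env <- function(name, pkg) {
  env <- new.env()
  if (requireNamespace(pkg, quietly = TRUE)) {
    # data() warns rather than fails if the dataset is not found
    suppressWarnings(data(list = name, package = pkg, envir = env))
  } else {
    message("Package '", pkg, "' is not installed; nothing was loaded.")
  }
  env
}

env <- load_dataset_env("dat", "springer")

# Summarize every object the dataset created (top level only)
for (nm in ls(env)) {
  cat(nm, ":\n", sep = "")
  str(get(nm, envir = env), max.level = 1)
}
```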

Details

The data model for generating Y

Consider a longitudinal case study with $n$ subjects and $k_i$ measurements over time for the $i$th subject ($i=1,\ldots,n$). Let $Y_{ij}$ be the response of the $j$th observation for the $i$th subject ($i=1,\ldots,n$, $j=1,\ldots,k_i$), $X_{ij}=(X_{ij1},\ldots,X_{ijp})^\top$ be a $p$-dimensional vector of covariates denoting $p$ genetic factors, $E_{ij}=(E_{ij1},\ldots,E_{ijq})^\top$ be a $q$-dimensional environmental factor and $Clin_{ij}=(Clin_{ij1},\ldots,Clin_{ijt})^\top$ be a $t$-dimensional clinical factor. There is time dependence among measurements on the same subject, but we assume that the measurements between different subjects are independent. The model we used for hierarchical variable selection for gene–environment interactions is given as:

$$Y_{ij}= \alpha_0 + \sum_{m=1}^{t}\theta_m Clin_{ijm} + \sum_{u=1}^{q}\alpha_u E_{iju} + \sum_{v=1}^{p}\Big(\gamma_v X_{ijv} + \sum_{u=1}^{q}h_{uv} E_{iju} X_{ijv}\Big)+\epsilon_{ij},$$

where $\alpha_0$ is the intercept and the marginal density of $Y_{ij}$ belongs to a canonical exponential family defined in Liang and Zeger (1986). Define $\eta_v=(\gamma_v, h_{1v}, \ldots, h_{qv})^\top$, which is a vector of length $q+1$, and $Z_{ijv}=(X_{ijv}, E_{ij1}X_{ijv}, \ldots, E_{ijq}X_{ijv})^\top$, which contains the main genetic effect of the $v$th SNP from the $j$th measurement on the $i$th subject and its interactions with all the $q$ environmental factors. The model can be written as:

$$Y_{ij}= \alpha_0 + \sum_{m=1}^{t}\theta_m Clin_{ijm} + \sum_{u=1}^{q}\alpha_u E_{iju} + \sum_{v=1}^{p}\eta_v^\top Z_{ijv}+\epsilon_{ij},$$

where $Z_{ijv}$ collects the $v$th genetic factor and its interactions with the $q$ environmental factors for the $j$th measurement on the $i$th subject, and $\eta_v$ is the corresponding coefficient vector of length $1+q$. The random error $\epsilon_i=(\epsilon_{i1},\ldots,\epsilon_{ik_i})^\top$ is assumed to follow a multivariate normal distribution with $\Sigma_i$ as the covariance matrix for the repeated measurements of the $i$th subject among the $k_i$ time points.
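The two ways of writing the model can be checked numerically, and a response can be drawn from it, with a few lines of base R. This is only an illustrative sketch and not the code that generated the packaged data. The dimensions, the coefficient values, the balanced design (the same number of time points for every subject) and the AR(1) form of the covariance matrix are all assumptions made for the example.

```r
set.seed(1)

# Assumed dimensions for illustration (not those of the packaged data)
n_sub  <- 50   # subjects (n)
k      <- 4    # time points per subject (k_i, taken equal for all i)
p      <- 6    # genetic factors
q      <- 2    # environmental factors
n_clin <- 2    # clinical factors (t)
N      <- n_sub * k   # rows ordered subject by subject

X    <- matrix(rnorm(N * p), N, p)
E    <- matrix(rnorm(N * q), N, q)
Clin <- matrix(rnorm(N * n_clin), N, n_clin)

# Assumed true coefficients
alpha0 <- 1
theta  <- c(0.5, -0.5)            # clinical effects
alpha  <- c(0.8, 0)               # environmental main effects
gamma  <- c(1, -1, 0, 0, 0, 0)    # genetic main effects
h      <- matrix(0, q, p)         # h[u, v]: interaction of E_u with X_v
h[1, 1] <- 0.6
h[2, 2] <- -0.4

# Form 1: main effects plus explicit double sum over interactions
inter <- numeric(N)
for (v in seq_len(p)) {
  for (u in seq_len(q)) {
    inter <- inter + h[u, v] * E[, u] * X[, v]
  }
}
mu1 <- drop(alpha0 + Clin %*% theta + E %*% alpha + X %*% gamma) + inter

# Form 2: grouped terms eta_v' Z_v, with Z_v = (X_v, E_1 X_v, ..., E_q X_v)
mu2 <- drop(alpha0 + Clin %*% theta + E %*% alpha)
for (v in seq_len(p)) {
  Z_v   <- cbind(X[, v], E * X[, v])   # N x (q + 1)
  eta_v <- c(gamma[v], h[, v])         # length q + 1
  mu2   <- mu2 + drop(Z_v %*% eta_v)
}
stopifnot(isTRUE(all.equal(mu1, mu2)))  # both forms agree

# Correlated errors within subject: AR(1) covariance (an assumption)
rho   <- 0.5
Sigma <- rho^abs(outer(seq_len(k), seq_len(k), "-"))
R     <- chol(Sigma)                          # t(R) %*% R equals Sigma
eps   <- matrix(rnorm(N), n_sub, k) %*% R     # each row ~ N(0, Sigma)

# Flatten row by row so errors line up with the subject-major ordering
Y <- mu1 + as.vector(t(eps))
```

Grouping each genetic main effect with its interactions into one vector $\eta_v$ is what lets a selection method treat the $v$th genetic factor as a unit while still distinguishing individual terms within it.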

See Also

springer


[Package springer version 0.1.9 Index]