data.xllim {xLLiM} | R Documentation
Simulated data to run examples of usage of the gllim and sllim functions
Description
Matrix of simulated data generated under a GLLiM model with K=5 clusters, using the true parameters available in the object data.xllim.trueparameters. The goal is to learn the nonlinear relation between the responses (Y) and the covariates (X) using gllim, bllim or sllim. Details are given below.
Usage
data(data.xllim)
Format
A matrix of simulated data with 52 rows and 100 columns (observations). The first 2 rows are responses (Y) and the last 50 rows are covariates (X). The goal is to retrieve Y from X using gllim or sllim.
Details
This dataset is generated under a GLLiM model with L=2, D=50 and N=100.
First, the responses Y are generated according to a Gaussian mixture model with K=5 clusters:
p(Y=y | Z=k) = N(y; c_k, \Gamma_k)
where the (c_k)_{k=1}^K are L-vectors randomly sampled from a standard Gaussian, the (\Gamma_k)_{k=1}^K are LxL random correlation matrices, and Z is a multinomial hidden variable that indicates the cluster membership of each observation:
p(Z=k) = \pi_k
where the probabilities (\pi_k)_{k=1}^K are sampled from a standard uniform distribution and normalized to sum to 1.
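The sampling scheme for the responses can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code used to build data.xllim: the seed, the helper name random_correlation, and the way a random correlation matrix is drawn (rescaling a random cross-product with cov2cor) are choices made here, since the text does not say how the \Gamma_k were obtained.

```r
set.seed(1)                      # arbitrary seed (assumption)
L <- 2; N <- 100; K <- 5         # dimensions stated in Details

# Mixture weights: standard uniform draws normalized to sum to 1
pi_k <- runif(K)
pi_k <- pi_k / sum(pi_k)

# Cluster means c_k: one L-vector per column, standard Gaussian entries
c_k <- matrix(rnorm(L * K), nrow = L, ncol = K)

# Hypothetical helper: one possible way to draw a random LxL correlation matrix
random_correlation <- function(L) {
  M <- matrix(rnorm(L * L), L, L)
  cov2cor(crossprod(M))          # symmetric, unit diagonal
}
Gamma_k <- lapply(seq_len(K), function(k) random_correlation(L))

# Hidden cluster labels Z with p(Z = k) = pi_k
Z <- sample(seq_len(K), size = N, replace = TRUE, prob = pi_k)

# Responses: Y[, n] ~ N(c_k, Gamma_k) with k = Z[n]
Y <- matrix(NA_real_, nrow = L, ncol = N)
for (n in seq_len(N)) {
  k <- Z[n]
  R <- chol(Gamma_k[[k]])        # upper triangular, t(R) %*% R = Gamma_k
  Y[, n] <- c_k[, k] + t(R) %*% rnorm(L)
}
dim(Y)
```

Observations are stored in columns, matching the layout of data.xllim.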
Then, the covariates X are generated according to a Gaussian mixture of regressions. Recall that GLLiM models the following inverse relation, which is used to generate X:
X = \sum_{k=1}^{K} I_{Z=k} (A_k Y + b_k + E_k)
where Y is the vector of L responses, X is the vector of D covariates, and Z is the hidden cluster-membership variable introduced above. The entries of the regression coefficients A_k (DxL matrices) and of the intercepts b_k (D-vectors) are sampled from a standard Gaussian, and the covariance matrix of the noise \Sigma_k = Var(E_k) is the identity.
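Given responses and cluster labels, the inverse relation above (each covariate vector is a cluster-specific affine function of the response vector plus Gaussian noise) can be simulated as in the sketch below. It is an illustration, not the generator behind data.xllim: the seed is arbitrary, and Y and Z are replaced by simple stand-ins (standard Gaussian responses, uniformly drawn labels) so that the block runs on its own.

```r
set.seed(2)                        # arbitrary seed (assumption)
L <- 2; D <- 50; N <- 100; K <- 5  # dimensions stated in Details

# Stand-ins so the block is self-contained (assumption, see lead-in)
Y <- matrix(rnorm(L * N), nrow = L, ncol = N)
Z <- sample(seq_len(K), size = N, replace = TRUE)

# Per-cluster regression coefficients A_k (DxL) and intercepts b_k (D-vector),
# all entries drawn from a standard Gaussian
A <- lapply(seq_len(K), function(k) matrix(rnorm(D * L), nrow = D, ncol = L))
b <- lapply(seq_len(K), function(k) rnorm(D))

# X[, n] = A_k Y[, n] + b_k + E_k with k = Z[n] and Var(E_k) = identity
X <- matrix(NA_real_, nrow = D, ncol = N)
for (n in seq_len(N)) {
  k <- Z[n]
  X[, n] <- A[[k]] %*% Y[, n] + b[[k]] + rnorm(D)
}

# Stack responses on top of covariates, as in data.xllim
sim <- rbind(Y, X)
dim(sim)
```

The stacked matrix has the same shape as data.xllim: responses in the first 2 rows, covariates in the remaining 50.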
The goal is to retrieve Y from X using gllim, bllim or sllim.
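A fit-and-predict round trip on this dataset might look like the sketch below. It assumes that the xLLiM package is installed, that gllim takes the responses first, then the covariates, then the number of clusters in_K, and that gllim_inverse_map returns the predicted responses in its x_exp element; check these assumptions against the gllim and gllim_inverse_map help pages before relying on them. The body is skipped when the package is not available.

```r
pred <- NULL
if (requireNamespace("xLLiM", quietly = TRUE)) {
  data("data.xllim", package = "xLLiM")
  Y <- data.xllim[1:2, ]     # responses, 2 x 100
  X <- data.xllim[3:52, ]    # covariates, 50 x 100

  # Fit a GLLiM model with K = 5 clusters (the value used to simulate the data)
  mod <- xLLiM::gllim(Y, X, in_K = 5)

  # Predict responses from covariates, here on the training covariates themselves
  pred <- xLLiM::gllim_inverse_map(X, mod)
  dim(pred$x_exp)            # one predicted response vector per column of X
}
```

Predicting on the training covariates only checks that the calls fit together; it is not an estimate of out-of-sample accuracy.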
See Also
xLLiM-package, gllim, sllim, data.xllim.test
Examples
data(data.xllim)
dim(data.xllim) # 52 100
Y = data.xllim[1:2,] # responses # 2 100
X = data.xllim[3:52,] # covariates # 50 100