data.xllim {xLLiM} R Documentation

Simulated data to run examples of usage of gllim and sllim functions


Matrix of simulated data generated under a GLLiM model with K=5 clusters, using the true parameters available in the object data.xllim.trueparameters. The goal is to learn the nonlinear relation between the responses (Y) and the covariates (X) using gllim, bllim or sllim. Details are given hereafter.




A matrix of simulated data with 52 rows and 100 columns, where each column is one observation. The first 2 rows are responses (Y) and the last 50 rows are covariates (X). The goal is to retrieve Y from X using gllim or sllim.


This dataset is generated under a GLLiM model with L=2, D=50 and N=100.

First, the responses Y are generated according to a Gaussian Mixture model with K=5 clusters:

p(Y=y | Z=k)= N(y; c_k,\Gamma_k)

where each c_k (k=1,...,K) is an L-vector randomly sampled from a standardized Gaussian, each \Gamma_k (k=1,...,K) is an LxL random correlation matrix, and Z is a multinomial hidden variable which indicates the cluster membership of each observation:

p(Z=k) =\pi_k

where the probabilities (\pi_k)_{k=1}^K are sampled from a standard uniform distribution and normalized to sum to 1.
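
The sampling of the weights, cluster labels and responses described above can be sketched in base R. This is an illustration only, not the code used to build data.xllim: the seed, the object names (pi_k, c_k, Gamma_k) and in particular the way the random correlation matrices are built (rescaling a random positive definite matrix with cov2cor) are assumptions, since the construction is not specified here.

```r
set.seed(1)
K <- 5; L <- 2; N <- 100

# Mixture weights: standard uniform draws, normalized to sum to 1
pi_k <- runif(K)
pi_k <- pi_k / sum(pi_k)

# Cluster means: one L-vector per cluster, stored as columns of an L x K matrix
c_k <- matrix(rnorm(L * K), nrow = L, ncol = K)

# Random correlation matrices (assumed construction):
# rescale a random positive definite matrix to unit diagonal
Gamma_k <- lapply(seq_len(K), function(k) {
  M <- matrix(rnorm(L * L), L, L)
  cov2cor(crossprod(M) + diag(L))
})

# Hidden cluster labels Z with p(Z = k) = pi_k
Z <- sample.int(K, size = N, replace = TRUE, prob = pi_k)

# Responses: y = c_k + t(chol(Gamma_k)) %*% e with e ~ N(0, I_L),
# so that y | Z = k has mean c_k and covariance Gamma_k
Y <- sapply(Z, function(k) {
  as.vector(c_k[, k] + t(chol(Gamma_k[[k]])) %*% rnorm(L))
})
dim(Y)  # 2 100
```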

Then, the covariates X are generated according to a Gaussian mixture of regressions. Recall that GLLiM models the following inverse relation, which is used to generate X:

X = \sum_{k=1}^{K} I_{Z=k}(A_kY+b_k+E_k)

where Y is the vector of L responses, X is the vector of D covariates, and Z is the hidden variable of cluster membership introduced above. The entries of the DxL regression matrices A_k and of the D-vector intercepts b_k are sampled from a standard Gaussian, and the covariance matrix of the noise \Sigma_k=Var(E_k) is the identity.
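
The inverse relation can be illustrated with a short base R sketch that draws X given Y and Z and stacks the result in the same layout as data.xllim. It is only an illustration under stated assumptions: Z and Y are simple stand-ins so that the block is self-contained (they are not drawn from the mixture described earlier), and the names A, b and sim are hypothetical.

```r
set.seed(2)
K <- 5; L <- 2; D <- 50; N <- 100

# Stand-ins for the hidden labels and the responses (assumed, for illustration)
Z <- sample.int(K, N, replace = TRUE)
Y <- matrix(rnorm(L * N), L, N)

# Per-cluster regression parameters, entries drawn from a standard Gaussian
A <- lapply(seq_len(K), function(k) matrix(rnorm(D * L), D, L))  # D x L
b <- lapply(seq_len(K), function(k) rnorm(D))                    # D-vector

# For observation n in cluster k = Z[n]: x = A_k y + b_k + e, with e ~ N(0, I_D)
X <- sapply(seq_len(N), function(n) {
  k <- Z[n]
  as.vector(A[[k]] %*% Y[, n] + b[[k]] + rnorm(D))
})

# Responses in the first L rows, covariates in the last D rows
sim <- rbind(Y, X)
dim(sim)  # 52 100
```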

The goal is to retrieve Y from X using gllim, bllim or sllim.

See Also

xLLiM-package, gllim, sllim, data.xllim.test


library(xLLiM)
data(data.xllim)           # load the dataset
dim(data.xllim)            # 52 100
Y = data.xllim[1:2,]       # responses: 2 rows, 100 observations
X = data.xllim[3:52,]      # covariates: 50 rows, 100 observations
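
As a possible next step, the sketch below fits a GLLiM model with K=5 components and predicts the responses from the covariates. It assumes that the xLLiM package is installed and that gllim and gllim_inverse_map are called as in the package examples (responses first, covariates second, in_K for the number of components, predictions in the x_exp element). The seed and the names mod, pred and mse are illustrative. The error is computed on the training data, so it is optimistic; data.xllim.test is the more natural target for evaluation.

```r
# Sketch only: runs when the xLLiM package is installed, and is skipped otherwise
if (requireNamespace("xLLiM", quietly = TRUE)) {
  library(xLLiM)
  data(data.xllim)
  Y <- data.xllim[1:2, ]    # responses, 2 x 100
  X <- data.xllim[3:52, ]   # covariates, 50 x 100

  set.seed(3)               # initialization is assumed to involve randomness
  mod <- gllim(Y, X, in_K = 5)   # fit a GLLiM model with K = 5 components

  # Expected responses given the covariates (in-sample prediction)
  pred <- gllim_inverse_map(X, mod)$x_exp
  mse <- mean((pred - Y)^2)
  print(mse)
}
```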

[Package xLLiM version 2.3 Index]