data.xllim {xLLiM}    R Documentation

Simulated data to run examples of usage of gllim and sllim functions

Description

Matrix of simulated data, generated under a GLLiM model with K=5 clusters from the true parameters available in the object data.xllim.trueparameters. The goal is to learn the nonlinear relation between the responses (Y) and the covariates (X) using gllim, bllim or sllim. Details are given below.

Usage

data(data.xllim)

Format

A matrix of simulated data with 52 rows and 100 columns (observations). The first 2 rows are responses (Y) and the last 50 rows are covariates (X). The goal is to retrieve Y from X using gllim or sllim.

Details

This dataset is generated under a GLLiM model with L=2, D=50 and N=100.

First, the responses Y are generated according to a Gaussian mixture model with K=5 clusters:

p(Y=y | Z=k) = N(y; c_k, \Gamma_k)

where each c_k (k = 1, ..., K) is an L-vector randomly sampled from a standard Gaussian, each \Gamma_k (k = 1, ..., K) is an L x L random correlation matrix, and Z is a multinomial hidden variable that indicates the cluster membership of each observation:

p(Z=k) = \pi_k

where the probabilities \pi_k (k = 1, ..., K) are sampled from a standard uniform distribution and normalized to sum to 1.
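The sampling of the responses described above can be sketched in base R as follows. This is only an illustration of the generative steps, not the code that produced data.xllim: the seed, the object names (pi_k, c_k, Gamma_k) and, in particular, the way the random correlation matrices are drawn (rescaling a random cross-product matrix with cov2cor) are assumptions, since the procedure used for the dataset is not stated here.

```r
set.seed(1)  # arbitrary seed, only to make this sketch reproducible

L <- 2    # response dimension
K <- 5    # number of clusters
N <- 100  # number of observations

# Mixture weights: standard uniform draws, normalized to sum to 1
pi_k <- runif(K)
pi_k <- pi_k / sum(pi_k)

# Cluster centers c_k: one L-vector per column, standard Gaussian entries
c_k <- matrix(rnorm(L * K), nrow = L, ncol = K)

# Random correlation matrices Gamma_k.
# Assumption: the drawing scheme is not documented; here a random
# cross-product matrix is rescaled to unit diagonal with cov2cor().
Gamma_k <- lapply(seq_len(K), function(k) {
  M <- matrix(rnorm(L * L), nrow = L)
  cov2cor(crossprod(M))
})

# Hidden cluster labels Z with p(Z = k) = pi_k
Z <- sample.int(K, size = N, replace = TRUE, prob = pi_k)

# Responses: column n of Y is drawn from N(c_k, Gamma_k) with k = Z[n]
Y <- sapply(seq_len(N), function(n) {
  k <- Z[n]
  U <- chol(Gamma_k[[k]])  # upper triangular factor, t(U) %*% U = Gamma_k
  c_k[, k] + drop(t(U) %*% rnorm(L))
})

dim(Y)  # 2 100
```

Observations are stored in columns, matching the layout of data.xllim.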

Then, the covariates X are generated according to a Gaussian mixture of regressions. Recall that GLLiM models the following inverse relation, which is used to generate X:

X = \sum_{k=1}^{K=5} I_{Z=k} (A_k Y + b_k + E_k)

where Y is the vector of L responses, X is the vector of D covariates, and Z is the hidden variable of cluster membership introduced above. The regression coefficients A_k and intercepts b_k are sampled from a standard Gaussian, and the covariance matrix of the noise, \Sigma_k = Var(E_k), is the identity.
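As a rough illustration of this second step, the base R sketch below draws the covariates cluster by cluster, applying A_k to the response vector Y of each observation and adding the intercept and identity-covariance noise. The seed and object names are assumptions, and Y and Z are simple stand-ins (not the actual mixture draws) so that the block runs on its own.

```r
set.seed(2)  # arbitrary seed, only to make this sketch reproducible

L <- 2; D <- 50; K <- 5; N <- 100

# Stand-ins for the responses and cluster labels (assumption: in the real
# dataset these come from the Gaussian mixture described earlier)
Y <- matrix(rnorm(L * N), nrow = L, ncol = N)
Z <- sample.int(K, size = N, replace = TRUE)

# Regression coefficients A_k (D x L) and intercepts b_k (D-vector),
# all entries sampled from a standard Gaussian
A <- lapply(seq_len(K), function(k) matrix(rnorm(D * L), nrow = D, ncol = L))
b <- lapply(seq_len(K), function(k) rnorm(D))

# Covariates: X[, n] = A_k Y[, n] + b_k + E, with k = Z[n] and Var(E) = I_D
X <- matrix(NA_real_, nrow = D, ncol = N)
for (n in seq_len(N)) {
  k <- Z[n]
  E <- rnorm(D)  # independent standard Gaussian noise (identity covariance)
  X[, n] <- drop(A[[k]] %*% Y[, n]) + b[[k]] + E
}

# Stack responses on top of covariates, as in data.xllim
data_sim <- rbind(Y, X)
dim(data_sim)  # 52 100
```

The stacked matrix has the same layout as data.xllim: responses in the first L rows, covariates in the remaining D rows, one observation per column.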

The goal is to retrieve Y from X using gllim, bllim or sllim.

See Also

xLLiM-package, gllim, sllim, data.xllim.test

Examples

data(data.xllim)
dim(data.xllim)          # 52 100
Y = data.xllim[1:2,]     # responses: 2 rows, 100 observations
X = data.xllim[3:52,]    # covariates: 50 rows, 100 observations

[Package xLLiM version 2.3 Index]