preprocess_data {xLLiM}    R Documentation

A proposed function to preprocess high-dimensional data before running gllim, sllim or bllim

Description

The goal of preprocess_data() is to get relevant clusters for G-, S-, or BLLiM initialization, coupled with a feature selection for high-dimensional datasets. This function is an alternative to the default initialization implemented in gllim(), sllim() and bllim().

In this function, clusters are initialized with K-means, and variable selection is performed with a LASSO (glmnet) within each cluster. The features selected in the different clusters are then merged to obtain a subset of variables before running any prediction method of xLLiM.
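
The two steps described above can be sketched as follows. This is only an illustration under stated assumptions, not the package's source code: the simulated data, the choice to cluster on the joint response and covariate data, the use of cv.glmnet() with family = "mgaussian" and the lambda.min rule are all assumptions of this sketch. The LASSO step is skipped if glmnet is not installed.

```r
# Illustrative sketch only; not the implementation used by preprocess_data().
set.seed(1)
L <- 2; D <- 30; N <- 120; K <- 2

# Simulated data in the xLLiM layout: variables in rows, subjects in columns.
yapp <- matrix(rnorm(D * N), nrow = D, ncol = N)
tapp <- rbind(yapp[1, ] + 0.1 * rnorm(N),
              yapp[2, ] - yapp[3, ] + 0.1 * rnorm(N))

# Step 1: K-means on the subjects (kmeans() expects observations in rows).
# Clustering on the joint (response, covariate) data is an assumption here.
km <- kmeans(t(rbind(tapp, yapp)), centers = K, nstart = 5)

# Step 2: LASSO within each cluster, then merge the selected covariates.
selected <- integer(0)
if (requireNamespace("glmnet", quietly = TRUE)) {
  for (k in seq_len(K)) {
    idx <- which(km$cluster == k)
    x_k <- t(yapp[, idx, drop = FALSE])   # subjects x covariates
    y_k <- t(tapp[, idx, drop = FALSE])   # subjects x responses
    cv <- glmnet::cv.glmnet(x_k, y_k, family = "mgaussian")
    beta <- coef(cv, s = "lambda.min")    # list: one sparse column per response
    for (b in beta) {
      # Drop the intercept (first row); keep covariates with nonzero coefficients.
      selected <- c(selected, unname(which(b[-1, 1] != 0)))
    }
  }
}

# Union of the per-cluster selections, as sorted covariate indices.
selected <- sort(unique(selected))
```

Taking the union across clusters keeps any covariate that is useful in at least one cluster, which matches the merging step described above.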

Usage

preprocess_data(tapp,yapp,in_K,...)

Arguments

tapp

An L x N matrix of training responses with variables in rows and subjects in columns

yapp

A D x N matrix of training covariates with variables in rows and subjects in columns

in_K

Initial number of components or number of clusters

...

Additional arguments passed on to glmnet
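
Because tapp and yapp store variables in rows and subjects in columns, data kept in the more common "subjects in rows" layout must be transposed first. A minimal sketch, assuming hypothetical matrices named covariates and responses (these names and dimensions are illustrative and not part of the package):

```r
# Hypothetical data in the usual "subjects in rows" layout.
set.seed(42)
N <- 50
covariates <- matrix(rnorm(N * 200), nrow = N)   # N x D, with D = 200
responses  <- matrix(rnorm(N * 2),   nrow = N)   # N x L, with L = 2

# Transpose so that variables are in rows and subjects in columns.
yapp <- t(covariates)   # D x N
tapp <- t(responses)    # L x N

# Both matrices must describe the same N subjects.
stopifnot(ncol(tapp) == ncol(yapp))
```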

Value

selected.variables

Vector of the indices of the selected variables. Selection is performed within each cluster, and the per-cluster selections are then merged.

clusters

Initial clusters obtained with K-means
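
Since covariates are stored in rows, the selected indices can be used to subset the rows of yapp before fitting a model. A minimal sketch, where prep is a made-up stand-in for the returned list (the index values are hypothetical and no xLLiM function is called):

```r
# Stand-in for the list returned by preprocess_data(); the indices are made up.
prep <- list(selected.variables = c(2L, 5L, 9L))

set.seed(3)
yapp <- matrix(rnorm(10 * 20), nrow = 10)   # D = 10 covariates, N = 20 subjects

# Covariates are in rows, so the selected indices subset the rows.
# drop = FALSE keeps a matrix even if a single variable is selected.
yapp_sel <- yapp[prep$selected.variables, , drop = FALSE]
dim(yapp_sel)
```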

Author(s)

Emeline Perthame (emeline.perthame@pasteur.fr), Emilie Devijver (emilie.devijver@kuleuven.be), Melina Gallopin (melina.gallopin@u-psud.fr)

References

[1] E. Devijver, M. Gallopin, E. Perthame. Nonlinear network-based quantitative trait prediction from transcriptomic data. Submitted, 2017, available at https://arxiv.org/abs/1701.07899.

See Also

xLLiM-package, glmnet-package, kmeans

Examples

x <- 1

[Package xLLiM version 2.3 Index]