preprocess_data {xLLiM} | R Documentation |
A proposed function to preprocess high-dimensional data before running gllim, sllim or bllim
Description
The goal of preprocess_data() is to get relevant clusters for G-, S-, or BLLiM initialization, coupled with a feature selection for high-dimensional datasets. This function is an alternative to the default initialization implemented in gllim(), sllim() and bllim().
In this function, clusters are initialized with K-means, and variable selection is performed with a LASSO (glmnet) within each cluster. The selected features are then merged to obtain a subset of variables before running any prediction method of xLLiM.
Usage
preprocess_data(tapp,yapp,in_K,...)
Arguments
tapp |
An |
yapp |
An |
in_K |
Initial number of components or number of clusters |
... |
Additional arguments passed on to glmnet |
Value
selected.variables |
Vector of the indexes of the selected variables. Selection is performed within each cluster and the results are then merged. |
clusters |
Initial clusters obtained with k-means |
Author(s)
Emeline Perthame (emeline.perthame@pasteur.fr), Emilie Devijver (emilie.devijver@kuleuven.be), Melina Gallopin (melina.gallopin@u-psud.fr)
References
[1] E. Devijver, M. Gallopin, E. Perthame. Nonlinear network-based quantitative trait prediction from transcriptomic data. Submitted, 2017, available at https://arxiv.org/abs/1701.07899.
See Also
xLLiM-package, glmnet-package, kmeans
Examples
x <- 1
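The procedure outlined in the Description (K-means clustering, a LASSO fit within each cluster, then merging of the selected features) can be sketched as follows. This is a minimal illustration, not the package's implementation. The helper name sketch_preprocess, the simulated data, the univariate response, the layout with observations in rows, clustering on the joint (response, covariates) data, and the choice of the penalty by cross-validation with cv.glmnet are all assumptions made for the example.

```r
library(glmnet)  # assumed to be installed; provides cv.glmnet()

set.seed(1)
n <- 200   # number of observations
p <- 50    # number of candidate covariates

# Simulated data (illustrative only): observations in rows,
# only the first two covariates carry signal
x <- matrix(rnorm(n * p), nrow = n)
y <- 2 * x[, 1] - 3 * x[, 2] + rnorm(n, sd = 0.1)

# Hypothetical helper, not part of xLLiM
sketch_preprocess <- function(x, y, K, ...) {
  # Step 1: K-means on the joint (response, covariates) data
  km <- kmeans(cbind(y, x), centers = K, nstart = 5)

  selected <- integer(0)
  for (k in seq_len(K)) {
    idx <- which(km$cluster == k)
    # Step 2: LASSO within cluster k; extra arguments go to cv.glmnet
    cv <- cv.glmnet(x[idx, , drop = FALSE], y[idx], ...)
    # Coefficients at the cross-validated penalty, intercept dropped
    beta <- as.matrix(coef(cv, s = "lambda.min"))[-1, 1]
    # Step 3: merge the indexes selected in this cluster with the others
    selected <- union(selected, which(beta != 0))
  }

  list(selected.variables = sort(selected), clusters = km$cluster)
}

res <- sketch_preprocess(x, y, K = 2)
res$selected.variables   # merged indexes of the selected covariates
table(res$clusters)      # cluster sizes
```

The returned list mirrors the two components listed under Value, so the selected indexes could be used to subset the covariates before fitting a model on the reduced data.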