preprocess {plsgenomics} | R Documentation
Preprocessing of microarray data
Description
The function preprocess performs the standard preprocessing of microarray data: thresholding, gene filtering, log10 transformation and row standardisation.
Usage
preprocess(Xtrain, Xtest=NULL, Threshold=c(100,16000), Filtering=c(5,500),
           log10.scale=TRUE, row.stand=TRUE)
Arguments
Xtrain
a (ntrain x p) data matrix of predictors (rows are arrays, columns are genes).
Xtest
an optional (ntest x p) matrix containing the predictors for the test data set (default NULL).
Threshold
a vector of length 2 containing the values (threshmin, threshmax) used for thresholding the data: values below threshmin are set to threshmin (floor) and values above threshmax are set to threshmax (ceiling). If
Filtering
a vector of length 2 containing the values (FiltMin, FiltMax) used for filtering genes. With max and min denoting the largest and smallest values of a gene, genes with max/min ≤ FiltMin and (max - min) ≤ FiltMax are excluded.
If
log10.scale
a logical value; if TRUE, a log10 transformation is applied.
row.stand
a logical value; if TRUE, each row (array) of the data is standardised.
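The Threshold and Filtering rules can be checked by hand on a toy matrix. The sketch below is a base-R illustration of the rules exactly as worded above (floor, ceiling, then exclusion when both max/min ≤ FiltMin and (max - min) ≤ FiltMax hold); it is not the plsgenomics code, and the data are made up for illustration.

```r
# Toy data: 3 arrays (rows) x 4 genes (columns), filled column by column.
X <- matrix(c(   50,  120,  200,   # gene 1
                300,  400,  350,   # gene 2
              20000, 9000,  150,   # gene 3
               1000, 1500, 1800),  # gene 4
            nrow = 3)

# Threshold = c(100, 16000): floor at 100, ceiling at 16000.
# pmax/pmin keep the matrix dimensions of their first argument.
Xt <- pmin(pmax(X, 100), 16000)

# Filtering = c(5, 500): per-gene max and min across arrays.
gmax <- apply(Xt, 2, max)
gmin <- apply(Xt, 2, min)
excluded <- (gmax / gmin <= 5) & (gmax - gmin <= 500)

# Genes that survive the filter
which(!excluded)
# [1] 3 4
```

Genes 1 and 2 satisfy both conditions and are dropped. Gene 4 has max/min = 1.8 but a range of 800, so under the "and" wording it is kept; a rule joining the two conditions with "or" would drop it.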
Details
The preprocessing steps recommended by Dudoit et al. (2002) are performed: thresholding, filtering, log10 transformation and row standardisation. The default values are those suited to the Colon data.
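The whole pipeline can be sketched in base R as below. This is a minimal sketch under stated assumptions, not the package's implementation: preprocess_sketch is a hypothetical name; the step order (threshold, filter, log10, standardise) is assumed from Dudoit et al. (2002); the gene filter is assumed to be decided on Xtrain and the same genes then kept in Xtest, which is consistent with both outputs having p' columns; and row standardisation is assumed to mean centring each row to mean 0 and scaling it to sample standard deviation 1 via scale().

```r
# Hypothetical sketch; preprocess_sketch is NOT a plsgenomics function.
preprocess_sketch <- function(Xtrain, Xtest = NULL,
                              Threshold = c(100, 16000), Filtering = c(5, 500),
                              log10.scale = TRUE, row.stand = TRUE) {
  # 1. Thresholding: floor at Threshold[1], ceiling at Threshold[2]
  clip <- function(X) pmin(pmax(X, Threshold[1]), Threshold[2])
  Xtrain <- clip(Xtrain)
  if (!is.null(Xtest)) Xtest <- clip(Xtest)

  # 2. Filtering, decided on the training set only (assumption);
  #    the same gene columns are then kept in the test set
  gmax <- apply(Xtrain, 2, max)
  gmin <- apply(Xtrain, 2, min)
  keep <- !((gmax / gmin <= Filtering[1]) & (gmax - gmin <= Filtering[2]))
  Xtrain <- Xtrain[, keep, drop = FALSE]
  if (!is.null(Xtest)) Xtest <- Xtest[, keep, drop = FALSE]

  # 3. log10 transformation (values are >= Threshold[1] > 0 after step 1)
  if (log10.scale) {
    Xtrain <- log10(Xtrain)
    if (!is.null(Xtest)) Xtest <- log10(Xtest)
  }

  # 4. Row standardisation: each array to mean 0 and sd 1 across kept genes
  if (row.stand) {
    Xtrain <- t(scale(t(Xtrain)))
    if (!is.null(Xtest)) Xtest <- t(scale(t(Xtest)))
  }

  list(pXtrain = Xtrain, pXtest = Xtest)
}
```

Thresholding before the log transform guarantees strictly positive inputs to log10. With the objects from the Examples section, preprocess_sketch(Xtrain, Xtest) returns a list shaped like the one described under Value, although its numbers need not match preprocess exactly.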
Value
A list with the following components:
pXtrain
the (ntrain x p') matrix containing the preprocessed training data, where p' is the number of genes retained after filtering.
pXtest
the (ntest x p') matrix containing the preprocessed test data.
Author(s)
Sophie Lambert-Lacroix (http://membres-timc.imag.fr/Sophie.Lambert/) and Julie Peyre (https://membres-ljk.imag.fr/Julie.Peyre/).
References
Dudoit, S., Fridlyand, J. and Speed, T. (2002). Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 97, 77–87.
Examples
# load plsgenomics library
library(plsgenomics)
# load Colon data
data(Colon)
# draw a random learning set: 27 samples of class 2 and 14 of class 1
IndexLearn <- c(sample(which(Colon$Y==2), 27), sample(which(Colon$Y==1), 14))
Xtrain <- Colon$X[IndexLearn,]
Ytrain <- Colon$Y[IndexLearn]
Xtest <- Colon$X[-IndexLearn,]
# preprocess data
resP <- preprocess(Xtrain=Xtrain, Xtest=Xtest, Threshold=c(100,16000), Filtering=c(5,500),
                   log10.scale=TRUE, row.stand=TRUE)
# number of genes kept after preprocessing
ncol(resP$pXtrain)