PreProcess {BVSNLP} | R Documentation |
This function preprocesses the design matrix by removing
columns that contain NA
's or are all zero. It also standardizes
non-binary columns to have mean zero and variance one. The user has the
choice of log transforming continuous covariates before scaling them.
PreProcess(X, logT = FALSE)
X |
The |
logT |
A boolean variable determining if log transform should be done on continuous columns before scaling them. Note that those columns should not contain any zeros or negative values. |
It returns a list having the following objects:
X |
The filtered design matrix which can be used in variable selection procedure. Binary columns are moved to the end of the design matrix. |
gnames |
Gene names read from the column names of the filtered design matrix. |
Amir Nikooienejad
### Constructing a synthetic design matrix for the purpose of preprocessing
### imposing columns with different scales
n <- 40
p1 <- 50
p2 <- 150
p <- p1 + p2
X1 <- matrix(rnorm(n*p1, 1, 2), ncol = p1)
X2 <- matrix(rnorm(n*p2), ncol = p2)
X <- cbind(X1, X2)
### putting NA elements in the matrix
X[3,85] <- NA
X[25,85] <- NA
X[35,43] <- NA
X[15,128] <- NA
colnames(X) <- paste("gene_",c(1:p),sep="")
### Running the function. Note the intercept column that is added as the
### first column in the "logistic" family
Xout <- PreProcess(X)
dim(Xout$X)[2] == (p + 1) ## 1 is added because intercept column is included
## This is FALSE because of the removal of columns with NA elements