pre_proc_data {SparseMDC} | R Documentation |
Pre-process data
Description
This function centers on a gene-by-gene basis, normalizes and/or log
transforms the data prior to the application of SparseMDC. For sequencing
depth normalization, we recommend that users apply one of the many methods
developed for normalizing scRNA-Seq data before using SparseMDC, in which
case they can set norm = FALSE. If norm = TRUE, this function normalizes the
data by dividing each sample by its total number of reads. If log = TRUE,
this function log transforms the data by applying log(x + 1) to each of the
data sets. By far the most important pre-processing step for SparseMDC is
the centralization of the data. Having centralized data is a core component
of the SparseMDC algorithm and is necessary both for accurate clustering of
the cells and for identifying marker genes. We therefore recommend that all
users centralize their data using this function and that only experienced
users set center = FALSE.
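The three steps described above can be sketched in base R on a single toy matrix. This is an illustration only, not the package's implementation: the matrix counts and its values are made up, the orientation (genes in rows, cells in columns) follows the Examples section, and any additional scaling factor or pooling across conditions that pre_proc_data may apply is not stated on this page and is not modeled here.

```r
# Toy count matrix: 3 genes (rows) x 4 cells (columns); values are made up.
counts <- matrix(c(10, 0, 5,
                   20, 4, 6,
                    0, 1, 9,
                    8, 2, 0),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# Step 1 (norm = TRUE): divide each column by that cell's total read count.
normalized <- sweep(counts, 2, colSums(counts), "/")

# Step 2 (log = TRUE): apply log(x + 1) element-wise.
logged <- log(normalized + 1)

# Step 3 (center = TRUE): subtract each gene's mean so every row averages zero.
centered <- sweep(logged, 1, rowMeans(logged), "-")
```

After the last step every row of centered has mean zero (up to floating-point error), which is the property the description calls centralization.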
Usage
pre_proc_data(dat_l, dim, norm = FALSE, log = TRUE, center = TRUE)
Arguments
dat_l |
list with D entries, each entry contains data d, a p * n matrix. The entries should be ordered according to the condition of each dataset. The rows of each data matrix should contain features or genes while the columns contain samples, as in the Examples section. |
dim |
Total number of conditions, D. |
norm |
TRUE/FALSE indicating whether the data should be normalized. This
parameter controls whether the data is normalized for sequencing depth by
dividing each column by the total number of reads for that sample. We
recommend that users apply one of the many methods for normalizing scRNA-Seq
data beforehand and so set this as FALSE. |
log |
TRUE/FALSE indicating whether the data should be transformed as
log(x + 1). The default value is TRUE. |
center |
TRUE/FALSE indicating whether the data should be centered on a
gene-by-gene basis. We recommend that all users center their data prior to
applying SparseMDC; only experienced users should set this as FALSE. |
Value
A list with D entries containing the pre-processed data.
Examples
set.seed(10)
# Select small dataset for example
data_test <- data_biase[1:100,]
# Split data into conditions A, B and C
data_A <- data_test[ , which(condition_biase == "A")]
data_B <- data_test[ , which(condition_biase == "B")]
data_C <- data_test[ , which(condition_biase == "C")]
# Store data as list
dat_l <- list(data_A, data_B, data_C)
# Pre-process the data
pdat <- pre_proc_data(dat_l, dim = 3, norm = FALSE, log = TRUE,
                      center = TRUE)
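The example above builds dat_l by subsetting the columns once per condition. With more conditions, the same list can be built in one step. The following is a minimal sketch on made-up data; expr and condition are hypothetical stand-ins for a gene-by-cell matrix and its per-cell condition labels, and are not objects shipped with SparseMDC.

```r
# Hypothetical toy data: 5 genes x 6 cells, with one condition label per cell.
expr <- matrix(seq_len(30), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("cell", 1:6)))
condition <- c("A", "A", "B", "C", "B", "C")

# Group the column indices by condition (split() orders the groups
# alphabetically), then pull out one gene-by-cell matrix per condition.
# drop = FALSE keeps a matrix even if a condition has a single cell.
dat_l <- lapply(split(seq_len(ncol(expr)), condition),
                function(idx) expr[, idx, drop = FALSE])

# Number of conditions, suitable for the dim argument.
D <- length(dat_l)
```

The resulting list has one entry per condition, ordered A, B, C, and D can be passed as dim.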