
pre_proc_data {SparseMDC}

R Documentation

Pre-process data

Description

This function pre-processes the data before SparseMDC is applied. It can normalize the data for sequencing depth, log transform it, and center it on a gene-by-gene basis.

For sequencing depth normalization, we recommend that users apply one of the many methods developed for normalizing scRNA-Seq data before using SparseMDC and leave norm = FALSE. If norm = TRUE, the function normalizes the data by dividing each sample by its total number of reads. If log = TRUE, the function applies log(x + 1) to each of the data sets.

By far the most important pre-processing step for SparseMDC is centering the data. Centered data is a core requirement of the SparseMDC algorithm and is necessary both for accurate clustering of the cells and for identifying marker genes. We therefore recommend that all users center their data using this function and that only experienced users set center = FALSE.
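
The three transformations described above can be sketched in base R on a toy matrix. This is only an illustration, not the package's implementation: the matrix values and the names `counts`, `normed`, `logged` and `centered` are made up, genes are assumed to sit in rows and cells in columns, and whether SparseMDC centers each data set separately or pools the conditions first is not stated here, so the sketch centers a single matrix.

```r
# Toy count matrix: 3 genes (rows) x 4 cells (columns). Values are made up.
counts <- matrix(c(10, 0, 5,
                   20, 4, 0,
                    0, 8, 2,
                   30, 2, 6),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# norm = TRUE (assumed equivalent): divide each column (cell) by its
# total read count, so every column sums to 1.
normed <- sweep(counts, 2, colSums(counts), "/")

# log = TRUE: apply log(x + 1) element-wise. With the default norm = FALSE
# the transform is applied to the un-normalized values, as done here.
logged <- log(counts + 1)

# center = TRUE: subtract each gene's mean so every row averages to zero.
centered <- sweep(logged, 1, rowMeans(logged), "-")

# Row means of the centered matrix should be zero up to rounding error.
round(rowMeans(centered), 10)
```

Centering is done last in this sketch so that the gene means are removed from the values that would actually be passed on to clustering.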

Usage

pre_proc_data(dat_l, dim, norm = FALSE, log = TRUE, center = TRUE)

Arguments

`dat_l`	List with D entries, each entry d containing a p * n data matrix. The entries should be ordered according to the condition of each dataset. The rows of each data matrix should contain features or genes while the columns contain samples, consistent with the example below and with the column-wise normalization described for `norm`.
`dim`	Total number of conditions, D.
`norm`	TRUE/FALSE indicating whether the data should be normalized for sequencing depth by dividing each column by the total number of reads for that sample. We recommend that users apply one of the many methods for normalizing scRNA-Seq data beforehand and so set this to `FALSE`. The default value is `FALSE`.
`log`	TRUE/FALSE indicating whether the data should be transformed as log(x + 1). The default value is `TRUE`.
`center`	TRUE/FALSE indicating whether the data should be centered on a gene-by-gene basis. We recommend that all users center their data prior to applying SparseMDC; only experienced users should set this to `FALSE`. The default value is `TRUE`.
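
Before calling the function, it can help to confirm that the input list has the expected shape. The helper below is a hypothetical sketch, not part of SparseMDC. It assumes genes are in rows and samples in columns, as in the Examples section, and that every condition measures the same set of genes.

```r
# Hypothetical helper, not part of SparseMDC: checks that a list of
# matrices matches the layout described in the Arguments section.
check_dat_l <- function(dat_l, dim) {
  # One list entry per condition.
  stopifnot(is.list(dat_l), length(dat_l) == dim)
  # Assumption: all conditions share the same genes, so row counts agree.
  n_genes <- vapply(dat_l, nrow, integer(1))
  stopifnot(length(unique(n_genes)) == 1)
  invisible(TRUE)
}

# Toy input: two conditions, 3 genes each, with 2 and 4 samples.
toy <- list(matrix(1:6, nrow = 3), matrix(1:12, nrow = 3))
check_dat_l(toy, dim = 2)
```

The call stops with an error if `dim` does not match the number of list entries or if the entries have different numbers of rows.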

Value

A list with D entries containing the pre-processed data.

Examples

set.seed(10)
# Select small dataset for example
data_test <- data_biase[1:100,]
# Split data into conditions A, B and C
data_A <- data_test[ , which(condition_biase == "A")]
data_B <- data_test[ , which(condition_biase == "B")]
data_C <- data_test[ , which(condition_biase == "C")]
# Store data as list
dat_l <- list(data_A, data_B, data_C)
# Pre-process the data
pdat <- pre_proc_data(dat_l, dim = 3, norm = FALSE, log = TRUE,
                      center = TRUE)

[Package SparseMDC version 0.99.5 Index]