ccRemover {ccRemover} R Documentation

## Removes the effect of the cell-cycle

### Description

ccRemover returns a data matrix with the effects of the cell-cycle removed.

### Usage

ccRemover(dat, cutoff = 3, max_it = 4, nboot = 200, ntop = 10,
bar = TRUE)


### Arguments

dat A list containing a data frame, x, and a logical vector, if_cc. x holds the gene expression measurements, with each column representing a sample and each row representing a gene. if_cc indicates which of the genes/rows are related to the cell-cycle or factor of interest. It is recommended that the elements of x are log-transformed and centered for each gene. For example, if x contains TPM measurements then we suggest the following two steps:

Step 1: dat$x <- log(dat$x + 1)

Step 2: dat$x <- dat$x - rowMeans(dat$x)

ccRemover requires that the samples have been properly normalized for sequencing depth, and we recommend doing so prior to applying the above steps. The if_cc vector must be the same length as the number of rows in x, with elements equal to TRUE for genes that are related to the cell-cycle and FALSE for genes that are unrelated to the cell-cycle.

cutoff The significance cutoff for identifying sources of variation related to the cell-cycle. The default value is 3, which roughly corresponds to a p-value of 0.01.

max_it The maximum number of iterations for the algorithm. The default value is 4.

nboot The number of bootstrap repetitions to be carried out on each iteration to determine the significance of each component. The default value is 200.

ntop The number of components tested at each iteration as cell-cycle effects. The default value is 10.

bar Whether to display a progress bar or not. The progress bar will not work in R Markdown environments, so this option may be turned off there. The default value is TRUE.
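The preparation recommended for dat can be sketched with a small simulated matrix. Everything in this sketch is illustrative: the TPM values are random numbers, the gene and cell names are made up, and the choice of which genes count as cell-cycle genes is hypothetical. It uses only base R and does not call ccRemover itself.

```r
# Simulated TPM-like matrix: 6 genes (rows) x 4 cells (columns).
# The values and names are made up for illustration only.
set.seed(1)
tpm <- matrix(rexp(24, rate = 0.1), nrow = 6,
              dimnames = list(paste0("gene", 1:6), paste0("cell", 1:4)))

# Step 1: log-transform the measurements
x <- log(tpm + 1)

# Step 2: center each gene (row) at zero
x <- x - rowMeans(x)

# Hypothetical annotation: pretend the first two genes are cell-cycle genes
if_cc <- c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)

# Assemble the list in the layout described for the dat argument
dat <- list(x = x, if_cc = if_cc)

# Sanity checks: one if_cc entry per row, and every gene is centered
stopifnot(length(dat$if_cc) == nrow(dat$x))
stopifnot(all(abs(rowMeans(dat$x)) < 1e-10))
```

Subtracting rowMeans(x) from a matrix recycles the vector down each column, so every row ends up with mean zero. Normalization for sequencing depth is assumed to have happened before Step 1 and is not shown here.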

### Details

Implements the algorithm described in Barron, M. & Li, J. "Identifying and removing the cell-cycle effect from scRNA-Sequencing data" (2016), Scientific Reports. This function takes a normalized, log-transformed and centered matrix of scRNA-seq expression data together with an indicator of which genes are known to be related to the cell-cycle. It then captures the main sources of variation in the data, determines which of these are related to the cell-cycle, and removes those that are. Please see the original manuscript for further details.
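The general idea can be illustrated with a simplified base-R sketch on simulated data. This is not the package's implementation: the simulated data, the helper load_diff, the use of svd on the control genes, and the way the bootstrap score is formed are all assumptions made for illustration, and the single pass shown here omits the iteration controlled by max_it. The manuscript remains the reference for the actual procedure.

```r
set.seed(42)
n_genes <- 200
n_cells <- 30
if_cc <- rep(c(TRUE, FALSE), times = c(50, 150))

# Simulated data: a hidden cyclic signal that is strong in cell-cycle genes
# and weak in control genes, plus noise. Rows are then centered.
cycle <- sin(seq(0, 2 * pi, length.out = n_cells))
strength <- ifelse(if_cc, 2, 0.5)
x <- outer(strength, cycle) +
  matrix(rnorm(n_genes * n_cells, sd = 0.5), n_genes, n_cells)
x <- x - rowMeans(x)

# Main sources of variation across cells, estimated from control genes only
ntop <- 5
v <- svd(x[!if_cc, ], nu = 0, nv = ntop)$v  # n_cells x ntop

# Hypothetical helper: per component, root-mean-square loading of
# cell-cycle genes minus that of control genes
load_diff <- function(x, if_cc, v) {
  loadings <- x %*% v
  sqrt(colMeans(loadings[if_cc, , drop = FALSE]^2)) -
    sqrt(colMeans(loadings[!if_cc, , drop = FALSE]^2))
}

observed <- load_diff(x, if_cc, v)

# Bootstrap over genes (components held fixed, a simplification) to
# estimate the variability of each difference
nboot <- 50
boot <- replicate(nboot, {
  idx <- c(sample(which(if_cc), replace = TRUE),
           sample(which(!if_cc), replace = TRUE))
  load_diff(x[idx, ], if_cc[idx], v)
})
t_score <- observed / apply(boot, 1, sd)

# Components whose score reaches the cutoff are treated as cell-cycle effects
cutoff <- 3
is_cc_component <- t_score >= cutoff

# Remove the flagged components from every gene
v_cc <- v[, is_cc_component, drop = FALSE]
xhat <- x - x %*% v_cc %*% t(v_cc)
```

In this toy setting the component that tracks the simulated cycle loads far more heavily on the cell-cycle genes than on the control genes, so it is flagged and projected out, while xhat keeps the same dimensions as x.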

### Value

A data matrix with the effects of the cell-cycle removed.

### Examples

set.seed(10)
data(t.cell_data)
# Center data and select small sample for example
t_cell_data_cen <- t(scale(t(t.cell_data[,1:20]), center=TRUE, scale=FALSE))
# Extract gene names
gene_names <- rownames(t_cell_data_cen)
# Determine which genes are annotated to the cell-cycle
cell_cycle_gene_indices <- gene_indexer(gene_names,
species = "mouse", name_type = "symbol")
# Create "if_cc" vector
if_cc <- rep(FALSE,nrow(t_cell_data_cen))
if_cc[cell_cycle_gene_indices] <- TRUE
# Move data into list
dat <- list(x=t_cell_data_cen, if_cc=if_cc)
# Run ccRemover
## Not run:
xhat <- ccRemover(dat, cutoff = 3, max_it = 4, nboot = 200, ntop = 10)

## End(Not run)
# Run ccRemover with reduced bootstrap repetitions for example only
xhat <- ccRemover(dat, cutoff = 3, max_it = 4, nboot = 20, ntop = 10)