CDSeq {CDSeq} R Documentation

Complete deconvolution using sequencing data.

Description

CDSeq takes bulk RNA-seq data as input and simultaneously returns estimates of both cell-type-specific gene expression profiles and sample-specific cell-type proportions.

Usage

CDSeq(
  bulk_data,
  beta = 0.5,
  alpha = 5,
  cell_type_number = NULL,
  mcmc_iterations = 700,
  dilution_factor = 1,
  gene_subset_size = NULL,
  block_number = 1,
  cpu_number = NULL,
  gene_length = NULL,
  reference_gep = NULL,
  verbose = FALSE,
  print_progress_msg_to_file = 0
)

Arguments

bulk_data

A matrix of RNA-seq read counts. Rows represent genes and columns represent samples.
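As a minimal sketch of the expected orientation, the toy matrix below has genes in rows and samples in columns. The object name toy_counts and all values are made up for illustration and are not part of the package.

```r
# Toy bulk RNA-seq count matrix: 4 genes (rows) x 3 samples (columns).
# Names and values are hypothetical, chosen only to show the layout.
toy_counts <- matrix(
  c( 10L,  0L, 25L,
      3L,  7L,  1L,
    100L, 80L, 95L,
      0L,  2L,  4L),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

# Basic sanity checks one might run before passing data as bulk_data:
# a numeric matrix of non-negative whole numbers.
stopifnot(is.matrix(toy_counts), is.numeric(toy_counts))
stopifnot(all(toy_counts >= 0), all(toy_counts == round(toy_counts)))

dim(toy_counts)  # number of genes, number of samples
```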

beta

A scalar or a vector of length G, where G is the number of genes. Default is 0.5. When beta = NULL, CDSeq uses reference_gep to estimate beta.

alpha

A scalar or a vector of length cell_type_number, where cell_type_number is the number of cell types. Default is 5.

cell_type_number

Number of cell types, given as a single integer or a vector of distinct integers. To have CDSeq estimate the number of cell types, supply a vector of candidate values, e.g. cell_type_number = 2:30.

mcmc_iterations

number of iterations for the Gibbs sampler; default value is 700.

dilution_factor

A scalar by which the read counts are divided to speed up computation; CDSeq uses bulk_data/dilution_factor. Default is 1.
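The effect of the division can be sketched in base R as follows. The toy matrix is hypothetical, and the snippet only shows the arithmetic bulk_data/dilution_factor; whether CDSeq rounds or otherwise post-processes the diluted counts is not stated here, so no rounding is applied.

```r
# Hypothetical 3-gene x 2-sample count matrix (filled column by column).
toy_counts <- matrix(
  c(100, 250, 50, 400, 0, 150),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("sample", 1:2))
)

dilution_factor <- 50

# Element-wise division: every count shrinks by the same factor,
# so the relative abundances within each sample are unchanged.
diluted <- toy_counts / dilution_factor

diluted
colSums(diluted) / colSums(toy_counts)  # equals 1 / dilution_factor per sample
```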

gene_subset_size

number of genes randomly sampled for each block. Default is NULL.

block_number

number of blocks. Each block contains gene_subset_size genes. Default is 1.
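To illustrate how gene_subset_size and block_number relate, the sketch below draws block_number random gene subsets, each of size gene_subset_size, from a hypothetical set of 20 genes. It is only an illustration of the idea; it does not reproduce CDSeq's internal sampling, and details such as whether genes may recur across blocks are assumptions here.

```r
set.seed(1)  # fixed seed so the illustration is reproducible

n_genes          <- 20  # hypothetical total number of genes
gene_subset_size <- 5   # genes per block
block_number     <- 3   # number of blocks

# Each block is an independent random sample of row indices
# (without replacement within a block).
blocks <- lapply(seq_len(block_number), function(b) {
  sort(sample.int(n_genes, size = gene_subset_size))
})

lengths(blocks)  # every block holds gene_subset_size genes
```

Each element of blocks could then be used to subset the rows of a count matrix, for example toy_counts[blocks[[1]], ], where toy_counts is a hypothetical genes-by-samples matrix.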

cpu_number

Number of CPU cores to use for parallel computing. Default is NULL, in which case CDSeq detects the number of available cores and uses all but one of them.
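The "all cores minus one" rule can be approximated with the base parallel package, as sketched below. The guard against an undetectable core count and the floor of one core are assumptions added for safety and are not documented CDSeq behavior.

```r
library(parallel)

# detectCores() may return NA on some platforms, so guard against it.
available <- detectCores()

# Leave one core free for the rest of the system, but never go below 1.
cpu_number <- if (is.na(available)) 1L else max(1L, available - 1L)

cpu_number
```

The resulting value could be passed explicitly as cpu_number when the automatic choice is not wanted, for instance on a shared machine.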

gene_length

A vector of the effective length of each gene, where effective length = gene length - read length + 1. Default is NULL.
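The effective length formula can be worked through in base R as follows. The gene names, lengths, and read length are hypothetical values chosen for the example.

```r
# Hypothetical annotated gene lengths in base pairs.
gene_length_bp <- c(geneA = 1500, geneB = 800, geneC = 2300)

# Hypothetical sequencing read length in base pairs.
read_length <- 100

# Effective length = gene length - read length + 1, i.e. the number of
# distinct start positions at which a full read fits within the gene.
effective_length <- gene_length_bp - read_length + 1

effective_length
```

A vector built this way, with one entry per row of bulk_data and in the same gene order, is the kind of input the gene_length argument describes.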

reference_gep

A reference gene expression profile that can be used to identify the cell types and/or estimate beta. Default is NULL.

verbose

If TRUE, progress messages are printed to the console. Default is FALSE.

print_progress_msg_to_file

Whether to print progress messages to a text file: set to 1 to write messages to a file, or 0 to disable. Default is 0.

Value

CDSeq returns estimates of both cell-type-specific gene expression profiles and sample-specific cell-type proportions. It also returns the estimated number of cell types and the log posterior values for the different numbers of cell types.

Examples

# Short run (5 MCMC iterations instead of the default 700) on a single core.
result1 <- CDSeq(bulk_data = mixtureGEP, cell_type_number = 6, mcmc_iterations = 5,
                 dilution_factor = 50, block_number = 1, gene_length = as.vector(gene_length),
                 reference_gep = refGEP, cpu_number = 1, print_progress_msg_to_file = 0)

[Package CDSeq version 1.0.8 Index]