DisHet {DisHet}    R Documentation

Heterogeneity Dissection

Description

This function dissects bulk-sample gene expression using matched normal and tumorgraft RNA-seq data. It returns the final proportion estimates of the three components (tumor, normal, and stroma) for all patients.

The patient-specific proportion estimates are stored in a 3-by-k matrix named "rho", where k is the number of patients. The three rows of "rho" correspond, in order, to the tumor, normal, and stroma components. That is, for patient i the tumor proportion estimate is rho[1,i], the normal proportion estimate is rho[2,i], and the stroma proportion estimate is rho[3,i].
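The indexing convention can be illustrated with a small stand-in matrix. The numbers below are invented for illustration and are not DisHet output; the row and column names are added here only for readability, and whether DisHet itself names the rows is not stated on this page.

```r
# Invented 3-by-k proportion matrix with k = 4 patients (not real DisHet output).
# Rows follow the documented order: tumor, normal, stroma.
rho <- matrix(c(0.70, 0.20, 0.10,
                0.55, 0.30, 0.15,
                0.80, 0.05, 0.15,
                0.60, 0.25, 0.15),
              nrow = 3,
              dimnames = list(c("tumor", "normal", "stroma"),
                              paste0("patient", 1:4)))

i <- 2
tumor_i  <- rho[1, i]  # tumor proportion of patient i
normal_i <- rho[2, i]  # normal proportion of patient i
stroma_i <- rho[3, i]  # stroma proportion of patient i

# Each column holds one patient's three proportions, which sum to 1.
colSums(rho)
```

Because matrix() fills column by column, each group of three values above becomes one patient's column.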

Usage

DisHet(exp_T, exp_N, exp_G, save = TRUE, MCMC_folder,
       n_cycle = 10000, save_last = 500, mean_last = 200, dirichlet_c = 1, S_c = 1,
       rho_small = 1e-2, initial_rho_S = 0.02, initial_rho_G = 0.96, initial_rho_N = 0.02)

Arguments

exp_T

Gene expression in bulk RNA-seq samples. The rows correspond to different genes. The columns correspond to different patients.

exp_N

Gene expression in the matched normal samples. The rows list the same genes as exp_T, and the columns correspond to the same patients as exp_T.

exp_G

Gene expression in the matched tumorgraft (tumor) samples. The rows list the same genes as exp_T, and the columns correspond to the same patients as exp_T.

save

If save = TRUE (the default), the component proportion estimates from the MCMC iterations are saved to the directory given by "MCMC_folder"; "save_last" controls how many of the final iterations are kept.

MCMC_folder

Directory for saving the estimated mixture proportion matrix updates during MCMC iterations. The default setting is to create a "DisHet" folder under the current working directory.

n_cycle

Number of MCMC iterations (chain length). The default value is 10,000.

save_last

Number of final MCMC iterations whose rho matrix updates are saved. The default value is 500.

mean_last

Number of final MCMC iterations used to calculate the final proportion estimate matrix. The default value is 200.

dirichlet_c

Step-size scale for sampling rho. Larger values lead to smaller steps when sampling rho. The default value is 1.

S_c

Step-size scale for sampling Sij. Larger values lead to larger steps when sampling Sij. The default value is 1.

rho_small

Smallest value of rho allowed during MCMC updates. The default value is 1e-2. This lower bound helps improve the numerical stability of the algorithm.

initial_rho_S

Initial value of the proportion estimate for the stroma component. The default value is 0.02.

initial_rho_G

Initial value of the proportion estimate for the tumor component. The default value is 0.96.

initial_rho_N

Initial value of the proportion estimate for the normal component. The default value is 0.02.
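This page says only that the final estimate is calculated from the last "mean_last" iterations. A minimal sketch, assuming that calculation is an element-wise mean, and assuming the retained draws are stacked in a hypothetical 3-by-k-by-n array named draws (this layout is an illustration, not DisHet's internal storage), could look like:

```r
# Average the last `mean_last` MCMC draws of a 3-by-k proportion matrix.
# ASSUMPTION: `draws` is a hypothetical 3 x k x n array, one slice per iteration,
# and the final estimate is a plain element-wise mean.
mean_last_rho <- function(draws, mean_last) {
  n <- dim(draws)[3]
  stopifnot(mean_last >= 1, mean_last <= n)
  keep <- seq.int(n - mean_last + 1, n)   # indices of the final iterations
  apply(draws[, , keep, drop = FALSE], c(1, 2), mean)
}

# Toy chain: 3 components, 2 patients, 10 iterations of random proportions.
set.seed(1)
draws <- array(NA_real_, dim = c(3, 2, 10))
for (t in 1:10) {
  for (j in 1:2) {
    g <- rgamma(3, shape = 1)
    draws[, j, t] <- g / sum(g)           # one valid proportion vector
  }
}
rho_hat <- mean_last_rho(draws, mean_last = 4)
```

Because every retained draw sums to 1 within each column, their average does too, so rho_hat keeps the same 3-by-k layout as "rho".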

Details

Use un-logged expression values in exp_T, exp_N, and exp_G. The rows and columns of the three matrices must match: each row refers to the same gene, and each column to the same patient, in all three matrices.
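Before calling DisHet, the matching requirements can be checked up front. The helper check_dishet_inputs below is hypothetical and not part of the package; its negative-value test catches only one symptom of log-scale input, so passing it does not prove the data are un-logged.

```r
# Hypothetical helper (not part of DisHet): checks that the three inputs line up.
check_dishet_inputs <- function(exp_T, exp_N, exp_G) {
  mats <- lapply(list(exp_T = exp_T, exp_N = exp_N, exp_G = exp_G), as.matrix)
  # Un-logged expression values cannot be negative; log-scale values can be.
  for (nm in names(mats)) {
    if (any(mats[[nm]] < 0, na.rm = TRUE)) {
      stop(nm, " contains negative values; un-logged expression is expected")
    }
  }
  ref <- mats$exp_T
  for (nm in c("exp_N", "exp_G")) {
    m <- mats[[nm]]
    if (!identical(dim(m), dim(ref))) stop(nm, " and exp_T differ in dimensions")
    if (!identical(rownames(m), rownames(ref))) stop(nm, " and exp_T list different genes")
    if (!identical(colnames(m), colnames(ref))) stop(nm, " and exp_T list different patients")
  }
  invisible(TRUE)
}

genes <- paste0("gene", 1:3)
patients <- paste0("patient", 1:2)
toy <- matrix(1:6, nrow = 3, dimnames = list(genes, patients))
check_dishet_inputs(toy, toy * 2, toy + 1)   # passes silently
```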

The values of "initial_rho_S", "initial_rho_G", and "initial_rho_N" must all be positive. If they do not sum to 1, they are automatically normalized so that their sum is 1.
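A minimal sketch of this normalization, assuming it is a plain rescaling by the sum of the three values. The function name normalize_initial_rho is hypothetical, and raising an error for non-positive input is an assumption about how such values might be rejected; the output is ordered tumor, normal, stroma to match the rows of "rho".

```r
# Hypothetical sketch: rescale the three initial proportions so they sum to 1.
# Output order matches the rows of rho: tumor (G), normal (N), stroma (S).
normalize_initial_rho <- function(initial_rho_G = 0.96,
                                  initial_rho_N = 0.02,
                                  initial_rho_S = 0.02) {
  init <- c(tumor = initial_rho_G, normal = initial_rho_N, stroma = initial_rho_S)
  if (any(init <= 0)) stop("initial proportions must all be positive")
  init / sum(init)   # divide by the total so the values sum to 1
}

normalize_initial_rho()   # defaults already sum to 1, up to rounding
normalize_initial_rho(initial_rho_G = 2, initial_rho_N = 1, initial_rho_S = 1)
# tumor 0.50, normal 0.25, stroma 0.25
```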

Examples

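  library(DisHet)
  # Load the example data shipped with the package (defines exp_T, exp_N, exp_G)
  load(system.file("example/example_data.RData", package = "DisHet"))
  # Keep the first 200 genes so the example runs quickly
  exp_T <- exp_T[1:200, ]
  exp_N <- exp_N[1:200, ]
  exp_G <- exp_G[1:200, ]

  # Short chain without saving intermediate draws; rho is the 3-by-k estimate
  rho <- DisHet(exp_T, exp_N, exp_G, save = FALSE, n_cycle = 200, mean_last = 50)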

[Package DisHet version 1.0.0 Index]