scMappR_and_pathway_analysis {scMappR} | R Documentation |
Generate cellWeighted_Foldchanges, visualize, and enrich.
Description
This function generates cell-weighted fold-changes (cellWeighted_Foldchanges), visualizes them in a heatmap, and performs pathway enrichment of the cellWeighted_Foldchanges and the bulk gene list using g:ProfileR.
Usage
scMappR_and_pathway_analysis(
count_file,
signature_matrix,
DEG_list,
case_grep,
control_grep,
rda_path = "",
max_proportion_change = -9,
print_plots = TRUE,
plot_names = "scMappR",
theSpecies = "human",
output_directory = "scMappR_analysis",
sig_matrix_size = 3000,
drop_unknown_celltype = TRUE,
internet = TRUE,
up_and_downregulated = FALSE,
gene_label_size = 0.4,
number_genes = -9,
toSave = FALSE,
newGprofiler = TRUE,
path = NULL,
deconMethod = "DeconRNASeq",
rareCT_filter = TRUE
)
Arguments
count_file |
Normalized (e.g. TPM, RPKM, CPM) RNA-seq count matrix where rows are gene symbols and columns are individuals. Inputted data should be a data.frame or matrix. A character string giving the path to a tsv file from which this data can be loaded is also acceptable. Gene identifiers in the count file, signature matrix, and DEG list must all match (same identifier type, e.g. gene symbol or Ensembl ID, and same case, since matching is case sensitive). |
signature_matrix |
Signature matrix: a gene by cell-type matrix populated with the fold-change of gene expression in cell-type marker "i" vs all other cell-types. Object should be a data.frame or matrix. |
DEG_list |
An object whose first column contains gene symbols within the bulk dataset (they do not have to be in the signature matrix), whose second column is the adjusted p-value, and whose third column is the log2FC. A path to a .tsv file containing this information is also acceptable. |
case_grep |
A character string that designates the "cases" in the column names of the count file (i.e. upregulated genes are 'case' biased). A numeric vector of the column indices of the cases is also acceptable. |
control_grep |
A character string that designates the "controls" in the column names of the count file (i.e. downregulated genes are 'control' biased). A numeric vector of the column indices of the controls is also acceptable. |
rda_path |
If downloaded, path to where data from scMappR_data is stored. |
max_proportion_change |
Maximum cell-type proportion change. This may be useful if there are many rare cell-types. Alternatively, if a cell-type is present in only one condition, it prevents possible infinite or 0 cwFold-changes. |
print_plots |
Whether boxplots of the estimated cell-type (CT) proportions from the leave-one-out method of CT deconvolution should be printed. The same plot names are used for the top-pathway plots. |
plot_names |
The prefix of plot pdf files. |
theSpecies |
human, mouse, or a species directly compatible with gProfileR (i.e. g:ProfileR). |
output_directory |
The name of the directory that will contain output of the analysis. |
sig_matrix_size |
Maximum number of genes in signature matrix for cell-type deconvolution. |
drop_unknown_celltype |
Whether or not to remove "unknown" cell-types from the signature matrix. |
internet |
Whether a stable internet connection is available (T/F). |
up_and_downregulated |
Whether you are additionally splitting up/downregulated genes (T/F). |
gene_label_size |
The size of the gene label on the plot. |
number_genes |
The number of genes to use as a cut-off for pathway analysis (useful when there are many DEGs). |
toSave |
Allow scMappR to write files in the current directory (T/F). |
newGprofiler |
Whether to use gprofiler2 (TRUE) instead of gProfileR (FALSE). |
path |
If toSave == TRUE, path to the directory where files will be saved. |
deconMethod |
Which RNA-seq deconvolution method to use to estimate cell-type proportions. Options are "WGCNA", "DCQ", or "DeconRNASeq". |
rareCT_filter |
Whether to filter out cell-types rarer than 0.1 percent of the population (T/F). Setting to FALSE keeps these cell-types and may lead to false positives. |
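The input shapes described above can be sketched with toy data in base R. This is a minimal illustration, not scMappR code: the gene names, sample names, and values are made up, and it assumes (as the argument names suggest) that the case_grep and control_grep tags are matched against column names with grep-style pattern matching.

```r
# Toy count matrix: rows are gene symbols, columns are individuals (made-up values).
count_file <- matrix(
  c(5.1, 4.8, 9.7, 10.2,
    0.0, 0.3, 2.1, 1.9,
    7.4, 7.9, 7.1, 7.6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GENE_A", "GENE_B", "GENE_C"),
                  c("s1_male", "s2_male", "s3_female", "s4_female"))
)

# DEG_list layout: gene symbol, adjusted p-value, log2 fold-change (in that order).
DEG_list <- data.frame(
  gene = c("GENE_A", "GENE_B"),
  padj = c(0.001, 0.02),
  log2fc = c(1.0, 2.5),
  stringsAsFactors = FALSE
)

# Tags that pick out case and control columns (assumed grep-style matching).
case_cols <- grep("_female", colnames(count_file))
control_cols <- grep("_male", colnames(count_file))

# An ambiguous tag would match both groups: "male" is a substring of "female".
ambiguous_cols <- grep("male", colnames(count_file))

# Check that every DEG is present in the count matrix (matching is case sensitive).
all_present <- all(DEG_list$gene %in% rownames(count_file))
```

Since numeric index vectors are also accepted for these arguments, case_cols and control_cols could be passed in place of the character tags.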
Details
This function generates cellWeighted_Foldchanges for every cell-type (see deconvolute_and_contextualize), as well as accompanying data such as cell-type proportions estimated with the DeconRNASeq, WGCNA, or DCQ methods. It then generates heatmaps of all cellWeighted_Foldchanges, the cellWeighted_Foldchanges overlapping with the signature matrix, the entire signature matrix, and the cell-type preference values from the signature matrix that overlap with the inputted differentially expressed genes. Finally, if an internet connection is available, it runs g:ProfileR enrichment on the reordered cellWeighted_Foldchanges as well as on the ordered list of genes. This function is a wrapper for deconvolute_and_contextualize and pathway_enrich_internal, and it is the primary function within the package.
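Two of the heatmaps described above depend on which differentially expressed genes also appear in the signature matrix. The following base R sketch shows that overlap step on toy data. The object names and values are illustrative, and it is not the package's internal code.

```r
# Toy signature matrix: genes by cell-types (made-up cell-type preference values).
signature_matrix <- matrix(
  c(3.2, 0.1,
    0.4, 2.8,
    1.1, 1.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("CD19", "CD3E", "ACTB"), c("B_cell", "T_cell"))
)

# Gene symbols from a hypothetical DEG list; "cd3e" differs from "CD3E" only in case.
deg_genes <- c("CD19", "cd3e", "GAPDH")

# Exact, case-sensitive overlap between DEGs and signature-matrix genes.
shared_genes <- intersect(deg_genes, rownames(signature_matrix))

# Rows of the signature matrix restricted to the shared genes.
overlap_matrix <- signature_matrix[shared_genes, , drop = FALSE]
```

Here only CD19 is shared, because cd3e does not match CD3E. Harmonizing identifiers across the count file, signature matrix, and DEG list before calling the function keeps such genes in the overlap.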
Value
List with the following elements:
cellWeighted_Foldchanges |
Cell-weighted fold-changes for all differentially expressed genes. |
paths |
Enriched biological pathways for each cell-type. |
TFs |
Enriched transcription factors (TFs) for each cell-type. |
Examples
# Load the example PBMC data shipped with scMappR.
data(PBMC_example)
bulk_DE_cors <- PBMC_example$bulk_DE_cors        # DEG list
bulk_normalized <- PBMC_example$bulk_normalized  # normalized bulk counts
odds_ratio_in <- PBMC_example$odds_ratio_in      # signature matrix
# Column-name tags that identify case and control samples.
case_grep <- "_female"
control_grep <- "_male"
max_proportion_change <- 10
print_plots <- FALSE
theSpecies <- "human"
# internet = FALSE, so the g:ProfileR enrichment step is not run here.
toOut <- scMappR_and_pathway_analysis(count_file = bulk_normalized,
    signature_matrix = odds_ratio_in,
    DEG_list = bulk_DE_cors, case_grep = case_grep,
    control_grep = control_grep, rda_path = "",
    max_proportion_change = max_proportion_change,
    print_plots = print_plots,
    plot_names = "tst1", theSpecies = theSpecies,
    output_directory = "tester",
    sig_matrix_size = 3000,
    up_and_downregulated = FALSE,
    internet = FALSE)