deseq_analysis {autoGO}    R Documentation

Differential Gene Expression Analysis

Description

This function performs a differential gene expression analysis using the DESeq2 package.

The principal DESeq2 workflow is employed. Raw counts are rounded where necessary, because DESeq2 requires integer counts. If the user provides an .rds file, the tool makes sure that the assay of the SummarizedExperiment stores only counts. The function DESeqDataSetFromMatrix (or DESeqDataSet for .rds input) is then used to build the dataset. When pre_filtering = TRUE, a pre-filtering step removes all genes whose counts, summed across subjects, are less than 10.

The differential expression analysis is performed with the DESeq2 function DESeq, and the normalized data matrix is stored in the working directory, where it is available for further analyses and visualizations. Results are extracted for each comparison of interest.

Subfolders named after the comparisons are generated in "outfolder" (default: ./results). NOTE: as standard we use the nomenclature "CONTROL_vs_TREATMENT", i.e. the control is on the left. Each comparison subfolder contains a .tsv file with the complete differential analysis and a further subfolder based on the p-value and log2FC thresholds; this subfolder contains a .tsv file with the results for the filtered genes only. It is in turn divided into the subfolders "up_genes", "down_genes" and "up_down_genes". See the path flow chart at the end of this tutorial (Figure 1).
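The rounding step can be illustrated with base R alone. This is a minimal sketch, not the package's internal code; the matrix `raw` and its values are made up for illustration.

```r
# Hypothetical toy matrix: genes in rows, samples in columns.
raw <- matrix(
  c(10.4, 0.6, 3.2, 7.9),
  nrow = 2,
  dimnames = list(c("gene_1", "gene_2"), c("Pat_1", "Pat_2"))
)

# Round to the nearest whole number, then store the values as integers,
# since DESeq2 expects integer counts.
rounded <- round(raw)
storage.mode(rounded) <- "integer"
```

Changing the storage mode keeps the row and column names, so gene and sample labels survive the conversion.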

Usage

deseq_analysis(
  counts,
  groups,
  comparisons,
  padj_threshold = 0.05,
  log2FC_threshold = 0,
  pre_filtering = TRUE,
  save_excel = FALSE,
  outfolder = "./results",
  del_csv = ","
)

Arguments

counts

Path to the raw counts file. Accepted formats are tab- or comma-separated files (.tsv, .csv), .txt files, and .rds files. Genes must be in rows and samples in columns.
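As an illustration of the expected layout, the sketch below writes a small tab-separated counts file and reads it back. The file contents are made up, and placing gene identifiers in the first column is an assumption about the layout rather than something stated on this page.

```r
# Write a hypothetical tab-separated counts file to a temporary location.
counts_path <- tempfile(fileext = ".tsv")
toy <- data.frame(
  Pat_1 = c(12L, 0L),
  Pat_2 = c(30L, 4L),
  row.names = c("gene_1", "gene_2")
)
write.table(toy, counts_path, sep = "\t", quote = FALSE, col.names = NA)

# Read it back to inspect the layout: genes in rows, samples in columns.
# Assumption: the first column holds the gene identifiers.
read_back <- read.delim(counts_path, row.names = 1, check.names = FALSE)
```

A path such as `counts_path` is the kind of value the `counts` argument expects.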

groups

Sample information table needed by DESeq2 (e.g. 'colData'). A data frame with at least two columns: one for samples, one for a grouping variable (See examples).
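Before running the analysis it can help to confirm that the sample table and the count columns describe the same samples. The check below is a hedged sketch in base R; the vector `count_columns` stands in for the column names of a counts file and is hypothetical.

```r
# Sample table in the shape described above: one column for samples,
# one for the grouping variable.
groups <- data.frame(
  sample = c("Pat_1", "Pat_2", "Pat_3", "Pat_4"),
  group = c("CTRL", "CTRL", "TREAT_A", "TREAT_A")
)

# Hypothetical column names taken from a counts file.
count_columns <- c("Pat_1", "Pat_2", "Pat_3", "Pat_4")

# Samples listed in one place but not the other.
missing_in_counts <- setdiff(groups$sample, count_columns)
missing_in_groups <- setdiff(count_columns, groups$sample)
all_matched <- length(missing_in_counts) == 0 && length(missing_in_groups) == 0
```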

comparisons

Table of comparisons based on the grouping variable in the 'groups' table (See examples). It should be a data.frame with a 'treatment' column and a 'control' column. Alternatively, the path to a .txt file can be provided.
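The "CONTROL_vs_TREATMENT" nomenclature from the Description can be sketched from such a table as follows. This only illustrates the naming pattern; the exact folder names the package writes are not confirmed here.

```r
comparisons <- data.frame(
  treatment = c("TREAT_A", "TREAT_B", "TREAT_A"),
  control = c("CTRL", "CTRL", "TREAT_B")
)

# Control on the left, treatment on the right.
comparison_labels <- paste0(comparisons$control, "_vs_", comparisons$treatment)
```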

padj_threshold

Threshold value for adjusted p-value significance (Defaults to 0.05).

log2FC_threshold

Threshold value for log2(Fold Change) above which genes are considered differentially expressed (Defaults to 0).
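How the two thresholds might combine can be sketched on a toy results table. The column names follow DESeq2's results output (`log2FoldChange`, `padj`); the data are made up, and the use of strict inequalities is an assumption rather than the package's documented behaviour.

```r
# Hypothetical results table with DESeq2-style column names.
res <- data.frame(
  gene = c("g1", "g2", "g3", "g4"),
  log2FoldChange = c(2.1, -1.5, 0.3, -0.2),
  padj = c(0.001, 0.02, 0.20, NA)
)
padj_threshold <- 0.05
log2FC_threshold <- 1

# Keep rows with a defined adjusted p-value below the threshold
# and an absolute log2 fold change above the threshold.
keep <- !is.na(res$padj) &
  res$padj < padj_threshold &
  abs(res$log2FoldChange) > log2FC_threshold
filtered <- res[keep, ]

# Split the filtered genes by direction of change.
up_genes <- filtered$gene[filtered$log2FoldChange > 0]
down_genes <- filtered$gene[filtered$log2FoldChange < 0]
```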

pre_filtering

Removes genes whose raw counts, summed across samples, are less than 10 (Default = TRUE).
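The pre-filtering rule can be reproduced in base R. This is a minimal sketch on a made-up matrix, not the package's own code.

```r
# Hypothetical counts: genes in rows, samples in columns.
counts_mat <- matrix(
  c(0L, 1L, 2L,
    5L, 3L, 4L,
    100L, 80L, 95L),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("gene_low", "gene_mid", "gene_high"),
    c("Pat_1", "Pat_2", "Pat_3")
  )
)

# Keep genes whose counts summed across samples reach at least 10.
keep <- rowSums(counts_mat) >= 10
filtered_mat <- counts_mat[keep, , drop = FALSE]
```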

save_excel

If TRUE, all output tables are also saved in .xlsx format (Default = FALSE).

outfolder

Name of the folder in which the output is saved (Default = "./results").

del_csv

Delimiter of the .csv file (Default = ","). This option exists because opening and saving a .csv file in Excel can alter the format and change the delimiter to ";".
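The effect of a changed delimiter can be seen with base R's read.csv. The file below is a made-up example of a .csv file that uses ";" as its delimiter.

```r
# Hypothetical .csv file that uses ";" as its delimiter.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("gene;Pat_1;Pat_2", "gene_1;12;30", "gene_2;0;4"),
  csv_path
)

# Reading with the wrong delimiter collapses everything into one column.
wrong <- read.csv(csv_path, sep = ",")

# Reading with the matching delimiter recovers the sample columns.
right <- read.csv(csv_path, sep = ";", row.names = 1)
```

For a file like this one, del_csv = ";" would be the matching setting.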

Value

No return value. Files will be produced as part of normal execution.

Examples

sample <- c("Pat_1", "Pat_2", "Pat_3", "Pat_4", "Pat_5", "Pat_6")
group <- c("CTRL", "CTRL", "TREAT_A", "TREAT_A", "TREAT_B", "TREAT_B")
groups <- data.frame(sample, group)
treatment <- c("TREAT_A", "TREAT_B", "TREAT_A")
control <- c("CTRL", "CTRL", "TREAT_B")
comparisons <- data.frame(treatment, control)
## Not run: 
# 'counts' is the path to a raw counts file; it is not defined in this example.
deseq_analysis(counts,
  groups,
  comparisons,
  padj_threshold = 0.05,
  log2FC_threshold = 0,
  pre_filtering = TRUE,
  save_excel = FALSE,
  outfolder = "./results",
  del_csv = ","
)

## End(Not run)

[Package autoGO version 0.9.1 Index]