stat.DESeq {caRpools}    R Documentation

Analysis: DESeq2 Analysis of pooled CRISPR NGS data

Description

For the DESeq2 analysis, the read counts of all sgRNAs targeting a given gene are first aggregated (summed by default, see agg.function) to increase the available read count per gene. DESeq2 analysis is then performed, which includes the estimation of size factors, the estimation of dispersions using a parametric fit, and a Wald test for differences in log2 fold change between the untreated and treated data. More information can be found in _Love et al._ [Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2](http://www.ncbi.nlm.nih.gov/pubmed/25516281) _Genome Biology_ 2014.
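
To make the size-factor step concrete, here is a base-R sketch of the median-of-ratios method that DESeq2 uses by default: each sample's size factor is the median, over genes without zero counts, of the ratio between that sample's count and the gene's geometric mean across all samples. The function name `size_factors` and the toy count matrix are illustrative only; this is not code taken from caRpools or DESeq2.

```r
# Sketch of DESeq2-style "median of ratios" size factors (illustrative only).
# Rows = genes (sgRNA counts already aggregated per gene), columns = samples.
size_factors <- function(counts) {
  # Log geometric mean of each gene across all samples
  log_geo_means <- rowMeans(log(counts))
  # Genes with a zero count in any sample get -Inf and are excluded
  use <- is.finite(log_geo_means)
  # Per sample: median ratio of its counts to the gene geometric means
  apply(counts, 2, function(col) {
    exp(median(log(col[use]) - log_geo_means[use]))
  })
}

# Toy matrix, filled column by column: ctrl2 has twice the depth of ctrl1,
# trt1 has half of it
counts <- matrix(c(10, 20, 30, 40,
                   20, 40, 60, 80,
                    5, 10, 15, 20),
                 ncol = 3,
                 dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                                 c("ctrl1", "ctrl2", "trt1")))
sf <- size_factors(counts)
print(sf)
```

Because every gene in ctrl2 has exactly twice the count seen in ctrl1, its size factor comes out as 2, and dividing by the size factors puts all three samples on the same scale.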

Usage

stat.DESeq(untreated.list,treated.list,namecolumn=1, fullmatchcolumn=2,
agg.function=sum, extractpattern=expression("^(.+?)_.+"), sorting=FALSE,
sgRNA.pval = 0.01, filename.deseq="data", fitType="parametric", p.adjust="holm")

Arguments

untreated.list

A list of data.frames of the untreated (control) samples, e.g. list(df.control1, df.control2)

treated.list

A list of data.frames of the treated samples, e.g. list(df.treated1, df.treated2)

namecolumn

Column in which the target names are located, e.g. namecolumn=1 for the first column.

fullmatchcolumn

Column in which the read counts are located, e.g. fullmatchcolumn=2 for the second column.

agg.function

Function used to aggregate gene-level data from the individual sgRNA data. By default, agg.function=sum, but any other function, e.g. mean or median, can be used.
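
As an illustration of what agg.function does, the sketch below collapses a small, hypothetical sgRNA table to gene level with base R `aggregate()`, once with `sum` and once with `median`. The column names `design` and `counts` and all values are made up; the sketch also assumes that the gene name is the text before the first underscore, as with the default extractpattern.

```r
# Hypothetical sgRNA-level table: sgRNA names and read counts
sgrna <- data.frame(
  design = c("AAK1_1", "AAK1_2", "AAK1_3", "ABL1_1", "ABL1_2"),
  counts = c(12, 30, 18, 7, 9),
  stringsAsFactors = FALSE
)

# Gene name = text before the first "_" (default extractpattern)
sgrna$gene <- sub("^(.+?)_.+", "\\1", sgrna$design, perl = TRUE)

# Collapse sgRNA counts to one value per gene with different functions
gene_sum    <- aggregate(counts ~ gene, data = sgrna, FUN = sum)
gene_median <- aggregate(counts ~ gene, data = sgrna, FUN = median)
print(gene_sum)
print(gene_median)
```

Summing pools all reads of a gene (60 for AAK1 here), whereas the median (18) is less sensitive to a single sgRNA with unusually high counts.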

extractpattern

Regular expression used to extract the gene name from the sgRNA name. Make sure that the gene name is captured by enclosing its part of the regular expression in parentheses (). The default value expression("^(.+?)_.+") captures the gene name (.+?) in front of the first separator _, followed by any characters .+, e.g. gene1_anything.
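
The sketch below shows how the default pattern behaves when applied with base R `sub()`. It assumes, without reproducing the caRpools source, that the captured group is extracted in a comparable way. Note the last name: because (.+?) is non-greedy, a gene name that itself contains an underscore is cut at the first _.

```r
# The default pattern, stored as an R expression as in the function signature
extractpattern <- expression("^(.+?)_.+")
pattern <- eval(extractpattern)  # evaluates to the plain character string

sgrna_names <- c("AAK1_1", "AAK1_2", "ABL1_GCTA", "MY_GENE_3")
# Keep only the captured group (the gene name)
genes <- sub(pattern, "\\1", sgrna_names, perl = TRUE)
print(genes)
```

The pattern used in the Examples, "^(.+?)(_.+)", adds a second group around the suffix but yields the same first group.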

sorting

Defines whether the final output is sorted by the calculated p-value. By default, sorting=FALSE returns a table sorted by gene name.

sgRNA.pval

p-value threshold used to count significant sgRNAs for each gene. *Default* 0.01 *Value* (numeric)
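
A minimal sketch of the counting step, assuming per-sgRNA p-values are already available and that "significant" means strictly below sgRNA.pval. How stat.DESeq obtains the sgRNA-level p-values is not described on this page, so the data frame `sg` is purely hypothetical.

```r
# Hypothetical per-sgRNA p-values
sg <- data.frame(
  gene = c("AAK1", "AAK1", "AAK1", "ABL1", "ABL1"),
  pval = c(0.004, 0.2, 0.008, 0.03, 0.5),
  stringsAsFactors = FALSE
)
sgRNA.pval <- 0.01

# Number of sgRNAs per gene whose p-value is below the threshold
n_sig <- tapply(sg$pval < sgRNA.pval, sg$gene, sum)
print(n_sig)
```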

filename.deseq

File name for the raw DESeq2 output. *Default* "data" *Value* (character)

fitType

Type of fit used for the dispersion-mean relationship. See '?DESeq' in the DESeq2 package. *Default* "parametric" *Values* "parametric", "local", "mean"
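
For reference, with fitType="parametric" DESeq2 models the variance of the count K_ij of gene i in sample j (mean mu_ij) through a gene-wise dispersion alpha_i, and fits the gene-wise dispersion estimates against the mean of normalized counts mu-bar_i with the trend below, where a_0 and a_1 are the fitted coefficients (Love et al. 2014). With "local", a local regression is fitted instead, and with "mean", the mean of the gene-wise dispersion estimates is used.

```latex
\operatorname{Var}(K_{ij}) = \mu_{ij} + \alpha_i \, \mu_{ij}^2,
\qquad
\alpha_{\mathrm{tr}}(\bar{\mu}_i) = a_0 + \frac{a_1}{\bar{\mu}_i}
```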

p.adjust

Method used to adjust p-values for multiple testing. See '?p.adjust'. *Default* "holm" *Values* any method in p.adjust.methods, e.g. "holm", "BH", "bonferroni"
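
To show what the default "holm" setting does, the sketch compares base R `p.adjust()` with a hand-written version of Holm's step-down procedure: the k-th smallest of m p-values is multiplied by (m - k + 1), a running maximum keeps the adjusted values monotone, and results are capped at 1. The helper `holm_manual` and the p-values are illustrative; the sketch does not claim to show where inside stat.DESeq the adjustment is applied.

```r
p <- c(0.01, 0.04, 0.03, 0.005)

# Base R implementation
holm <- p.adjust(p, method = "holm")

# Same procedure written out step by step
holm_manual <- function(p) {
  m <- length(p)
  o <- order(p)                                  # ascending order
  adj <- pmin(1, cummax((m - seq_len(m) + 1) * p[o]))
  adj[order(o)]                                  # restore original order
}

print(holm)
print(holm_manual(p))
```

With these four values, Holm's method yields 0.03, 0.06, 0.06 and 0.02; the second value (raw p 0.04) is raised from 0.04 to 0.06 by the monotonicity step.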

Details

none

Value

stat.DESeq returns an object containing the gene names together with the calculated p-values; the gene-level result table is accessible as $genes (see Examples). The returned object can be visualized using carpools.hitident (see ?carpools.hitident). The output is formatted as follows:

log2 fold change (MAP): condition untreated vs treated
Wald test p-value: condition untreated vs treated
DataFrame with 813 rows and 6 columns

baseMean log2FoldChange lfcSE stat pvalue padj
AAK1 73.90565 -0.23319491 0.2927459 -0.7965779 0.42569619 0.7018234
AATK 159.43350 -0.11312924 0.2740927 -0.4127408 0.67979655 0.8514905
ABI1 131.03013 -0.09915855 0.2693971 -0.3680758 0.71281670 0.8691949
ABL1 77.51711 0.07837768 0.3155477 0.2483862 0.80383562 0.9114121
ABL2 119.22621 -0.49412039 0.2846396 -1.7359507 0.08257254 0.3128525
... ... ... ... ... ... ...
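
The stat and pvalue columns follow from the other two: the DESeq2 Wald statistic is the log2 fold change divided by its standard error, and the p-value is two-sided under a standard normal distribution. The sketch below recomputes both for the five rows printed above and then orders the rows by p-value, roughly what sorting=TRUE is described as doing. The data frame `res` is just those rows retyped, not the actual return value of stat.DESeq.

```r
# The five rows shown in the example output above
res <- data.frame(
  log2FoldChange = c(-0.23319491, -0.11312924, -0.09915855, 0.07837768, -0.49412039),
  lfcSE  = c(0.2927459, 0.2740927, 0.2693971, 0.3155477, 0.2846396),
  stat   = c(-0.7965779, -0.4127408, -0.3680758, 0.2483862, -1.7359507),
  pvalue = c(0.42569619, 0.67979655, 0.71281670, 0.80383562, 0.08257254),
  row.names = c("AAK1", "AATK", "ABI1", "ABL1", "ABL2")
)

# Wald statistic: estimate divided by its standard error
res$stat_check <- res$log2FoldChange / res$lfcSE
# Two-sided p-value under a standard normal distribution
res$pvalue_check <- 2 * pnorm(-abs(res$stat_check))

# Order by p-value, smallest first
res_sorted <- res[order(res$pvalue), ]
print(res_sorted[, c("stat", "stat_check", "pvalue", "pvalue_check")])
```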

Note

none

Author(s)

Jan Winter. DESeq2 was developed by the Wolfgang Huber lab (EMBL, Heidelberg).

Examples


library(caRpools)
data(caRpools)
# Gene-level DESeq2 analysis of two control and two treated replicates
data.deseq = stat.DESeq(untreated.list = list(CONTROL1, CONTROL2),
  treated.list = list(TREAT1, TREAT2), namecolumn = 1,
  fullmatchcolumn = 2, extractpattern = expression("^(.+?)(_.+)"),
  sorting = FALSE, filename.deseq = "ANALYSIS-DESeq2-sgRNA.tab",
  fitType = "parametric")

# Show the first ten genes of the result table
knitr::kable(data.deseq$genes[1:10,])


[Package caRpools version 0.83]