stat.wilcox {caRpools}                R Documentation

Analysis: Analysis of pooled CRISPR screening data using a Wilcoxon Test



In this approach, the read counts of all sgRNAs in a dataset are first normalized by the function set in the MIACCS file. By default, each read count is divided by the dataset median. Then, for each gene, the fold changes of its sgRNAs are tested against those of a reference population, either the non-targeting controls or randomly picked sgRNAs (as defined by the random picks option in the MIACCS file), using a two-sided Mann-Whitney U test. P-values are corrected for multiple testing by controlling the false discovery rate (FDR).
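
The steps described above can be sketched in base R. This is a minimal illustration on made-up read counts, not the package's implementation: the object names (sgrna, fc, control.fc and so on) are hypothetical, and Benjamini-Hochberg is assumed as the FDR method because the text does not name one.

```r
# Toy read counts (hypothetical); sgRNA names follow "<gene>_<id>".
sgrna <- c(paste0("geneA_", 1:5), paste0("geneB_", 1:5), paste0("random_", 1:10))
untreated <- rep(100, 20)
treated <- c(400, 420, 440, 460, 480,                       # geneA: enriched
             90, 95, 105, 110, 115,                         # geneB: unchanged
             88, 92, 96, 98, 100, 102, 104, 108, 112, 116)  # non-targeting controls

# 1. Normalize each dataset by its median read count.
norm.untreated <- untreated / median(untreated)
norm.treated <- treated / median(treated)

# 2. Per-sgRNA fold change and the gene each sgRNA belongs to.
fc <- norm.treated / norm.untreated
gene <- sub("^(.+?)_.+", "\\1", sgrna, perl = TRUE)

# 3. Two-sided Mann-Whitney U test of each gene's sgRNAs against the controls.
control.fc <- fc[gene == "random"]
genes <- setdiff(unique(gene), "random")
p.value <- sapply(genes, function(g) {
  wilcox.test(fc[gene == g], control.fc, alternative = "two.sided")$p.value
})

# 4. Multiple-testing correction (assumption: Benjamini-Hochberg FDR).
p.adjusted <- p.adjust(p.value, method = "fdr")
```

In this toy data only geneA is shifted relative to the controls, so it is the only gene expected to keep a small adjusted p-value.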


stat.wilcox(untreated.list=list(NULL, NULL),treated.list=list(NULL, NULL),
namecolumn=1, fullmatchcolumn=2,normalize=TRUE,,
extractpattern=expression("^(.+?)_.+"), controls=NULL, control.picks=300, sorting=TRUE)



untreated.list
A list of data.frames of untreated (control) samples, e.g. list(df.control1, df.control2).


treated.list
A list of data.frames of treated samples, e.g. list(df.treated1, df.treated2).


namecolumn
Column in which the target names are located, e.g. namecolumn=1 for the first column.


fullmatchcolumn
Column in which the read counts are located, e.g. fullmatchcolumn=2 for the second column.


normalize
Logical. If normalize=TRUE, the datasets are normalized before testing.

The function used to normalize the datasets if normalize=TRUE. By default, normalization uses the dataset median, but in principle any other function, e.g. mean, can be used.


extractpattern
Regular expression used to extract the gene name from the sgRNA name. Make sure that the gene name is captured by enclosing its part of the regular expression in parentheses (). The default value expression("^(.+?)_.+") captures the gene name (.+?) in front of the separator _, followed by any characters .+, e.g. gene1_anything.


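controls
The test compares each gene against a set of non-targeting sgRNAs (negative controls) within the datasets. Specify the gene name under which these controls are listed, e.g. controls="random". If controls=NULL, randomly picked sgRNAs are used instead (see control.picks).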


sorting
If sorting=TRUE (the default in the usage above), the output table is sorted by the p-value of the genes. If sorting=FALSE, it is sorted by gene name.


control.picks
If no non-targeting controls are present or set, stat.wilcox picks a random set of sgRNAs from the data set as the reference population; control.picks sets how many are picked. This is only used if controls=NULL. Default: 300. Values: numeric.
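
To illustrate how the pattern and the random picks described above behave, here is a small base R sketch. The sgRNA names are invented, the pattern is passed as a plain string rather than wrapped in expression(), and drawing the reference set without replacement via sample() is an assumption; the package may do this differently.

```r
# Hypothetical sgRNA identifiers of the form "<gene>_<suffix>".
sgrna.names <- c("AAK1_1", "AAK1_2", "ABL1_sg3", "random_10", "non_target_1")

# Default pattern from the usage above, written as a plain string.
pattern <- "^(.+?)_.+"

# The first capture group (\\1) holds the gene name.
gene.names <- sub(pattern, "\\1", sgrna.names, perl = TRUE)

# Random reference population: draw control.picks sgRNAs from the
# full set (assumption: sampling without replacement).
set.seed(42)
all.sgrna <- paste0("gene", rep(1:200, each = 5), "_", 1:5)  # 1000 toy sgRNAs
control.picks <- 300
picked <- sample(all.sgrna, size = control.picks)
```

Because the capture group is non-greedy, the gene name ends at the first underscore: "non_target_1" yields "non". Gene names that themselves contain an underscore therefore need a different pattern.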


stat.wilcox returns a data.frame, which can be visualized with plot.hitident. The data.frame has the following format:

     untreated  treated foldchange   p.value
AAK1  2.061346 3.007924   1.351672 0.2966311
AATK  3.413357 5.129985   1.398695 0.1146190
ABI1  2.997385 4.384881   1.418959 0.1437962
ABL1  2.269906 2.874087   1.211499 0.3681327
ABL2  2.519391 4.539583   1.732575 0.6335575

For each gene, the foldchange as well as the p-value, derived by the Mann-Whitney U test against the non-targeting controls, are listed.
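
A returned table of this shape can be handled with ordinary data.frame operations. The sketch below rebuilds the five rows shown above by hand (the object name res and the 0.05 cutoff are illustrative assumptions) and then sorts and filters by p-value.

```r
# Rebuild the example output shown above as a data.frame.
res <- data.frame(
  untreated  = c(2.061346, 3.413357, 2.997385, 2.269906, 2.519391),
  treated    = c(3.007924, 5.129985, 4.384881, 2.874087, 4.539583),
  foldchange = c(1.351672, 1.398695, 1.418959, 1.211499, 1.732575),
  p.value    = c(0.2966311, 0.1146190, 0.1437962, 0.3681327, 0.6335575),
  row.names  = c("AAK1", "AATK", "ABI1", "ABL1", "ABL2")
)

# Sort genes by ascending p-value.
res.sorted <- res[order(res$p.value), ]

# Keep genes below an illustrative significance cutoff.
hits <- res[res$p.value < 0.05, ]
```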




Jan Winter



data.wilcox = stat.wilcox(untreated.list = list(CONTROL1, CONTROL2),
  treated.list = list(TREAT1, TREAT2), namecolumn=1, fullmatchcolumn=2,
  normalize=TRUE, sorting=FALSE, controls="random")

[Package caRpools version 0.83 Index]