FilterGenes {grandR}    R Documentation
Filter genes
Description
Return a grandR object containing a subset of the genes of the given grandR object (usually to filter out weakly expressed genes).
Usage
FilterGenes(
data,
mode.slot = "count",
minval = 100,
mincol = ncol(data)/2,
min.cond = NULL,
use = NULL,
keep = NULL,
return.genes = FALSE
)
Arguments
data: the grandR object
mode.slot: the mode.slot used for filtering (see details)
minval: the minimal value required for retaining a gene
mincol: the minimal number of columns (i.e. samples or cells) in which a gene must have a value >= minval
min.cond: if not NULL, values are compared per condition instead of per column, and min.cond is the minimal number of conditions in which a gene must have a value >= minval (see details)
use: if not NULL, directly defines the genes to retain (see details)
keep: if not NULL, defines additional genes to keep even if they do not meet the filtering criteria (see details)
return.genes: if TRUE, return the gene names instead of a new grandR object
Details
By default, genes are retained if they have at least 100 read counts (minval) in at least half of the columns (mincol, i.e. samples or cells).
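The default rule can be sketched in base R on a plain count matrix. This is an illustration only, not grandR's implementation: filter_rows is a hypothetical helper, and the matrix stands in for the mode.slot values of a grandR object.

```r
# Hypothetical helper (not part of grandR): keep genes (rows) that have a
# value >= minval in at least mincol columns (samples or cells).
filter_rows <- function(counts, minval = 100, mincol = ncol(counts) / 2) {
  # Number of columns in which each gene reaches minval (NAs are not counted)
  n_ok <- rowSums(counts >= minval, na.rm = TRUE)
  n_ok >= mincol
}

# Toy count matrix: 3 genes x 4 samples
counts <- matrix(c(150, 200,  10,  5,
                    50, 120,  30, 90,
                   500, 400, 100, 99),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("GeneA", "GeneB", "GeneC"),
                                 paste0("S", 1:4)))

# Default: >= 100 counts in at least 4/2 = 2 samples
rownames(counts)[filter_rows(counts)]
```

GeneA and GeneC each reach 100 counts in at least two of the four samples, but GeneB does so only once. With mincol = 3, only GeneC would remain.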
The use parameter defines the genes to be retained directly. The keep parameter, in contrast, defines additional genes to be retained. For both, genes can be referred to by their names, symbols, row numbers in the gene table, or a logical vector referring to the gene table rows.
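The difference between use and keep can be sketched in base R as follows. This assumes that use replaces the result of the minval/mincol filter, while keep is combined with that result by a logical OR. The helpers to_mask and select_genes and the gene IDs are hypothetical and are not grandR functions.

```r
# Hypothetical sketch (not grandR code): convert a gene reference given as
# names/symbols, row numbers, or a logical vector into a logical row mask.
to_mask <- function(ref, genes, symbols) {
  n <- length(genes)
  if (is.logical(ref)) {
    stopifnot(length(ref) == n)
    return(ref & !is.na(ref))          # treat NA as "not selected"
  }
  if (is.numeric(ref)) return(seq_len(n) %in% ref)   # row numbers
  genes %in% ref | symbols %in% ref    # gene names or symbols
}

# Assumption: 'use' replaces the filter result, 'keep' is added on top of it
select_genes <- function(passes, genes, symbols, use = NULL, keep = NULL) {
  mask <- if (is.null(use)) passes else to_mask(use, genes, symbols)
  if (!is.null(keep)) mask <- mask | to_mask(keep, genes, symbols)
  genes[mask]
}

genes   <- c("ENSG01", "ENSG02", "ENSG03", "ENSG04")   # hypothetical IDs
symbols <- c("ACTB", "GAPDH", "MYC", "GENE4")
passes  <- c(TRUE, TRUE, FALSE, FALSE)   # result of the minval/mincol filter

select_genes(passes, genes, symbols, keep = "MYC")  # filter result plus MYC
select_genes(passes, genes, symbols, use = 4)       # only row 4
```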
The mode.slot parameter uses the mode.slot syntax to refer to data slots: each name is either a data slot, or one of new, old or total followed by a dot and a slot name (e.g. new.count). For new and old, the data slot values are multiplied by the NTR or by 1-NTR, respectively. This can be used, for example, to filter by new counts.
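The arithmetic behind new and old values can be shown on toy matrices. This sketch assumes that the NTR is available as a matrix with the same shape as the counts. The variable names are illustrative and do not correspond to grandR slots or functions.

```r
# Illustrative only: toy read counts and NTRs (2 genes x 2 samples)
count <- matrix(c(100, 200, 50, 400), nrow = 2,
                dimnames = list(c("GeneA", "GeneB"), c("S1", "S2")))
ntr   <- matrix(c(0.1, 0.5, 0.2, 0.25), nrow = 2, dimnames = dimnames(count))

new_count <- count * ntr         # values a "new.count" filter would look at
old_count <- count * (1 - ntr)   # values an "old.count" filter would look at

# Filtering by new counts: >= 50 new reads in at least one sample
names(which(rowSums(new_count >= 50) >= 1))
```

New and old counts add up to the total counts. GeneB has about 100 new reads per sample and passes the filter, whereas GeneA has only about 10 and does not.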
If the min.cond parameter is given, all columns belonging to the same Condition are first summed up, and the usual filtering is then performed on conditions instead of columns, i.e. a gene is retained if its summed value is >= minval in at least min.cond conditions.
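The per-condition variant can be sketched in base R by summing replicate columns with rowsum() and then counting the conditions that reach minval. The condition labels, the count matrix and the threshold are made up for illustration. This mirrors the description above but is not grandR's own code.

```r
# Toy counts: 3 genes x 4 samples, two replicates per condition (made-up data)
counts <- matrix(c(600, 500,  10,  20,
                    40,  30,  20,  10,
                   300, 200, 400, 350),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("GeneA", "GeneB", "GeneC"),
                                 c("Mock.1", "Mock.2", "SARS.1", "SARS.2")))
condition <- c("Mock", "Mock", "SARS", "SARS")

# rowsum() sums rows by group, so transpose to sum columns per condition
per_cond <- t(rowsum(t(counts), group = condition))

minval   <- 700
min.cond <- 1
retained <- rowSums(per_cond >= minval) >= min.cond
names(which(retained))
```

No single column reaches 700 counts. After summing, however, GeneA reaches 1100 counts in Mock and GeneC reaches 750 in SARS, so both are retained.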
Value
either a new grandR object (if return.genes=FALSE), or a vector containing the names of the genes that would be retained (if return.genes=TRUE)
Examples
library(grandR)
sars <- ReadGRAND(system.file("extdata", "sars.tsv.gz", package = "grandR"),
                  design = c("Condition", Design$dur.4sU, Design$Replicate))
nrow(sars)
# This is already filtered and has 1045 genes
nrow(FilterGenes(sars, minval = 1000))
# There are 966 genes with at least 1000 read counts in at least half of the samples
nrow(FilterGenes(sars, minval = 10000, min.cond = 1))
# There are 944 genes with at least 10000 read counts (summed per condition) in the Mock or SARS condition
nrow(FilterGenes(sars, use = GeneInfo(sars, "Type") != "Cellular"))
# These are the 11 viral genes.