filter_common_genes {tidyestimate} | R Documentation |
Remove non-common genes from data frame
Description
Because ESTIMATE score calculation is sensitive to the number of genes used, a set
of genes common to six platforms has been established (see
?tidyestimate::common_genes). This function filters the data frame to only those
genes.
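The filtering idea can be illustrated in base R. This is only a minimal sketch with made-up data, not the package's implementation: the `common` vector is a hypothetical stand-in for the real common gene list, and `expr` is a toy expression table.

```r
# Hypothetical stand-in for the common gene list
# (NOT the real contents of tidyestimate::common_genes)
common <- c("TP53", "BRCA1", "EGFR")

# Toy expression data with gene symbols in the first column
expr <- data.frame(
  hgnc_symbol = c("TP53", "EGFR", "MYC"),
  sample_1 = c(5.1, 7.3, 2.2),
  sample_2 = c(4.8, 6.9, 3.0),
  stringsAsFactors = FALSE
)

# Keep only rows whose identifier appears in the common set
filtered <- expr[expr$hgnc_symbol %in% common, , drop = FALSE]

# Common genes absent from the data; this is the kind of
# information a tell_missing-style message would report
missing_genes <- setdiff(common, expr$hgnc_symbol)
```

Here `filtered` keeps the TP53 and EGFR rows, and `missing_genes` holds BRCA1.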
Usage
filter_common_genes(
df,
id = c("entrezgene_id", "hgnc_symbol"),
tidy = FALSE,
tell_missing = TRUE,
find_alias = FALSE
)
Arguments
df |
a |
id |
either "entrezgene_id" or "hgnc_symbol" |
tidy |
logical. If rownames contain gene identifier, set |
tell_missing |
logical. If |
find_alias |
logical. If |
Details
The find_alias argument will attempt to find aliases for HGNC symbols that are
in tidyestimate::common_genes but missing from the provided dataset. This will
only run if find_alias = TRUE and id = "hgnc_symbol".
This algorithm is very conservative: it will only make a match if the gene from the common genes has only one alias that matches with only one gene from the provided dataset, and the gene from the provided dataset with which it matches only matches with a single gene from the list of common genes. (Note that a single gene may have many aliases.) Once a match has been made, the gene in the provided dataset is renamed to the gene name in the common gene list.
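The one-to-one rule described above could be sketched in base R roughly as follows. This is an illustration under stated assumptions, not the package's actual code: the `aliases` table, the gene names, and `unmatched_in_data` are all hypothetical.

```r
# Hypothetical alias table: one row per (common gene, alias) pair
aliases <- data.frame(
  common_gene = c("GENE_A", "GENE_A", "GENE_B", "GENE_B", "GENE_C", "GENE_D"),
  alias       = c("OLD_A1", "OLD_A2", "OLD_B1", "OLD_B2", "SHARED", "SHARED"),
  stringsAsFactors = FALSE
)

# Hypothetical identifiers in the dataset that are not in the common list
unmatched_in_data <- c("OLD_A1", "OLD_B1", "OLD_B2", "SHARED", "UNRELATED")

# Candidate pairs: aliases that actually occur in the dataset
hits <- aliases[aliases$alias %in% unmatched_in_data, ]

# How many dataset genes each common gene hits, and vice versa
n_per_common <- as.integer(table(hits$common_gene)[hits$common_gene])
n_per_data   <- as.integer(table(hits$alias)[hits$alias])

# Accept a pair only when it is unambiguous in both directions
renames <- hits[n_per_common == 1L & n_per_data == 1L, ]

# Apply the accepted renames to the dataset identifiers
data_genes <- unmatched_in_data
data_genes[match(renames$alias, data_genes)] <- renames$common_gene
```

In this toy input only OLD_A1 is renamed (to GENE_A). GENE_B is skipped because two of its aliases appear in the dataset, and SHARED is skipped because it is an alias of two different common genes.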
While this method is fairly accurate, it is also a heuristic. Therefore, it is disabled by default. Users should check which genes are being reassigned to ensure accuracy.
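One possible way to review the effect of alias matching is to run the filter with and without find_alias and compare the identifiers in the first column. The sketch below assumes the tidyestimate package and its ov dataset are installed, and it only lists common genes that appear once alias matching is enabled. It does not show which original symbol each one was mapped from.

```r
# Runs only if tidyestimate is available
if (requireNamespace("tidyestimate", quietly = TRUE)) {
  library(tidyestimate)

  without_alias <- filter_common_genes(ov, id = "hgnc_symbol", tidy = FALSE,
                                       tell_missing = FALSE, find_alias = FALSE)
  with_alias <- filter_common_genes(ov, id = "hgnc_symbol", tidy = FALSE,
                                    tell_missing = FALSE, find_alias = TRUE)

  # Gene identifiers are in the first column of the returned tibble
  recovered <- setdiff(with_alias[[1]], without_alias[[1]])
  print(recovered)
}
```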
The method used to generate these aliases is described at
?tidyestimate::common_genes.
Value
A tibble, with gene identifiers as the first column.
Examples
filter_common_genes(ov, id = "hgnc_symbol", tidy = FALSE, tell_missing = TRUE, find_alias = FALSE)