filter.adjustedCADD {Ravages}    R Documentation

Variant filtering based on frequency and median adjusted CADD by CADD regions

Description

Filters rare variants using three criteria: a MAF threshold, a minimum number of variants or a minimum cumulative MAF per genomic region, and the median adjusted CADD score of each CADD region.

Usage

filter.adjustedCADD(x, SNVs.scores = NULL, indels.scores = NULL,
                    ref.level = NULL, 
                    filter=c("whole", "controls", "any"), 
                    maf.threshold=0.01, min.nb.snps = 2, 
                    min.cumulative.maf = NULL, 
                    group = NULL, cores = 10,, verbose = T)

Arguments

x: A bed.matrix annotated with CADD regions using set.CADDregions


SNVs.scores: A data frame containing the adjusted CADD scores of the SNVs. Optional: if the adjusted CADD scores of the variants in the study are available, supplying them here saves computation time.


indels.scores: A data frame containing the CADD PHRED v1.4 scores of the indels. Compulsory if indels are present in x.


ref.level: The level corresponding to the controls group, only needed if filter == "controls"


filter: The group(s) on which the MAF filter is applied: "whole" (entire sample), "controls" or "any"


maf.threshold: The MAF threshold used to define a rare variant, set at 0.01 by default


min.nb.snps: The minimum number of variants needed to keep a CADD region, set at 2 by default


min.cumulative.maf: The minimum cumulative MAF of variants needed to keep a CADD region


group: A factor indicating the group of each individual, only needed if filter = "controls" or filter = "any". If missing, x@ped$pheno is used.


cores: How many cores to use, set at 10 by default

The directory where the RAVA-FIRST data files are stored, or will be downloaded to


verbose: Whether to display information about the function's actions
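The region-level thresholds min.nb.snps and min.cumulative.maf can be illustrated with a base R sketch. It runs on a plain data frame that stands in for x@snps. The helper keep_regions() and the column maf are hypothetical. When both thresholds are given, the sketch applies both of them together; whether the package combines them the same way is an assumption.

```r
# Hypothetical helper: keep only the CADD regions that pass the region-level
# thresholds. `snps` stands in for x@snps; the column `maf` is assumed.
keep_regions <- function(snps, min.nb.snps = 2, min.cumulative.maf = NULL) {
  region <- as.character(snps$genomic.region)
  # Number of remaining variants in each CADD region
  n.per.region <- table(region)
  ok <- names(n.per.region)[n.per.region >= min.nb.snps]
  if (!is.null(min.cumulative.maf)) {
    # Cumulative MAF of the remaining variants in each CADD region
    cum.maf <- tapply(snps$maf, region, sum)
    ok <- intersect(ok, names(cum.maf)[cum.maf >= min.cumulative.maf])
  }
  snps[region %in% ok, , drop = FALSE]
}

# Toy variants assumed to have already passed the frequency and median filters
snps <- data.frame(
  genomic.region   = c("R1", "R1", "R2", "R3", "R3", "R3"),
  maf              = c(0.004, 0.002, 0.008, 0.001, 0.001, 0.003),
  stringsAsFactors = FALSE
)
keep_regions(snps, min.nb.snps = 2)$genomic.region
keep_regions(snps, min.nb.snps = 2, min.cumulative.maf = 0.0055)$genomic.region
```

With these toy values, the first call keeps R1 and R3 and drops R2, which has a single variant. The second call adds min.cumulative.maf = 0.0055. That also drops R3, whose cumulative MAF is only 0.005.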

Details

By default, the function annotates variants with their adjusted CADD scores using the file "AdjustedCADD_v1.4_202108.tsv.gz" from the Ravages data repository. Alternatively, you can supply the scores through SNVs.scores to save computation time. This data frame should contain 5 columns: the chromosome ('chr'), position ('pos'), reference allele ('A1'), alternative allele ('A2') and adjusted CADD score ('adjCADD'). Precomputed adjusted CADD scores are only available for SNVs. If x contains indels, their CADD PHRED scores must be supplied through indels.scores.

If a column 'adjCADD' is already present in x@snps, no annotation is performed and the filtering is applied directly to this column.
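The annotation step described above amounts to a lookup on the four identifying columns. The sketch below is a base R illustration on toy data frames, not the package's code. annotate_adjCADD() is a hypothetical helper. The sketch assumes that alleles must match exactly, with no strand or allele flipping. Variants absent from the scores table are left as NA.

```r
# Hypothetical helper: add an adjCADD column to a stand-in for x@snps by
# exact matching on chromosome, position and alleles (A1 = ref, A2 = alt).
annotate_adjCADD <- function(snps, scores) {
  # Mirror the documented behaviour: an existing adjCADD column is used as is
  if ("adjCADD" %in% names(snps)) return(snps)
  key.snps   <- paste(snps$chr, snps$pos, snps$A1, snps$A2, sep = ":")
  key.scores <- paste(scores$chr, scores$pos, scores$A1, scores$A2, sep = ":")
  # match() keeps the row order of snps; unmatched variants get NA
  snps$adjCADD <- scores$adjCADD[match(key.snps, key.scores)]
  snps
}

# Toy data; the scores table has the five documented columns
snps <- data.frame(chr = c(2L, 2L, 2L), pos = c(100L, 200L, 300L),
                   A1 = c("A", "C", "G"), A2 = c("G", "T", "A"),
                   stringsAsFactors = FALSE)
scores <- data.frame(chr = c(2L, 2L), pos = c(100L, 300L),
                     A1 = c("A", "G"), A2 = c("G", "A"),
                     adjCADD = c(12.3, 4.1), stringsAsFactors = FALSE)
annotated <- annotate_adjCADD(snps, scores)
annotated$adjCADD
```

The sketch uses match() rather than merge() so that the rows stay in the order of x@snps. A second call on already annotated data returns it unchanged.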

To use this function, the slot x@snps must contain two columns. The first is a factor 'genomic.region' giving the CADD region of each variant. The second is a vector 'adjCADD.Median' giving the median adjusted CADD score of that region. Both are added by the function set.CADDregions.

Only variants with an adjusted CADD score higher than the median value of their CADD region are kept in the analysis. This is the filtering strategy applied in the RAVA.FIRST() pipeline.
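The median rule can be sketched in base R on a toy stand-in for x@snps that has the columns genomic.region, adjCADD and adjCADD.Median. This is an illustration under assumptions, not the package's code. It assumes a strict inequality and drops variants without a score (NA).

```r
# Toy stand-in for x@snps after set.CADDregions() and annotation:
# adjCADD.Median holds the median adjusted CADD score of each variant's region
snps <- data.frame(
  genomic.region   = c("R1", "R1", "R1", "R2", "R2", "R2"),
  adjCADD          = c(5.2, 1.0, 3.0, 7.5, 0.4, NA),
  adjCADD.Median   = c(2.5, 2.5, 2.5, 6.0, 6.0, 6.0),
  stringsAsFactors = FALSE
)
# Keep variants scoring strictly above their region's median; variants
# without a score are dropped (assumption)
above.median <- !is.na(snps$adjCADD) & snps$adjCADD > snps$adjCADD.Median
snps.kept <- snps[above.median, ]
snps.kept$adjCADD
```

Because adjCADD.Median is stored per variant, the comparison is a single vectorised test. No grouping is needed at this step.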

If filter = "whole", only variants with a MAF below the threshold in the entire sample are kept.

If filter = "controls", only variants with a MAF below the threshold in the controls group are kept.

If filter = "any", only variants with a MAF below the threshold in at least one of the groups are kept.
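The three filter options can be sketched on a toy genotype matrix coded 0/1/2, with individuals in rows and variants in columns. rare_variants() and maf_of() are hypothetical helpers. The package works on the bed.matrix itself, and its handling of missing genotypes may differ from the na.rm = TRUE used here.

```r
# Minor-allele frequency of each column of a 0/1/2 genotype matrix
maf_of <- function(G) {
  p <- colMeans(G, na.rm = TRUE) / 2  # frequency of the counted allele
  pmin(p, 1 - p)
}

# Hypothetical helper returning, for each variant, whether it is kept as rare
rare_variants <- function(G, group, filter = c("whole", "controls", "any"),
                          maf.threshold = 0.01, ref.level = NULL) {
  filter <- match.arg(filter)
  group <- as.factor(group)
  if (filter == "whole") {
    # Rare in the entire sample
    return(maf_of(G) < maf.threshold)
  }
  if (filter == "controls") {
    # Rare among the individuals of the reference (controls) level
    stopifnot(!is.null(ref.level))
    return(maf_of(G[group == ref.level, , drop = FALSE]) < maf.threshold)
  }
  # "any": rare in at least one of the groups
  rare.by.group <- lapply(levels(group), function(g)
    maf_of(G[group == g, , drop = FALSE]) < maf.threshold)
  Reduce(`|`, rare.by.group)
}

# Toy data: 5 cases then 5 controls, 3 variants
G <- cbind(v1 = c(0, 0, 0, 0, 0, 1, 1, 0, 0, 0),
           v2 = c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0),
           v3 = c(1, 1, 1, 1, 0, 1, 1, 1, 1, 0))
group <- factor(rep(c("case", "control"), each = 5))
rare_variants(G, group, "whole", maf.threshold = 0.12)
rare_variants(G, group, "controls", maf.threshold = 0.12, ref.level = "control")
rare_variants(G, group, "any", maf.threshold = 0.12)
```

The threshold 0.12 is chosen only so that the toy frequencies fall on either side of it. With it, "whole" keeps v1, "controls" keeps v2, and "any" keeps both: v1 is absent from the cases and v2 is absent from the controls.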

It is recommended to use this function chromosome by chromosome for large datasets.

Value

A bed.matrix with filtered variants


See Also

RAVA.FIRST, set.CADDregions, burden.subscores, filter.rare.variants

Examples

#Import 1000 Genomes data from the region around the LCT gene
#x <- as.bed.matrix(LCT.gen, LCT.fam, LCT.bim)

#Group variants within CADD regions and genomic categories
#x <- set.CADDregions(x)

#Annotate variants with adjusted CADD scores
#and filter on frequency and median
#x.median <- filter.adjustedCADD(x, maf.threshold = 0.025, 
#                                min.nb.snps = 2)

[Package Ravages version 1.1.3]