check_maf {plinkQC} | R Documentation |
Identification of SNPs with low minor allele frequency
Description
Runs and evaluates results from plink --freq. It calculates the minor allele
frequencies for all variants in the individuals that passed the
perIndividualQC. The minor allele frequency distribution is
plotted as a histogram.
Usage
check_maf(
indir,
name,
qcdir = indir,
macTh = 20,
mafTh = NULL,
verbose = FALSE,
interactive = FALSE,
path2plink = NULL,
showPlinkOutput = TRUE,
keep_individuals = NULL,
remove_individuals = NULL,
exclude_markers = NULL,
extract_markers = NULL,
legend_text_size = 5,
legend_title_size = 7,
axis_text_size = 5,
axis_title_size = 7,
title_size = 9
)
Arguments
indir |
[character] /path/to/directory containing the basic PLINK data files name.bim, name.bed and name.fam. |
name |
[character] Prefix of PLINK files, i.e. name.bed, name.bim, name.fam. |
qcdir |
[character] /path/to/directory where results will be written to.
If |
macTh |
[double] Threshold for minor allele count cut-off. If both mafTh and macTh are specified, macTh is used (macTh = mafTh\*2\*NrSamples). |
mafTh |
[double] Threshold for minor allele frequency cut-off. |
verbose |
[logical] If TRUE, progress information is printed to standard out, including the plink log. |
interactive |
[logical] Should plots be shown interactively? When choosing this option, make sure you have X-forwarding or a graphical interface available for interactive plotting. Alternatively, set interactive=FALSE and save the returned plot object (p_maf) via ggplot2::ggsave(plot=p_maf, other_arguments) or pdf(outfile); print(p_maf); dev.off(). |
path2plink |
[character] Absolute path to PLINK executable
(https://www.cog-genomics.org/plink/1.9/) i.e.
plink should be accessible as path2plink -h. The full name of the executable
should be specified: for windows OS, this means path/plink.exe, for unix
platforms this is path/plink. If not provided, assumed that PATH set-up works
and PLINK will be found by |
showPlinkOutput |
[logical] If TRUE, plink log and error messages are printed to standard out. |
keep_individuals |
[character] Path to file with individuals to be retained in the analysis. The file has to be a space/tab-delimited text file with family IDs in the first column and within-family IDs in the second column. All samples not listed in this file will be removed from the current analysis. See https://www.cog-genomics.org/plink/1.9/filter#indiv. Default: NULL, i.e. no filtering on individuals. |
remove_individuals |
[character] Path to file with individuals to be removed from the analysis. The file has to be a space/tab-delimited text file with family IDs in the first column and within-family IDs in the second column. All samples listed in this file will be removed from the current analysis. See https://www.cog-genomics.org/plink/1.9/filter#indiv. Default: NULL, i.e. no filtering on individuals. |
exclude_markers |
[character] Path to file with markers to be removed from the analysis. The file has to be a text file with a list of variant IDs (usually one per line, but it's okay for them to just be separated by spaces). All listed variants will be removed from the current analysis. See https://www.cog-genomics.org/plink/1.9/filter#snp. Default: NULL, i.e. no filtering on markers. |
extract_markers |
[character] Path to file with markers to be included in the analysis. The file has to be a text file with a list of variant IDs (usually one per line, but it's okay for them to just be separated by spaces). All unlisted variants will be removed from the current analysis. See https://www.cog-genomics.org/plink/1.9/filter#snp. Default: NULL, i.e. no filtering on markers. |
legend_text_size |
[integer] Size for legend text. |
legend_title_size |
[integer] Size for legend title. |
axis_text_size |
[integer] Size for axis text. |
axis_title_size |
[integer] Size for axis title. |
title_size |
[integer] Size for plot title. |
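The file formats described for keep_individuals, remove_individuals, exclude_markers and extract_markers can be produced from within R. The following is a minimal sketch in base R; the sample IDs, variant IDs and file names are hypothetical and only illustrate the expected layout (family ID and within-family ID per row for individuals, one variant ID per line for markers).

```r
# Hypothetical family and within-family IDs, for illustration only
keep <- data.frame(FID = c("fam1", "fam2"),
                   IID = c("ind1", "ind2"),
                   stringsAsFactors = FALSE)

# Tab-delimited, no header, no quotes: FID in column 1, IID in column 2
keep_individuals_file <- file.path(tempdir(), "keep_individuals.txt")
write.table(keep, keep_individuals_file, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = FALSE)

# Hypothetical variant IDs, written one per line
exclude <- c("rs0001", "rs0002", "rs0003")
exclude_markers_file <- file.path(tempdir(), "exclude_markers.txt")
writeLines(exclude, exclude_markers_file)
```

The resulting paths can then be passed as keep_individuals=keep_individuals_file and exclude_markers=exclude_markers_file.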
Details
check_maf uses plink --remove name.fail.IDs --freq to calculate the
minor allele frequencies for all variants in the individuals that passed the
perIndividualQC. It does so without generating a new dataset
but simply removes the IDs when calculating the statistics.
For details on the output data.frame fail_maf, check the original description on the PLINK output format page: https://www.cog-genomics.org/plink/1.9/formats#frq.
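The relation macTh = mafTh\*2\*NrSamples stated for the macTh argument can be worked through with a small helper. This is a sketch only: maf_to_mac, mac_to_maf and the sample size are hypothetical names and values chosen for illustration, not part of plinkQC.

```r
# Convert between a minor allele frequency threshold and a minor allele
# count threshold, following macTh = mafTh * 2 * NrSamples.
maf_to_mac <- function(mafTh, n_samples) mafTh * 2 * n_samples
mac_to_maf <- function(macTh, n_samples) macTh / (2 * n_samples)

n_samples <- 200  # hypothetical number of samples passing QC

mac_from_maf <- maf_to_mac(0.05, n_samples)  # count threshold for MAF 0.05
maf_from_mac <- mac_to_maf(20, n_samples)    # MAF implied by the default macTh = 20

print(mac_from_maf)
print(maf_from_mac)
```

Under this relation, a fixed macTh corresponds to a smaller frequency threshold as the number of samples grows, which is worth keeping in mind when choosing between the two arguments.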
Value
Named list with i) fail_maf, a [data.frame] with the columns CHR (Chromosome code), SNP (Variant identifier), A1 (Allele 1; usually minor), A2 (Allele 2; usually major), MAF (Allele 1 frequency) and NCHROBS (Number of allele observations) for all SNPs that failed the mafTh/macTh, and ii) p_maf, a ggplot2 object containing the MAF distribution histogram, which can be shown with print(p_maf).
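To make the structure of fail_maf concrete, the sketch below builds a small mock table with the same columns and selects the variants below a frequency threshold. The values are invented, and the strict comparison (MAF < mafTh) is an assumption for illustration; it is not taken from the plinkQC source.

```r
# Mock frequency table with the columns described above (invented values)
frq <- data.frame(
  CHR = c(1, 1, 2, 2),
  SNP = c("rs1", "rs2", "rs3", "rs4"),
  A1 = c("A", "C", "G", "T"),
  A2 = c("G", "T", "A", "C"),
  MAF = c(0.25, 0.005, 0.02, 0.4),
  NCHROBS = c(400, 400, 398, 400),
  stringsAsFactors = FALSE
)

mafTh <- 0.01  # hypothetical threshold

# Assumption: a variant fails if its MAF is strictly below the threshold
fail <- frq[frq$MAF < mafTh, ]
print(fail$SNP)
```

With a real result, the same kind of inspection applies to the fail_maf element of the returned list, for example via its SNP column.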
Examples
indir <- system.file("extdata", package="plinkQC")
qcdir <- tempdir()
name <- "data"
path2plink <- '/path/to/plink'
# the following code is not run on package build, as the path2plink on the
# user system is not known.
## Not run:
# run on all individuals and markers
fail_maf <- check_maf(indir=indir, qcdir=qcdir, name=name, macTh=15,
                      interactive=FALSE, verbose=TRUE, path2plink=path2plink)
# run on subset of individuals and markers
keep_individuals_file <- system.file("extdata", "keep_individuals",
                                     package="plinkQC")
exclude_markers_file <- system.file("extdata", "exclude_markers",
                                    package="plinkQC")
fail_maf <- check_maf(qcdir=qcdir, indir=indir,
                      name=name, interactive=FALSE, verbose=TRUE,
                      path2plink=path2plink,
                      keep_individuals=keep_individuals_file,
                      exclude_markers=exclude_markers_file)
## End(Not run)