perMarkerQC {plinkQC} | R Documentation |
Quality control for all markers in plink-dataset
Description
perMarkerQC checks the markers in the plink dataset for their missingness
rates across samples, their deviation from Hardy-Weinberg equilibrium (HWE)
and their minor allele frequencies (MAF). By default, it assumes that the IDs of
individuals that have failed perIndividualQC have been written
to qcdir/name.fail.IDs and removes these individuals when computing
missingness rates, HWE p-values and MAF. If the qcdir/name.fail.IDs file does
not exist, a message is written to stdout and the analyses continue for
all samples in the name.fam/name.bed/name.bim dataset.
Depicts i) SNP missingness rates (stratified by minor allele
frequency) as histograms, ii) p-values of HWE exact test (stratified by all
and low p-values) as histograms and iii) the minor allele frequency
distribution as a histogram.
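The handling of qcdir/name.fail.IDs described above can be pictured with a minimal sketch. This is not the package's implementation, only an illustration of the documented behaviour; the values of qcdir and name are placeholders.

```r
# Placeholder values; in practice these are the qcdir and name arguments
qcdir <- tempdir()
name <- "data"

# Location where perIndividualQC is expected to have written failing IDs
fail_ids_file <- file.path(qcdir, paste0(name, ".fail.IDs"))

if (file.exists(fail_ids_file)) {
  message("Individuals listed in ", fail_ids_file, " will be removed.")
} else {
  # Documented fallback: report and continue with all samples
  message("No ", fail_ids_file, " found; all samples will be analysed.")
}
```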
Usage
perMarkerQC(
indir,
qcdir = indir,
name,
do.check_snp_missingness = TRUE,
lmissTh = 0.01,
do.check_hwe = TRUE,
hweTh = 1e-05,
do.check_maf = TRUE,
macTh = 20,
mafTh = NULL,
interactive = FALSE,
verbose = TRUE,
keep_individuals = NULL,
remove_individuals = NULL,
exclude_markers = NULL,
extract_markers = NULL,
legend_text_size = 5,
legend_title_size = 7,
axis_text_size = 5,
axis_title_size = 7,
title_size = 9,
subplot_label_size = 9,
path2plink = NULL,
showPlinkOutput = TRUE
)
Arguments
indir |
[character] /path/to/directory containing the basic PLINK data files name.bim, name.bed, name.fam. |
qcdir |
[character] /path/to/directory where results will be written to.
If |
name |
[character] Prefix of PLINK files, i.e. name.bed, name.bim, name.fam. |
do.check_snp_missingness |
[logical] If TRUE, run check_snp_missingness. |
lmissTh |
[double] Threshold for acceptable variant missing rate across samples. |
do.check_hwe |
[logical] If TRUE, run check_hwe. |
hweTh |
[double] Significance threshold for deviation from HWE. |
do.check_maf |
[logical] If TRUE, run check_maf. |
macTh |
[double] Threshold for minor allele count cut-off. If both mafTh and macTh are specified, macTh is used (macTh = mafTh*2*NrSamples). |
mafTh |
[double] Threshold for minor allele frequency cut-off. |
interactive |
[logical] Should plots be shown interactively? When choosing this option, make sure you have X-forwarding or a graphical interface available for interactive plotting. Alternatively, set interactive=FALSE and save the returned plot object (p_markerQC) via ggplot2::ggsave(plot=p_markerQC, other_arguments) or pdf(outfile); print(p_markerQC); dev.off(). |
verbose |
[logical] If TRUE, progress info is printed to standard out. |
keep_individuals |
[character] Path to file with individuals to be retained in the analysis. The file has to be a space/tab-delimited text file with family IDs in the first column and within-family IDs in the second column. All samples not listed in this file will be removed from the current analysis. See https://www.cog-genomics.org/plink/1.9/filter#indiv. Default: NULL, i.e. no filtering on individuals. |
remove_individuals |
[character] Path to file with individuals to be removed from the analysis. The file has to be a space/tab-delimited text file with family IDs in the first column and within-family IDs in the second column. All samples listed in this file will be removed from the current analysis. See https://www.cog-genomics.org/plink/1.9/filter#indiv. Default: NULL, i.e. no filtering on individuals. |
exclude_markers |
[character] Path to file with markers to be removed from the analysis. The file has to be a text file with a list of variant IDs (usually one per line, but it's okay for them to just be separated by spaces). All listed variants will be removed from the current analysis. See https://www.cog-genomics.org/plink/1.9/filter#snp. Default: NULL, i.e. no filtering on markers. |
extract_markers |
[character] Path to file with markers to be included in the analysis. The file has to be a text file with a list of variant IDs (usually one per line, but it's okay for them to just be separated by spaces). All unlisted variants will be removed from the current analysis. See https://www.cog-genomics.org/plink/1.9/filter#snp. Default: NULL, i.e. no filtering on markers. |
legend_text_size |
[integer] Size for legend text. |
legend_title_size |
[integer] Size for legend title. |
axis_text_size |
[integer] Size for axis text. |
axis_title_size |
[integer] Size for axis title. |
title_size |
[integer] Size for plot title. |
subplot_label_size |
[integer] Size of the subplot labeling. |
path2plink |
[character] Absolute path to the PLINK executable
(https://www.cog-genomics.org/plink/1.9/), i.e.
PLINK should be accessible as path2plink -h. The full name of the executable
should be specified: on Windows, this means path/plink.exe; on Unix
platforms, path/plink. If not provided, it is assumed that the PATH set-up works
and PLINK will be found by |
showPlinkOutput |
[logical] If TRUE, plink log and error messages are printed to standard out. |
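The relation between the two minor allele thresholds given under macTh (macTh = mafTh*2*NrSamples) can be checked with a little arithmetic. The helper name maf_to_mac and the sample size below are hypothetical and serve only as an illustration; they are not part of plinkQC.

```r
# Hypothetical helper: convert a MAF threshold into the equivalent
# minor allele count (MAC), following macTh = mafTh * 2 * NrSamples
maf_to_mac <- function(mafTh, nr_samples) {
  mafTh * 2 * nr_samples
}

# Assumed example: MAF threshold of 0.01 in 1000 samples
mac <- maf_to_mac(mafTh = 0.01, nr_samples = 1000)
print(mac)
```

The factor of 2 reflects that each sample contributes two alleles per autosomal marker, so the same MAF corresponds to a larger allele count in a larger cohort.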
Details
perMarkerQC wraps around the marker QC functions check_snp_missingness,
check_hwe and check_maf. For details on the parameters and outputs, see
the documentation of these functions.
Value
Named [list] with i) fail_list, a named [list] with 1.
SNP_missingness, containing SNP IDs [vector] failing the missingness
threshold lmissTh, 2. hwe, containing SNP IDs [vector] failing the HWE exact
test threshold hweTh and 3. maf, containing SNP IDs [vector] failing the MAF
threshold mafTh/MAC threshold macTh and ii) p_markerQC, a ggplot2 object
containing a sub-paneled plot with the QC plots of check_snp_missingness,
check_hwe and check_maf, which can be shown by print(p_markerQC).
List entries contain NULL if that specific check was not chosen.
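As a minimal sketch of how the returned list might be processed, the following uses a mock object that mimics the structure described above. The SNP IDs are invented, and the MAF check is assumed to have been skipped so that its entry is NULL.

```r
# Mock result mimicking the documented return structure (invented SNP IDs)
fail_markers <- list(
  fail_list = list(
    SNP_missingness = c("rs1", "rs2"),
    hwe = c("rs2", "rs3"),
    maf = NULL  # check assumed not to have been run
  ),
  p_markerQC = NULL  # would be a ggplot2 object in a real run
)

# unlist() drops NULL entries, so skipped checks need no special handling
fail_ids <- unique(unlist(fail_markers$fail_list, use.names = FALSE))
print(fail_ids)

# Number of failing markers per check (0 for a skipped check)
n_fail <- lengths(fail_markers$fail_list)
print(n_fail)
```

A marker failing more than one check (rs2 in this mock data) appears only once in fail_ids because of the call to unique().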
Examples
indir <- system.file("extdata", package="plinkQC")
qcdir <- tempdir()
name <- "data"
path2plink <- '/path/to/plink'
# the following code is not run on package build, as the path2plink on the
# user system is not known.
# All quality control checks
## Not run:
# run on all markers and individuals
fail_markers <- perMarkerQC(indir=indir, qcdir=qcdir, name=name,
                            interactive=FALSE, verbose=TRUE,
                            path2plink=path2plink)
# run on subset of individuals and markers
keep_individuals_file <- system.file("extdata", "keep_individuals",
                                     package="plinkQC")
extract_markers_file <- system.file("extdata", "extract_markers",
                                    package="plinkQC")
fail_markers <- perMarkerQC(qcdir=qcdir, indir=indir,
                            name=name, interactive=FALSE, verbose=TRUE,
                            path2plink=path2plink,
                            keep_individuals=keep_individuals_file,
                            extract_markers=extract_markers_file)
## End(Not run)
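The keep_individuals and extract_markers files used in the example above ship with the package. A minimal sketch of writing such files by hand, following the formats described under Arguments, could look as follows. All IDs and file names are made up, and the absence of a header line is an assumption.

```r
# Made-up IDs, for illustration only
individuals <- data.frame(FID = c("FAM1", "FAM2"),
                          IID = c("IND1", "IND2"))
markers <- c("rs1", "rs2", "rs3")

# Hypothetical file names in a temporary directory
my_keep_file <- file.path(tempdir(), "my_keep_individuals")
my_extract_file <- file.path(tempdir(), "my_extract_markers")

# Family IDs in the first column, within-family IDs in the second;
# tab-delimited, without header, row names or quotes
write.table(individuals, my_keep_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# One variant ID per line
writeLines(markers, my_extract_file)
```

The resulting paths could then be passed as keep_individuals=my_keep_file and extract_markers=my_extract_file.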