R: QQ plot(s) of expected vs. reported p-values

QQ_plot {QCGWAS}

R Documentation

QQ plot(s) of expected vs. reported p-values

Description

QQ_plot generates a simple QQ plot of the expected versus the reported p-value distribution. Optionally, the data can first be filtered with the high-quality filter (see HQ_filter). QQ_series generates a series of such QQ plots, one for each set of filter settings.
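As a rough illustration of what such a plot shows, the sketch below computes the points of a p-value QQ plot in base R. It is not the QCGWAS implementation: the vector `p_obs` is simulated data, and the use of `ppoints` for the expected quantiles is an assumption made for this example.

```r
# Illustrative only: computing the points of a p-value QQ plot in base R.
# 'p_obs' is simulated; it stands in for the reported p-values of a dataset.
set.seed(1)
p_obs <- runif(1000)

n        <- length(p_obs)
observed <- -log10(sort(p_obs))   # reported p-values, most significant first
expected <- -log10(ppoints(n))    # expected quantiles under a uniform distribution

plot(expected, observed,
     xlab = "Expected -log10(p)", ylab = "Observed -log10(p)")
abline(0, 1)                      # reference line where observed equals expected
```

Points that rise above the reference line indicate p-values that are more significant than expected under the null hypothesis.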

Usage

QQ_plot(dataset, save_name = "dataset", save_dir = getwd(),
        filter_FRQ = NULL, filter_cal = NULL,
        filter_HWE = NULL, filter_imp = NULL,
        filter_NA = TRUE,
        filter_NA_FRQ = filter_NA, filter_NA_cal = filter_NA,
        filter_NA_HWE = filter_NA, filter_NA_imp = filter_NA,
        p_cutoff = 0.05, plot_QQ_bands = FALSE,
        header_translations,
        check_impstatus = FALSE, ignore_impstatus = FALSE,
        T_strings = c("1", "TRUE", "yes", "YES", "y", "Y"),
        F_strings = c("0", "FALSE", "no", "NO", "n", "N"),
        NA_strings = c(NA, "NA", ".", "-"), ...)
QQ_series(dataset, save_name = "dataset", save_dir = getwd(),
          filter_FRQ = NULL, filter_cal = NULL,
          filter_HWE = NULL, filter_imp = NULL,
          filter_NA = TRUE,
          filter_NA_FRQ = filter_NA, filter_NA_cal = filter_NA,
          filter_NA_HWE = filter_NA, filter_NA_imp = filter_NA,
          p_cutoff = 0.05, plot_QQ_bands = FALSE,
          header_translations,
          check_impstatus = FALSE, ignore_impstatus = FALSE,
          T_strings = c("1", "TRUE", "yes", "YES", "y", "Y"),
          F_strings = c("0", "FALSE", "no", "NO", "n", "N"),
          NA_strings = c(NA, "NA", ".", "-"), ...)

Arguments

`dataset`	a data frame containing the p-value column and (depending on the settings) columns for chromosome number, position, the quality parameters, sample size and imputation status.
`save_name`	for `QQ_plot`, a character string; for `QQ_series`, a vector of character strings; specifying the filename(s) of the graph, without extension.
`save_dir`	character string; the directory where the output files are saved. Note that R uses a forward slash (/) in file paths where Windows uses a backslash (\).
`filter_FRQ`, `filter_cal`, `filter_HWE`, `filter_imp`	Filter threshold values for allele frequency, callrate, HWE p-value and imputation quality, respectively. Passed to `HQ_filter`. `QQ_plot` takes only single values, but `QQ_series` accepts vectors as well (see 'Details').
`filter_NA`	logical; if `TRUE`, SNPs with a missing value for a filter variable are excluded; if `FALSE`, missing values are ignored. `QQ_plot` takes only single values, but `QQ_series` accepts vectors as well (see 'Details'). `filter_NA` is the default setting for all variables; variable-specific settings can be specified with the following arguments.
`filter_NA_FRQ`, `filter_NA_cal`, `filter_NA_HWE`, `filter_NA_imp`	logical; variable-specific settings for `filter_NA`. These arguments are passed to `HQ_filter`.
`p_cutoff`	numeric; the threshold of p-values to be shown in the QQ plot(s). Higher (less significant) p-values are excluded from the plot. The default setting is `0.05`, which excludes roughly 95% of data points when the p-values are close to uniformly distributed. Increasing the value above `0.05` is not recommended, as this may dramatically increase running time and memory usage.
`plot_QQ_bands`	logical; should probability bands be added to the QQ plot?
`header_translations`	translation table for column names. See `translate_header` for more information. If the argument is left empty, `dataset` is assumed to use the standard column-names of `QC_GWAS`.
`check_impstatus`	logical; should the imputation-status column be passed to `convert_impstatus`?
`ignore_impstatus`	logical; if `FALSE`, HWE p-value and callrate filters are applied only to genotyped SNPs, and imputation quality filters only to imputed SNPs. If `TRUE`, the filters are applied to all SNPs regardless of the imputation status.
`T_strings`, `F_strings`, `NA_strings`	arguments passed to `convert_impstatus`.
`...`	arguments passed to `plot`.
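
The effect of `p_cutoff` on the number of plotted points can be checked with a small simulation. The sketch below uses simulated, uniformly distributed p-values; whether the package compares with `<=` or `<` is not stated here, so the comparison operator is an assumption.

```r
# Illustrative only: fraction of data points retained by a p-value cutoff.
set.seed(42)
p        <- runif(1e5)     # simulated p-values, uniform under the null
p_cutoff <- 0.05

kept      <- p[p <= p_cutoff]           # assumed comparison; not taken from QCGWAS
frac_kept <- length(kept) / length(p)   # close to p_cutoff for uniform p-values
frac_kept
```

With real GWAS results the retained fraction can differ from `p_cutoff`, since true associations and inflation skew the distribution towards low p-values.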

Details

QQ_series accepts multiple filter-values, and passes these one by one to QQ_plot to generate a series of plots. For example, specifying:

filter_FRQ = c(0.05, 0.10), filter_cal = c(0.90, 0.95)

will generate two plots. The first excludes SNPs with allele frequency < 0.05 or callrate < 0.90; the second excludes SNPs with allele frequency < 0.10 or callrate < 0.95. The same principle applies to the filter_NA settings. If the vectors submitted to the filter arguments are of unequal length, the shorter vector is recycled until it equals the length of the longer one (if possible).

To filter missing values only, set the filter to NA and the corresponding NA-filter argument to TRUE. Setting the filter argument to NULL disables the filter entirely, regardless of the NA-filter setting.
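
The recycling and the NA/NULL rules described above can be sketched in base R. This is a simplified illustration, not the package's code: `keep_snp` is a hypothetical helper that applies a plain "value below threshold" rule, and `rep_len` is used for recycling, which may be more permissive than the package's own length check.

```r
# Illustrative only: pairing filter settings per plot, as QQ_series is described to do.
filter_FRQ <- c(0.05, 0.10)
filter_cal <- c(0.90, 0.95)
filter_NA  <- TRUE                       # length 1, recycled for every plot

n_plots  <- max(length(filter_FRQ), length(filter_cal), length(filter_NA))
settings <- data.frame(
  plot       = seq_len(n_plots),
  filter_FRQ = rep_len(filter_FRQ, n_plots),
  filter_cal = rep_len(filter_cal, n_plots),
  filter_NA  = rep_len(filter_NA,  n_plots)
)
print(settings)                          # one row per plot

# Hypothetical helper showing the threshold / NA / NULL semantics for one variable.
keep_snp <- function(x, threshold, filter_NA) {
  keep <- rep(TRUE, length(x))
  if (is.null(threshold)) return(keep)   # NULL: filter disabled entirely
  if (filter_NA) keep[is.na(x)] <- FALSE # exclude missing values if requested
  if (!is.na(threshold)) {               # NA threshold: filter missing values only
    keep[!is.na(x) & x < threshold] <- FALSE
  }
  keep
}

frq <- c(0.02, 0.20, NA)                 # made-up allele frequencies
keep_snp(frq, 0.05, filter_NA = TRUE)    # threshold and missing values both applied
keep_snp(frq, NA,   filter_NA = TRUE)    # only missing values removed
keep_snp(frq, NULL, filter_NA = TRUE)    # nothing removed
```

Each row of `settings` corresponds to one plot in the series.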

Value

Both functions return an invisible NULL. The graphs are saved as files in save_dir.
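
For readers unfamiliar with invisible return values, the following base-R sketch shows the behaviour. The function `save_plot_stub` is a hypothetical stand-in and does not create any file.

```r
# Illustrative only: a function that returns NULL invisibly.
save_plot_stub <- function() {
  # (a real function would write its graph to a file here)
  invisible(NULL)
}

save_plot_stub()           # nothing is printed at the console
res <- save_plot_stub()    # the value can still be captured
is.null(res)               # TRUE
```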

Examples

  ## Not run: 
    data("gwa_sample")
  
    QQ_plot(dataset = gwa_sample,
            save_name = "sample_QQ",
            filter_FRQ = 0.01, filter_cal = 0.95,
            filter_NA = FALSE)
  
    QQ_series(dataset = gwa_sample,
              save_name = "sample_QQ",
              filter_FRQ = c(NA, 0.01, 0.01),
              filter_cal = c(NA, 0.95, 0.95),
              filter_NA = c(FALSE, FALSE, TRUE))
  
## End(Not run)

[Package QCGWAS version 1.0-9 Index]