R: Checking GWAS p-values

check_P {QCGWAS}

R Documentation

Checking GWAS p-values

Description

A simple test to check if the reported p-values in a GWAS results file match the other statistics. This function calculates an expected p-value (from the effect size and standard error) and then correlates it with the actual, reported p-value.

Usage

check_P(dataset, HQ_subset,
        plot_correlation = FALSE, plot_if_threshold = FALSE,
        threshold_r = 0.99,
        save_name = "dataset", save_dir = getwd(),
        header_translations,
        use_log = FALSE, dataN = nrow(dataset), ...)

Arguments

`dataset`	table with at least three columns: p-value, effect size and standard error.
`HQ_subset`	an optional logical or numeric vector indicating the rows in `dataset` that contain high-quality SNPs.
`plot_correlation`	logical; should a scatterplot of the reported vs. calculated p-values be made? If `TRUE`, the plot is saved as a .png file.
`plot_if_threshold`	logical; if `TRUE`, the scatterplot is only saved when the correlation between reported and calculated p-values is lower than `threshold_r`.
`threshold_r`	numeric; the correlation threshold for the scatterplot.
`save_name`	character string; the filename, without extension, for the scatterplot.
`save_dir`	character string; the directory where the output files are saved. Note that R uses forward slash (/) where Windows uses backslash (\).
`header_translations`	translation table for column names. See `translate_header` for more information. If the argument is left empty, `dataset` is assumed to use the standard column names used by `QC_GWAS`.
`use_log`, `dataN`	arguments used by `QC_GWAS`; redundant when `check_P` is used separately.
`...`	arguments passed to `plot`.
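
As a small illustration of the path note for `save_dir`, the following base-R sketch builds a directory path with forward slashes and converts a Windows-style path. The folder name `qc_output` and the path `C:\Users\me\qc_output` are hypothetical and serve only as examples.

```r
# Build a path under the working directory; file.path() joins with "/"
# ("qc_output" is a hypothetical folder name)
out_dir <- file.path(getwd(), "qc_output")

# A Windows-style path typed into R needs each backslash escaped as "\\"
win_style <- "C:\\Users\\me\\qc_output"

# Replace every literal backslash with a forward slash
r_style <- gsub("\\", "/", win_style, fixed = TRUE)

print(out_dir)
print(r_style)
```

Either `out_dir` or `r_style` could then be supplied as `save_dir`, provided the directory exists.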

Details

check_P calculates the expected p-value by dividing the effect size by the standard error, squaring the result, and taking the upper-tail probability of that statistic under a chi-square distribution with 1 degree of freedom.

In a typical GWAS dataset, the expected and observed p-values should correlate perfectly. If they do not, the cause is either a misidentified column or incorrect values used when generating the dataset.
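
The calculation described above can be sketched in base R without the package. This is a minimal illustration on simulated data, not the internal code of `check_P`; the variable names (`effect`, `se`, `p_reported`) and the simulated values are assumptions made for the example.

```r
# Simulated effect sizes and standard errors (illustrative values only)
set.seed(1)
n      <- 1000
effect <- rnorm(n, mean = 0, sd = 0.05)
se     <- runif(n, min = 0.01, max = 0.05)

# Expected p-value: (effect / SE)^2 against a chi-square with 1 df
chisq_stat <- (effect / se)^2
p_expected <- pchisq(chisq_stat, df = 1, lower.tail = FALSE)

# Stand-in for the reported p-values: two-sided normal test on effect / SE,
# which is mathematically equivalent to the chi-square test above
p_reported <- 2 * pnorm(-abs(effect / se))

# Consistent statistics give a correlation of essentially 1
r_ok <- cor(p_expected, p_reported)

# Mimic a misidentified column by shuffling the reported p-values
p_shuffled <- sample(p_reported)
r_bad <- cor(p_expected, p_shuffled)

print(c(consistent = r_ok, shuffled = r_bad))
```

In this sketch the shuffled column plays the role of a p-value column that does not belong to the effect sizes and standard errors next to it, so its correlation falls well below a threshold such as the default `threshold_r = 0.99`.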

Value

The correlation between expected and reported p-values.

Examples

  data("gwa_sample")

  selected_SNPs <- HQ_filter(data = gwa_sample,
                             FRQ_val = 0.05,
                             cal_val = 0.95,
                             filter_NA = FALSE)
  # To calculate a correlation between predicted and actual p-values:
  check_P(gwa_sample, HQ_subset = selected_SNPs,
          plot_correlation = FALSE)
  
  # To plot the correlation:
  ## Not run: 
    check_P(gwa_sample, HQ_subset = selected_SNPs,
            plot_correlation = TRUE, plot_if_threshold = FALSE,
            save_name = "sample")
  
## End(Not run)

[Package QCGWAS version 1.0-9 Index]