QC_histogram {QCGWAS} | R Documentation |
Histogram(s) of expected and observed data distribution
Description
QC_histogram creates two histograms: one showing the
observed data distribution of a numeric variable, and one
showing the expected distribution.
It includes the option to filter the data with the
high-quality filter. histogram_series generates a
series of such histograms for multiple filter settings.
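The observed-versus-expected idea can be sketched in base R without the package. This is only an illustration: the page does not say how the expected distribution is derived, so the sketch ASSUMES a normal distribution with the observed mean and standard deviation. The toy data and all variable names are hypothetical.

```r
# Toy data standing in for a numeric column such as an effect size.
set.seed(1)
observed <- rnorm(500, mean = 0.02, sd = 0.1)

# ASSUMPTION (not confirmed by this page): the expected distribution is
# a normal with the same mean and SD as the observed data.
expected <- rnorm(length(observed), mean = mean(observed), sd = sd(observed))

# Shared break points, so the two panels are directly comparable.
brks <- pretty(range(c(observed, expected)), n = nclass.Sturges(observed))

pdf(NULL)                        # null device: nothing is written to disk
op <- par(mfrow = c(1, 2))       # two panels side by side
h_obs <- hist(observed, breaks = brks, main = "Observed")
h_exp <- hist(expected, breaks = brks, main = "Expected")
par(op)
invisible(dev.off())
```

Using one set of breaks for both panels keeps the bin edges identical, which makes differences in shape easier to spot.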
Usage
QC_histogram(dataset, data_col = 1,
             save_name = "dataset", save_dir = getwd(),
             export_outliers = FALSE,
             filter_FRQ = NULL, filter_cal = NULL,
             filter_HWE = NULL, filter_imp = NULL,
             filter_NA = TRUE,
             filter_NA_FRQ = filter_NA, filter_NA_cal = filter_NA,
             filter_NA_HWE = filter_NA, filter_NA_imp = filter_NA,
             breaks = "Sturges",
             graph_name = colnames(dataset)[data_col],
             header_translations, check_impstatus = FALSE,
             ignore_impstatus = FALSE,
             T_strings = c("1", "TRUE", "yes", "YES", "y", "Y"),
             F_strings = c("0", "FALSE", "no", "NO", "n", "N"),
             NA_strings = c(NA, "NA", ".", "-"), ...)

histogram_series(dataset, data_col = 1,
                 save_name = paste0("dataset_F", 1:nrow(plot_table)),
                 save_dir = getwd(), export_outliers = FALSE,
                 filter_FRQ = NULL, filter_cal = NULL,
                 filter_HWE = NULL, filter_imp = NULL,
                 filter_NA = TRUE,
                 filter_NA_FRQ = filter_NA, filter_NA_cal = filter_NA,
                 filter_NA_HWE = filter_NA, filter_NA_imp = filter_NA,
                 breaks = "Sturges",
                 header_translations, ignore_impstatus = FALSE,
                 check_impstatus = FALSE,
                 T_strings = c("1", "TRUE", "yes", "YES", "y", "Y"),
                 F_strings = c("0", "FALSE", "no", "NO", "n", "N"),
                 NA_strings = c(NA, "NA", ".", "-"),
                 ...)
Arguments
dataset |
vector or table containing the variable of interest. |
data_col |
name or number of the column of |
save_name |
for |
save_dir |
character string; the directory where the output files are saved. Note that R uses the forward slash (/) as the path separator where Windows uses the backslash (\). |
export_outliers |
logical or numeric value; should outlying entries (which are excluded from the plot) be exported to an output file? If numeric, the value specifies the maximum number of entries exported. |
filter_FRQ , filter_cal , filter_HWE , filter_imp |
Filter threshold values for allele frequency, callrate,
HWE p-value and imputation quality, respectively. Passed to
|
filter_NA |
logical; if |
filter_NA_FRQ , filter_NA_cal , filter_NA_HWE , filter_NA_imp |
logical; variable-specific settings for |
breaks |
argument passed to |
graph_name |
character string; used in the title of the plot. |
header_translations |
translation table for column names.
See |
check_impstatus |
logical; should
|
ignore_impstatus |
logical; if |
T_strings , F_strings , NA_strings |
arguments passed to
|
... |
in |
Details
histogram_series accepts multiple filter values and passes
them one by one to QC_histogram to generate a
series of histograms. For example, specifying:
filter_FRQ = c(0.05, 0.10), filter_cal = c(0.90, 0.95)
will generate two histograms. The first excludes SNPs with
allele frequency < 0.05 or callrate < 0.90; the second excludes
SNPs with allele frequency < 0.10 or callrate < 0.95. The same
principle applies to the filter_NA settings. If the vectors
submitted to the filter arguments are of unequal length, the
shorter vector will be recycled until it equals the length of
the longer (if possible). To filter missing values only, set
the filter to NA and the corresponding NA-filter argument to
TRUE. Setting the filter argument to NULL will disable the
filter entirely, regardless of the NA-filter setting.
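The pairing, recycling, and NA/NULL rules described above can be illustrated with a minimal base-R sketch. It is not the package's internal code: the helper passes_filter, the toy table, and its column names are hypothetical, and the thresholds are applied as simple "below threshold" cut-offs exactly as worded in this section.

```r
# Hypothetical helper (not part of QCGWAS): TRUE for entries that pass.
#   threshold = NULL    -> filter disabled, everything passes
#   threshold = NA      -> only missing values are removed (if filter_NA)
#   threshold = numeric -> values below the threshold are removed,
#                          plus missing values if filter_NA is TRUE
passes_filter <- function(values, threshold, filter_NA = TRUE) {
  keep <- rep(TRUE, length(values))
  if (is.null(threshold)) return(keep)
  if (filter_NA) keep[is.na(values)] <- FALSE
  if (!is.na(threshold)) keep[!is.na(values) & values < threshold] <- FALSE
  keep
}

# Toy data; the column names are illustrative only.
toy <- data.frame(
  FRQ      = c(0.03, 0.08, NA,   0.50),
  CALLRATE = c(0.99, 0.92, 0.97, 0.99)
)

# One row per histogram in the series. data.frame() recycles the
# length-1 filter_NA to match the length-2 threshold vectors.
settings <- data.frame(FRQ = c(0.05, 0.10),
                       cal = c(0.90, 0.95),
                       filter_NA = TRUE)

# Count the entries that would remain under each setting.
kept <- integer(nrow(settings))
for (i in seq_len(nrow(settings))) {
  ok <- passes_filter(toy$FRQ, settings$FRQ[i], settings$filter_NA[i]) &
        passes_filter(toy$CALLRATE, settings$cal[i], settings$filter_NA[i])
  kept[i] <- sum(ok)
}

# Missing values only: threshold NA with the NA filter switched on.
na_only <- passes_filter(toy$FRQ, NA, filter_NA = TRUE)

# Disabled filter: NULL overrides the NA-filter setting.
disabled <- passes_filter(toy$FRQ, NULL, filter_NA = TRUE)
```

Building the settings table with data.frame() mirrors the "if possible" caveat: it raises an error when the shorter vector cannot be recycled a whole number of times.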
Value
Both functions return an invisible value NULL.
See Also
For creating QQ plots: QQ_plot.
Examples
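## Not run:
  data("gwa_sample")

  # Single histogram: filter on allele frequency and callrate,
  # but do not remove entries with missing filter values.
  QC_histogram(dataset = gwa_sample, data_col = "EFFECT",
               save_name = "sample_histogram",
               filter_FRQ = 0.01, filter_cal = 0.95,
               filter_NA = FALSE,
               graph_name = "Effect size histogram")

  # Series of three histograms, one per filter setting:
  # NA thresholds, thresholds only, thresholds plus NA removal.
  histogram_series(dataset = gwa_sample, data_col = "EFFECT",
                   save_name = "sample_histogram",
                   filter_FRQ = c(NA, 0.01, 0.01),
                   filter_cal = c(NA, 0.95, 0.95),
                   filter_NA = c(FALSE, FALSE, TRUE))
## End(Not run)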