filter_GWAS {QCGWAS} | R Documentation |
Automated filtering and reformatting of GWAS results files
Description
This function was created as a convenient way to automate the
removal of low-quality and non-autosomal SNPs. It
includes the same formatting options as QC_GWAS.
Usage
filter_GWAS(ini_file,
GWAS_files, output_names,
gzip_output = TRUE,
dir_GWAS = getwd(), dir_output = dir_GWAS,
FRQ_HQ = NULL, HWE_HQ = NULL,
cal_HQ = NULL, imp_HQ = NULL,
FRQ_NA = TRUE, HWE_NA = TRUE,
cal_NA = TRUE, imp_NA = TRUE,
ignore_impstatus = FALSE,
remove_X = FALSE, remove_Y = FALSE,
remove_XY = FALSE, remove_M = FALSE,
header_translations,
check_impstatus = FALSE,
imputed_T = c("1", "TRUE", "yes", "YES", "y", "Y"),
imputed_F = c("0", "FALSE", "no", "NO", "n", "N"),
imputed_NA = NULL,
column_separators = c("\t", " ", "", ",", ";"),
header = TRUE, nrows = -1, nrows_test = 1000,
comment.char = "", na.strings = c("NA", "."),
out_header = "original", out_quote = FALSE,
out_sep = "\t", out_eol = "\n", out_na = "NA",
out_dec = ".", out_qmethod = "escape",
out_rownames = FALSE, out_colnames = TRUE, ...)
Arguments
ini_file |
(the filename of) a table listing the files to be processed and the filters to be applied. See 'Details'. |
GWAS_files |
character vector: when no |
output_names |
character vector: the filenames for the
output files. The default option is to use the input
filenames. Note that, unlike with other |
gzip_output |
logical; should the output files be compressed? |
dir_GWAS, dir_output |
character strings specifying the paths of the folders for the input files and the output, respectively. Note that R uses forward slash (/) where Windows uses backslash (\). |
FRQ_HQ, HWE_HQ, cal_HQ, imp_HQ |
Numeric vectors. When no |
FRQ_NA, HWE_NA, cal_NA, imp_NA |
Logical vectors. When no |
ignore_impstatus |
Logical vector. When no |
remove_X, remove_Y, remove_XY, remove_M |
logical; respectively whether X-chromosome, Y-chromosome,
pseudo-autosomal and mitochondrial SNPs are removed. Note:
these arguments accept only a single |
header_translations |
translation table for column names.
See |
check_impstatus |
logical; should
|
imputed_T, imputed_F, imputed_NA |
arguments passed to
|
column_separators |
character string or vector; specifies
the values used as column delimiter in the GWAS file(s). The
argument is passed to |
nrows_test |
integer; the number of rows used for
"trial-loading". Before loading the entire dataset, the
function |
header, nrows, comment.char, na.strings, ... |
arguments passed to |
out_header |
Translation table for the column names of
the output file. This argument is the opposite of
|
out_quote, out_sep, out_eol, out_na, out_dec, out_qmethod, out_rownames, out_colnames |
arguments passed to
|
Details
The easiest way to use filter_GWAS is by passing an ini
file to the ini_file argument.
The ini file can be generated by running QC_series
with the save_filtersettings argument set to TRUE.
The QC_series output will include a file 'Check_filtersettings.txt',
describing the (high-quality) filter settings used for each
file (taking into account whether there was enough data, i.e.
whether the use_threshold was met, to apply the filters).
The ini_file argument accepts either a table
or the name of a file in dir_GWAS or the
current R working directory.
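As a rough sketch of the ini-file route, the call below assumes that an earlier QC run left a file named Check_filtersettings.txt in the input folder. The folder paths are placeholders, not defaults of the package, and the call is skipped unless both the QCGWAS package and the ini file are available.

```r
# Hypothetical folders; replace with your own paths.
dir_in  <- "C:/GWAS/QC_output"
dir_out <- "C:/GWAS/filtered"
ini     <- "Check_filtersettings.txt"

# Only attempt the call when the package and the ini file can be found.
can_run <- requireNamespace("QCGWAS", quietly = TRUE) &&
  file.exists(file.path(dir_in, ini))

if (can_run) {
  # The ini file lists the files to be processed and the filters to
  # be applied; only directories and output options are set here.
  filtered <- QCGWAS::filter_GWAS(ini_file = ini,
                                  dir_GWAS = dir_in,
                                  dir_output = dir_out,
                                  gzip_output = TRUE)
}
```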
If no ini_file is specified, the function will use the
GWAS_files, x_HQ, x_NA and ignore_impstatus
arguments to construct such a table.
GWAS_files can be either a character vector of filenames or a single
string. If it is a single string, all filenames containing that string
will be processed. The other arguments can also be a vector or
a single value; a single value is recycled to create
a vector of the correct length.
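The recycling described above can be illustrated in base R. This is a sketch, not the package's internal code: the filenames, thresholds and the column name "file" are hypothetical, and the final call is skipped unless QCGWAS is installed and the files exist.

```r
# Hypothetical input files and thresholds, for illustration only.
gwas_files <- c("cohort1.txt", "cohort2.txt", "cohort3.txt")
frq_hq     <- 0.01              # single value: recycled for every file
imp_hq     <- c(0.3, 0.3, 0.5)  # one value per file: used as given

# Base-R illustration of the recycling (column names are assumptions).
n_files  <- length(gwas_files)
settings <- data.frame(file   = gwas_files,
                       FRQ_HQ = rep_len(frq_hq, n_files),
                       imp_HQ = rep_len(imp_hq, n_files),
                       stringsAsFactors = FALSE)
print(settings)

# The corresponding call, skipped unless QCGWAS and the files are available.
if (requireNamespace("QCGWAS", quietly = TRUE) &&
    all(file.exists(gwas_files))) {
  QCGWAS::filter_GWAS(GWAS_files = gwas_files,
                      FRQ_HQ = frq_hq, imp_HQ = imp_hq)
}
```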
If neither ini_file nor GWAS_files is specified,
the function will look for a file Check_filtersettings.txt
in dir_GWAS and the current R working directory.
Note that ini_file overrules the other filter settings,
i.e. one cannot adjust ini_file through the other
arguments.
Value
An invisible logical vector, indicating which files were successfully filtered.
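Because the value is returned invisibly, it is not printed unless it is assigned. The sketch below uses a made-up logical vector in place of a real result and assumes that the vector follows the order of the input files; the filenames are hypothetical.

```r
# Hypothetical file names and a made-up result vector standing in for
# the value returned by filter_GWAS().
gwas_files <- c("cohort1.txt", "cohort2.txt", "cohort3.txt")
filtered   <- c(TRUE, FALSE, TRUE)

# List the files that were not successfully filtered.
failed <- gwas_files[!filtered]
if (length(failed) > 0) {
  message("Not filtered: ", paste(failed, collapse = ", "))
}
```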
Note
R is not the optimal platform for filtering GWAS files. This function was added at the request of a user, but a Unix script is likely to be faster.