load_GWAS {QCGWAS} | R Documentation |
Easy loading of GWAS results files
Description
load_GWAS is a wrapper function for read.table that makes loading
large GWAS results files less of a hassle. It automatically unpacks
.zip and .gz files and uses load_test to determine which column
separator the file uses.
Usage
load_GWAS(filename, dir = getwd(),
column_separators = c("\t", " ", "", ",", ";"),
test_nrows = 1000,
header = TRUE, nrows = -1,
comment.char = "", na.strings = c("NA", "."),
stringsAsFactors = FALSE, ...)
load_test(filename, dir = getwd(),
column_separators = c("\t", " ", "", ",", ";"),
test_nrows = 1000, ...)
Arguments
filename |
character string; the complete filename of the
file to be loaded. Note that compressed files (.gz or .zip
files) can only be unpacked if the filename of the archive
contains the extension of the archived file. For example, if
the archived file is named |
dir |
character string; the directory containing the file. Note that R uses forward slash (/) where Windows uses backslash (\). |
column_separators |
character string or vector of
the column-separators to be tried by |
test_nrows |
integer; the number of lines that
|
header , nrows , comment.char , na.strings , stringsAsFactors , ... |
Arguments passed to |
Details
load_test determines the correct column separator simply by
trying each candidate in turn until it finds one that works,
that is, one that results in a dataset with an equal number of
cells in every row and at least five columns. If none works, it
reports the error message generated by the last column
separator tried.
The column separators are tried in the order specified by the
column_separators argument.
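The trial-and-error procedure described above can be sketched in base R. This is only an illustration under stated assumptions, not the QCGWAS source: the helper name guess_separator, its min_cols argument and its internals are hypothetical, and it relies on read.table normally stopping with an error when rows have unequal numbers of fields.

```r
# Hypothetical helper (not part of QCGWAS): try each separator in order and
# return the first one that yields a table with at least `min_cols` columns.
guess_separator <- function(path,
                            column_separators = c("\t", " ", "", ",", ";"),
                            test_nrows = 1000,
                            min_cols = 5) {
  last_error <- NA_character_
  for (sep in column_separators) {
    # read.table() normally stops with an error on ragged rows;
    # tryCatch() converts that error into a recorded message.
    result <- tryCatch(
      read.table(path, sep = sep, header = TRUE, nrows = test_nrows,
                 comment.char = "", stringsAsFactors = FALSE),
      error = function(e) conditionMessage(e)
    )
    if (is.data.frame(result) && ncol(result) >= min_cols) {
      return(list(success = TRUE, error = NA_character_, sep = sep))
    }
    last_error <- if (is.data.frame(result)) {
      sprintf("only %d column(s) found", ncol(result))
    } else {
      result
    }
  }
  list(success = FALSE, error = last_error, sep = NA_character_)
}

# Demonstration on a small comma-separated temporary file.
demo_file <- tempfile(fileext = ".csv")
writeLines(c("SNP,CHR,POS,EA,BETA,SE",
             "rs1,1,1000,A,0.12,0.03",
             "rs2,1,2000,G,-0.05,0.02"), demo_file)
demo_result <- guess_separator(demo_file)
```

In this demonstration the tab, space and white-space separators each produce a single column, so the comma is the first separator that passes the five-column check.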
By default, load_test only checks the first 1000 lines
(adjustable through the test_nrows argument); if the problem
lies further down in the dataset, it will not catch it. In such
a case, load_GWAS and QC_GWAS will crash when attempting to
load the dataset.
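Because only the first test_nrows lines are checked, it can help to scan the whole file for rows with a deviating number of fields before loading it. The sketch below uses base R's count.fields; the helper name find_ragged_lines is hypothetical and not part of QCGWAS, and it assumes an uncompressed text file whose first line has the correct number of fields.

```r
# Hypothetical helper (not part of QCGWAS): report the lines whose field
# count differs from that of the first line.
find_ragged_lines <- function(path, sep = "\t") {
  # count.fields() returns the number of fields on each line.
  # Blank lines are skipped by default, so the positions refer to
  # non-blank lines only.
  n_fields <- count.fields(path, sep = sep, quote = "", comment.char = "")
  expected <- n_fields[1]
  which(is.na(n_fields) | n_fields != expected)
}

demo_file <- tempfile(fileext = ".txt")
writeLines(c("SNP\tCHR\tPOS",
             "rs1\t1\t1000",
             "rs2\t1",          # one field short
             "rs3\t2\t3000"), demo_file)
bad_lines <- find_ragged_lines(demo_file)
```

An empty result suggests that every line has the same number of fields for the chosen separator; any positions returned point to the lines worth inspecting.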
A common problem is employing white-space ("") as the column
separator for a file that uses empty fields to indicate
missing values. The separators surrounding an empty field are
adjacent, so R parses them as a single column separator. In
this particular case, specifying a single space (" ") or tab
("\t") as the column separator solves the problem (this is why
the default setting of column_separators puts these values
before white-space).
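The effect of an empty field can be checked with base R alone. The following sketch writes a small tab-separated file (the file contents and variable names are made up for illustration) and counts the fields per line under both separator settings.

```r
demo_file <- tempfile(fileext = ".txt")
# Tab-separated file; the last row has an empty (missing) CHR field.
writeLines(c("SNP\tCHR\tPOS",
             "rs1\t1\t1000",
             "rs2\t\t2000"), demo_file)

# With white-space as separator, the two adjacent tabs collapse into one,
# so the last line appears to have only two fields.
fields_whitespace <- count.fields(demo_file, sep = "")

# With a single tab as separator, the empty field is preserved.
fields_tab <- count.fields(demo_file, sep = "\t")

# Reading with the tab separator keeps three columns; the empty numeric
# field is imported as NA.
demo_table <- read.table(demo_file, sep = "\t", header = TRUE)
```

Comparing fields_whitespace with fields_tab shows the mismatch that makes the white-space setting fail on such a file.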
Value
load_GWAS returns the table imported from the specified file.
load_test returns a list with 4 components:
success |
logical; whether |
error |
character string; if the file could not be loaded, the error message generated by the last column separator tried. |
file_type |
character string; the last three characters
of |
sep |
the first column-separator that succeeded in loading a dataset with five or more columns. |
Note
load_GWAS uses the same default loading settings as QC_GWAS.
load_test, on the other hand, has no default values for header,
comment.char, na.strings and stringsAsFactors, and uses the
read.table defaults instead.
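To illustrate why the difference in defaults matters, the sketch below reads the same small file twice with plain read.table: once with the read.table defaults and once with the settings listed for load_GWAS in the Usage section. The file contents and variable names are invented for the demonstration, and QCGWAS itself is not called.

```r
demo_file <- tempfile(fileext = ".txt")
writeLines(c("SNP\tBETA",
             "rs1\t0.12",
             "rs2\t."), demo_file)

# read.table defaults: no header is assumed here (the first line has as
# many fields as the others) and "." is kept as an ordinary value.
with_defaults <- read.table(demo_file, sep = "\t")

# Settings matching the load_GWAS defaults shown under Usage.
with_gwas_settings <- read.table(demo_file, sep = "\t",
                                 header = TRUE, comment.char = "",
                                 na.strings = c("NA", "."),
                                 stringsAsFactors = FALSE)
```

With the defaults the header line ends up as a data row, whereas the second call treats it as column names and imports "." as a missing value in a numeric column.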
Examples
## As the function requires a GWAS file to work,
## the following code should be adjusted before execution.
## Because this is a demonstration, the nrows argument is used
## to read only the first 100 rows.
## Not run:
data_GWAS <- load_GWAS("GWA_results1.txt.zip",
                       dir = "C:/GWAS_results",
                       nrows = 100)
## End(Not run)