create_hapmap_reference {QCGWAS} | R Documentation |
Create an allele-reference file from HapMap data
Description
This function creates the standard allele reference file, as
used by QC_GWAS and match_alleles, from data publicly available
at the website of the International HapMap Project (see 'References').
Usage
create_hapmap_reference(dir = getwd(),
download_hapmap = FALSE, download_subset,
hapmap_files = list.files(path = dir, pattern = "freqs_chr"),
filename = "allele_reference_HapMap",
save_txt = TRUE, save_rdata = !save_txt,
return_reference = FALSE)
Arguments
dir |
character string; the directory of the input and output files. Note that R uses forward slash (/) where Windows uses the backslash (\). |
download_hapmap |
logical; if |
download_subset |
character string; the HapMap population to download when creating the reference. Options are: ASW, CEU, CHB, CHD, GIH, JPT, LWK, MEX, MKK, TSI, YRI. |
hapmap_files |
character vector of the filenames of
HapMap frequency-files to be included in the reference. The
default option includes all files with the string
"freqs_chr" in the filename. (This argument is only
used when |
filename |
character string; the name of the output file, without file-extension. |
save_txt , save_rdata |
logical; should the reference be
saved as a tab-delimited text file and/or an RData file?
If saved as RData, the object name |
return_reference |
logical; should the function return the reference as its output value? |
Details
The function removes SNPs with invalid alleles and with allele
frequencies that do not add up to 1. It also removes
all instances of duplicate SNPids. If such entries are
encountered, a warning is printed in the R console and the
entries are saved in a .txt file in the output directory.
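The filtering rules described above can be illustrated on a toy table. This is only a sketch, not the package's internal code: the column names, the assumption that A, C, G and T are the only valid alleles, and the rounding tolerance are all illustrative choices.

```r
# Toy frequency table; the column names are hypothetical, not the
# actual HapMap or QCGWAS layout.
freqs <- data.frame(
  SNP     = c("rs1", "rs2", "rs3", "rs4", "rs4"),
  allele1 = c("A", "C", "N", "G", "G"),
  allele2 = c("G", "T", "T", "A", "A"),
  freq1   = c(0.30, 0.60, 0.50, 0.25, 0.25),
  freq2   = c(0.70, 0.30, 0.50, 0.75, 0.75),
  stringsAsFactors = FALSE
)

# Assumption: only A, C, G and T count as valid alleles.
valid_alleles <- c("A", "C", "G", "T")

# Flag rows with an allele outside the valid set.
bad_allele <- !(freqs$allele1 %in% valid_alleles &
                freqs$allele2 %in% valid_alleles)

# Flag rows whose frequencies do not sum to 1. The tolerance is an
# assumption, added to absorb floating-point rounding.
bad_freq <- abs(freqs$freq1 + freqs$freq2 - 1) > 1e-6

# Flag every instance of a duplicated SNP id, not just the repeats.
dup_id <- freqs$SNP %in% freqs$SNP[duplicated(freqs$SNP)]

drop_row  <- bad_allele | bad_freq | dup_id
removed   <- freqs[drop_row, ]    # entries that would be reported
reference <- freqs[!drop_row, ]   # entries kept in the reference
```

In this toy table only rs1 survives: rs2 fails the frequency check, rs3 has an invalid allele, and both rs4 rows are dropped as duplicates.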
Like QC_GWAS, create_hapmap_reference codes the X chromosome
as 23, Y as 24, XY (not available on the HapMap website) as 25
and M as 26.
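To bring chromosome labels from another dataset into the same numeric coding, a small helper along these lines may be useful. recode_chromosome is a hypothetical function written for this illustration, not part of QCGWAS, and the "chr" prefix handling is an assumption about the input labels.

```r
# Numeric codes for the non-autosomal chromosomes, as described above.
chr_codes <- c(X = 23L, Y = 24L, XY = 25L, M = 26L)

# Hypothetical helper (not part of QCGWAS): convert labels such as
# "chr1", "X" or "chrM" to integer chromosome codes.
recode_chromosome <- function(chr) {
  # Assumption: labels may carry a "chr" prefix, which is stripped.
  chr <- sub("^chr", "", as.character(chr))
  is_special <- chr %in% names(chr_codes)
  out <- integer(length(chr))
  out[is_special]  <- unname(chr_codes[chr[is_special]])
  out[!is_special] <- as.integer(chr[!is_special])
  out
}

recode_chromosome(c("chr1", "chr22", "chrX", "chrY", "chrM"))
# returns 1 22 23 24 26
```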
Both the .RData export and the function return store the alleles as factors rather than character strings.
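If character alleles are needed downstream, the factor columns can be converted after loading. The sketch below uses a toy data frame with hypothetical column names in place of the real reference.

```r
# Toy stand-in for a reference whose allele columns are factors;
# the column names are hypothetical.
ref <- data.frame(
  SNP     = c("rs1", "rs2"),
  allele1 = factor(c("A", "C")),
  allele2 = factor(c("G", "T")),
  stringsAsFactors = FALSE
)

# Convert every factor column to character, leaving other columns alone.
is_fac <- vapply(ref, is.factor, logical(1))
ref[is_fac] <- lapply(ref[is_fac], as.character)

str(ref)
```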
Value
If return_reference is TRUE, the function
returns the generated reference table. If FALSE, it
returns an invisible NULL.
References
The required data is available at the website of the International HapMap Project, under bulk data downloads > bulk data > frequencies:
http://hapmap.ncbi.nlm.nih.gov
The HapMap files downloaded by this function are subject to the HapMap terms and policies. See: http://hapmap.ncbi.nlm.nih.gov/datareleasepolicy.html
See Also
Examples
# This command will download the CEU HapMap dataset and use
# it to generate an allele-reference. Create a folder
# "new_hapmap" to store the data and make sure there is
# sufficient disk space and a reasonably fast internet
# connection.
## Not run:
new_hapmap <- create_hapmap_reference(dir = "C:/new_hapmap",
download_hapmap = TRUE, download_subset = "CEU",
filename = "new_hapmap", save_txt = TRUE,
return_reference = TRUE)
## End(Not run)