| create_hapmap_reference {QCGWAS} | R Documentation |
Create an allele-reference file from HapMap data
Description
This function creates the standard allele reference file, as
used by QC_GWAS and match_alleles,
from data publicly available at the website of the
International HapMap Project (see 'References').
Usage
create_hapmap_reference(dir = getwd(),
download_hapmap = FALSE, download_subset,
hapmap_files = list.files(path = dir, pattern = "freqs_chr"),
filename = "allele_reference_HapMap",
save_txt = TRUE, save_rdata = !save_txt,
return_reference = FALSE)
Arguments
dir |
character string; the directory of the input and output files. Note that R uses the forward slash (/) as path separator where Windows uses the backslash (\). |
download_hapmap |
logical; if |
download_subset |
character string; indicates the population to download for creating the reference. Options are: ASW, CEU, CHB, CHD, GIH, JPT, LWK, MEX, MKK, TSI, YRI. |
hapmap_files |
character vector of the filenames of
HapMap frequency-files to be included in the reference. The
default option includes all files with the string
"freqs_chr" in the filename. (This argument is only
used when |
filename |
character string; the name of the output file, without file-extension. |
save_txt, save_rdata |
logical; should the reference be
saved as a tab-delimited text file and/or an RData file?
If saved as RData, the object name |
return_reference |
logical; should the function return the reference as its output value? |
Details
The function removes SNPs with invalid alleles and with allele
frequencies that do not add up to 1. It also removes
all instances of duplicate SNPids. If such entries are
encountered, a warning is printed in the R console and the
entries are saved in a .txt file in the output directory.
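The filtering described above can be sketched in base R. This is only an illustration on a toy table, not the package's actual code: the column names (SNPID, allele1, allele2, freq1, freq2), the set of valid alleles, the numeric tolerance and the output filename are all assumptions made for the example.

```r
# Toy input; the column names are hypothetical, not the QCGWAS layout.
snps <- data.frame(
  SNPID   = c("rs1", "rs2", "rs3", "rs3", "rs4"),
  allele1 = c("A", "C", "G", "G", "N"),
  allele2 = c("G", "T", "A", "A", "T"),
  freq1   = c(0.30, 0.60, 0.25, 0.25, 0.50),
  freq2   = c(0.70, 0.30, 0.75, 0.75, 0.50),
  stringsAsFactors = FALSE
)

# Assumption: only A, C, G and T count as valid alleles.
valid_alleles <- c("A", "C", "G", "T")
bad_allele <- !(snps$allele1 %in% valid_alleles &
                snps$allele2 %in% valid_alleles)

# Frequencies must add up to 1; the tolerance is an assumption.
bad_freq <- abs(snps$freq1 + snps$freq2 - 1) > 1e-6

# Flag every instance of a duplicated ID, not just the repeats.
dup_id <- snps$SNPID %in% snps$SNPID[duplicated(snps$SNPID)]

drop      <- bad_allele | bad_freq | dup_id
removed   <- snps[drop, ]
reference <- snps[!drop, ]

if (nrow(removed) > 0) {
  # Warn and keep a record of the removed entries (hypothetical filename).
  warning(nrow(removed), " entries were removed from the reference")
  write.table(removed,
              file = file.path(tempdir(), "removed_entries.txt"),
              sep = "\t", quote = FALSE, row.names = FALSE)
}
```

Note that `duplicated()` on its own marks only the second and later occurrences, which is why the sketch matches against the set of duplicated IDs to drop all instances.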
Like QC_GWAS, create_hapmap_reference codes
the X chromosome as 23, Y as 24, XY (not
available on HapMap website) as 25 and M as 26.
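When preparing your own data to match this coding, a small helper along the following lines may be useful. The function recode_chr is hypothetical and not part of QCGWAS; it only applies the mapping stated above and assumes chromosomes arrive as strings such as "1" or "X".

```r
# Mapping taken from the text above: X = 23, Y = 24, XY = 25, M = 26.
chr_codes <- c(X = 23L, Y = 24L, XY = 25L, M = 26L)

# Hypothetical helper: convert chromosome labels to integer codes.
recode_chr <- function(chr) {
  chr <- toupper(as.character(chr))
  # Autosomes convert directly; non-numeric labels become NA for now.
  out <- suppressWarnings(as.integer(chr))
  is_named <- chr %in% names(chr_codes)
  out[is_named] <- unname(chr_codes[chr[is_named]])
  out
}

recode_chr(c("1", "22", "X", "Y", "XY", "M"))
```

Labels that are neither numeric nor one of the four named chromosomes are left as NA, so they can be inspected rather than silently recoded.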
Both the .RData export and the function return store the alleles as factors rather than character strings.
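If character strings are needed downstream, the factor columns can be converted after loading. The sketch below uses a toy data frame with hypothetical column names (SNPID, allele1, allele2) in place of the real reference table.

```r
# Toy stand-in for a returned reference; column names are assumptions.
ref <- data.frame(
  SNPID   = c("rs1", "rs2"),
  allele1 = factor(c("A", "C")),
  allele2 = factor(c("G", "T")),
  stringsAsFactors = FALSE
)

# Convert every factor column to character, leaving other columns alone.
is_fac <- vapply(ref, is.factor, logical(1))
ref[is_fac] <- lapply(ref[is_fac], as.character)

str(ref)
```

Converting with as.character() avoids the usual pitfall of comparing or coercing the underlying integer codes of a factor instead of its labels.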
Value
If return_reference is TRUE, the function
returns the generated reference table. If FALSE, it
returns an invisible NULL.
References
The required data are available at the website of the International HapMap Project, under bulk data downloads > bulk data > frequencies:
http://hapmap.ncbi.nlm.nih.gov
The HapMap files downloaded by this function are subject to the HapMap terms and policies. See: http://hapmap.ncbi.nlm.nih.gov/datareleasepolicy.html
See Also
Examples
# This command will download the CEU HapMap dataset and use
# it to generate an allele-reference. Create a folder
# "new_hapmap" to store the data and make sure there is
# sufficient disk space and a reasonably fast internet
# connection.
## Not run:
new_hapmap <- create_hapmap_reference(dir = "C:/new_hapmap",
download_hapmap = TRUE, download_subset = "CEU",
filename = "new_hapmap", save_txt = TRUE,
return_reference = TRUE)
## End(Not run)
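The output folder mentioned in the comments can also be created from within R before calling the function. This is a minimal sketch using base R only; the location under tempdir() is a placeholder for illustration, so substitute the folder you actually want to use.

```r
# Placeholder location; replace with the desired output folder.
out_dir <- file.path(tempdir(), "new_hapmap")

# Create the folder if it does not exist yet.
if (!dir.exists(out_dir)) {
  dir.create(out_dir, recursive = TRUE)
}

# Return the path with forward slashes, also on Windows.
out_dir <- normalizePath(out_dir, winslash = "/")
out_dir
```

The resulting out_dir can then be passed as the dir argument.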