create_hapmap_reference {QCGWAS}    R Documentation

Create an allele-reference file from HapMap data

Description

This function creates the standard allele-reference file used by QC_GWAS and match_alleles from data publicly available on the website of the International HapMap Project (see 'References').

Usage

create_hapmap_reference(dir = getwd(),
   download_hapmap = FALSE, download_subset,
   hapmap_files = list.files(path = dir, pattern = "freqs_chr"),
   filename = "allele_reference_HapMap",
   save_txt = TRUE, save_rdata = !save_txt,
   return_reference = FALSE)

Arguments

dir

character string; the directory of the input and output files. Note that R uses the forward slash (/) in file paths where Windows uses the backslash (\).

download_hapmap

logical; if TRUE, the required allele-frequency files are downloaded from the HapMap website into dir, and then turned into a reference. If FALSE, the files specified in hapmap_files are used.

download_subset

character string; the population to download for creating the reference. Options are: ASW, CEU, CHB, CHD, GIH, JPT, LWK, MEX, MKK, TSI, YRI.

hapmap_files

character vector of the filenames of HapMap frequency-files to be included in the reference. The default option includes all files with the string "freqs_chr" in the filename. (This argument is only used when download_hapmap is FALSE.)

filename

character string; the name of the output file, without file-extension.

save_txt, save_rdata

logical; should the reference be saved as a tab-delimited text file and/or an RData file? If saved as RData, the object name allele_ref_std is used for the reference table.

return_reference

logical; should the function return the reference as its output value?

Details

The function removes SNPs with invalid alleles and SNPs whose allele frequencies do not add up to 1. It also removes all instances of duplicated SNP IDs. If such entries are encountered, a warning is printed in the R console and the entries are saved in a .txt file in the output directory.
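
The filtering described above can be illustrated on a toy table. This is only a sketch of the idea, not the package's own code: the column names, the set of alleles treated as valid (A, C, G, T) and the numeric tolerance are all assumptions made for the example.

```r
# Toy frequency table. Column names are hypothetical, not those used by QCGWAS.
freqs <- data.frame(
  SNP   = c("rs1", "rs2", "rs3", "rs3", "rs4"),
  A1    = c("A", "C", "G", "G", "N"),
  A2    = c("G", "T", "A", "A", "T"),
  freq1 = c(0.30, 0.60, 0.25, 0.25, 0.50),
  freq2 = c(0.70, 0.30, 0.75, 0.75, 0.50),
  stringsAsFactors = FALSE
)

# Assumption: only the four nucleotides count as valid alleles here.
valid_alleles <- c("A", "C", "G", "T")

# Flag the three kinds of problem entries.
bad_allele <- !(freqs$A1 %in% valid_alleles & freqs$A2 %in% valid_alleles)
bad_freq   <- abs(freqs$freq1 + freqs$freq2 - 1) > 1e-6  # assumed tolerance
dup_id     <- freqs$SNP %in% freqs$SNP[duplicated(freqs$SNP)]  # all instances

drop <- bad_allele | bad_freq | dup_id
removed   <- freqs[drop, ]   # entries that would be reported and saved
reference <- freqs[!drop, ]  # entries that would remain in the reference
```

In this toy table only rs1 remains: rs2 has frequencies that sum to 0.9, both rs3 rows share an ID, and rs4 carries an allele outside the assumed valid set.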

Like QC_GWAS, create_hapmap_reference codes the X chromosome as 23, Y as 24, XY (not available on the HapMap website) as 25 and M as 26.
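
As a small illustration of this coding scheme, the sketch below maps chromosome labels to the numeric codes listed above. The helper recode_chr is hypothetical and not part of QCGWAS, and stripping a leading "chr" prefix is an assumption about how the input labels might look.

```r
# Numeric codes for the non-autosomal chromosomes, as listed above.
chr_codes <- c(X = 23L, Y = 24L, XY = 25L, M = 26L)

# Hypothetical helper (not a QCGWAS function): convert labels to integer codes.
recode_chr <- function(chr) {
  chr <- sub("^chr", "", as.character(chr))  # assumed optional "chr" prefix
  is_named <- chr %in% names(chr_codes)
  out <- integer(length(chr))
  out[is_named]  <- unname(chr_codes[chr[is_named]])
  out[!is_named] <- as.integer(chr[!is_named])  # autosomes keep their number
  out
}

codes <- recode_chr(c("1", "chr22", "X", "Y", "XY", "M"))
```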

Both the .RData export and the function return store the alleles as factors rather than character strings.
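
If character alleles are needed downstream, the factor columns can be converted after loading. The sketch below uses a stand-in table saved under the object name allele_ref_std. The column names and the ".RData" file extension are assumptions for the example, and a temporary directory is used so that nothing in the working directory is touched.

```r
# Stand-in for a saved reference; column names are hypothetical.
allele_ref_std <- data.frame(
  SNP = c("rs1", "rs2"),
  A1  = factor(c("A", "C")),
  A2  = factor(c("G", "T"))
)
rdata_path <- file.path(tempdir(), "allele_reference_HapMap.RData")
save(allele_ref_std, file = rdata_path)
rm(allele_ref_std)

# Load into a separate environment to avoid overwriting workspace objects.
ref_env <- new.env()
loaded_names <- load(rdata_path, envir = ref_env)
reference <- ref_env$allele_ref_std

# Convert every factor column to character strings.
is_fac <- vapply(reference, is.factor, logical(1))
reference[is_fac] <- lapply(reference[is_fac], as.character)
```

Loading into a dedicated environment is a design choice for the example: load() otherwise places allele_ref_std directly in the calling environment, where it could silently replace an existing object of the same name.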

Value

If return_reference is TRUE, the function returns the generated reference table. If FALSE, it returns an invisible NULL.

References

The required data are available on the website of the International HapMap Project, under bulk data downloads > bulk data > frequencies:

http://hapmap.ncbi.nlm.nih.gov

The HapMap files downloaded by this function are subject to the HapMap terms and policies. See: http://hapmap.ncbi.nlm.nih.gov/datareleasepolicy.html

See Also

match_alleles

Examples

  # This command will download the CEU HapMap dataset and use
  # it to generate an allele-reference. Create a folder
  # "new_hapmap" to store the data and make sure there is
  # sufficient disk space and a reasonably fast internet
  # connection.

  ## Not run: 
    new_hapmap <- create_hapmap_reference(dir = "C:/new_hapmap",
                                download_hapmap = TRUE, download_subset = "CEU",
                                filename = "new_hapmap", save_txt = TRUE,
                                return_reference = TRUE)
  
## End(Not run)

[Package QCGWAS version 1.0-9 Index]