harmonize_sumstats {snpsettest} | R Documentation |
Harmonizing GWAS summary to reference data
Description
Finds an intersection of variants between GWAS summary and reference data.
Usage
harmonize_sumstats(
  sumstats,
  x,
  match_by_id = TRUE,
  check_strand_flip = FALSE,
  return_indice = FALSE
)
Arguments
sumstats |
A data frame with two columns: "id" and "pvalue".
If
|
x |
A |
match_by_id |
If |
check_strand_flip |
Only applies when |
return_indice |
Only applied when |
Details
Pre-processing of GWAS summary data is required because the set of variants available in a particular GWAS may be poorly matched to the variants in the reference data. SNP matching can be performed either (1) by SNP ID or (2) by chromosome code, base-pair position, and allele codes, taking into account possible strand flips and reference allele swaps. For matched entries, the SNP IDs in the GWAS summary data are replaced with the ones in the reference data.
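The allele-based matching described above can be illustrated with a small base R sketch. It is not the package's implementation: classify_alleles() and complement() are hypothetical helpers written only for this illustration, and the toy allele pairs are assumed to already share the same chromosome and position. It shows how a GWAS allele pair can be an exact match, a reference allele swap, or a strand flip relative to the reference, and why A/T and C/G pairs are ambiguous once strand flips are considered.

```r
## Complement allele codes (A <-> T, C <-> G)
complement <- function(a) chartr("ACGT", "TGCA", a)

## Hypothetical helper (not part of snpsettest): classify how each GWAS
## allele pair relates to the reference pair at the same chr/pos
classify_alleles <- function(gwas_a1, gwas_a2, ref_a1, ref_a2,
                             check_strand_flip = FALSE) {
  same    <- gwas_a1 == ref_a1 & gwas_a2 == ref_a2
  swapped <- gwas_a1 == ref_a2 & gwas_a2 == ref_a1
  status  <- ifelse(same, "match", ifelse(swapped, "swapped", "no_match"))
  if (check_strand_flip) {
    ## A/T and C/G pairs are their own complement, so a strand flip
    ## cannot be told apart from an allele swap
    ambiguous <- gwas_a1 == complement(gwas_a2)
    f1 <- complement(gwas_a1)
    f2 <- complement(gwas_a2)
    flipped         <- f1 == ref_a1 & f2 == ref_a2
    flipped_swapped <- f1 == ref_a2 & f2 == ref_a1
    status[status == "no_match" & flipped] <- "flipped"
    status[status == "no_match" & flipped_swapped] <- "flipped_swapped"
    status[ambiguous] <- "ambiguous"
  }
  status
}

## Toy allele pairs; each row is assumed to be the same chr/pos in both sets
gwas <- data.frame(A1 = c("A", "G", "A", "A", "A"),
                   A2 = c("G", "A", "G", "T", "C"),
                   stringsAsFactors = FALSE)
ref  <- data.frame(A1 = c("A", "A", "T", "A", "G"),
                   A2 = c("G", "G", "C", "T", "T"),
                   stringsAsFactors = FALSE)

status <- classify_alleles(gwas$A1, gwas$A2, ref$A1, ref$A2,
                           check_strand_flip = TRUE)
print(data.frame(gwas = paste(gwas$A1, gwas$A2, sep = "/"),
                 ref  = paste(ref$A1, ref$A2, sep = "/"),
                 status = status))
```

In this sketch the fourth pair (A/T against A/T) is labelled ambiguous only because strand flips are being checked; with check_strand_flip = FALSE it would count as a plain match.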
Value
A data frame with columns: "id", "chr", "pos", "A1", "A2" and "pvalue". If return_indice = TRUE, the data frame includes additional columns key_, swapped_, and flipped_. key_ is "chr_pos_A1_A2" in sumstats (the original input before harmonization). swapped_ contains a logical vector indicating reference allele swap. flipped_ contains a logical vector indicating strand flip.
Examples
## GWAS summary statistics
head(exGWAS)
## Load reference genotype data
bfile <- system.file("extdata", "example.bed", package = "snpsettest")
x <- read_reference_bed(path = bfile)
## Harmonize by SNP IDs
hsumstats1 <- harmonize_sumstats(exGWAS, x)
## Harmonize by genomic position and allele codes
## Reference allele swap will be taken into account
hsumstats2 <- harmonize_sumstats(exGWAS, x, match_by_id = FALSE)
## Check matching entries by flipping allele codes
## Ambiguous SNPs will be excluded from harmonization
hsumstats3 <- harmonize_sumstats(exGWAS, x, match_by_id = FALSE,
                                 check_strand_flip = TRUE)