rapidopgs_single {RapidoPGS} | R Documentation |
Compute PGS from GWAS summary statistics using posteriors from Wakefield's approximate Bayes Factors
Description
rapidopgs_single computes PGS from GWAS summary statistics using posteriors from Wakefield's approximate Bayes Factors.
Usage
rapidopgs_single(
data,
N = NULL,
trait = c("cc", "quant"),
build = "hg19",
pi_i = 1e-04,
sd.prior = if (trait == "quant") {
0.15
} else {
0.2
},
filt_threshold = NULL,
recalc = TRUE,
reference = NULL
)
Arguments
data |
a data.table containing the GWAS summary statistics dataset with all required information. |
N |
a scalar representing the sample size of the study, or a string indicating the name of the column containing it. Required for quantitative traits only. |
trait |
a string specifying if the dataset corresponds to a case-control ("cc") or a quantitative trait ("quant") GWAS. If trait = "quant", an ALT_FREQ column is required. |
build |
a string containing the genome build of the dataset, either "hg19" (for hg19/GRCh37) or "hg38" (hg38/GRCh38). DEFAULT "hg19". |
pi_i |
a scalar representing the prior probability (DEFAULT: 1e-04). |
sd.prior |
the prior specifies that BETA at causal SNPs follows a centred normal distribution with standard deviation sd.prior. Sensible and widely used DEFAULTs are 0.2 for case control traits, and 0.15 * var(trait) for quantitative (selected if trait == "quant"). |
filt_threshold |
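a scalar indicating the ppi threshold (if filt_threshold is below one) or the number of top SNPs by absolute weight to keep (if filt_threshold is larger than one), see Details. DEFAULT NULL. |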
recalc |
a logical indicating if weights should be recalculated after thresholding. Only relevant if filt_threshold is set. |
reference |
a string indicating the path of the reference file SNPs should be filtered and aligned to, see Details. |
Details
This function takes a GWAS summary statistics dataset as input,
aligns it to a reference panel file (if provided), assigns
SNPs to LD blocks, and computes Wakefield's ppi within each LD block. It then
generates PGS weights by multiplying those posteriors by the effect sizes (β).
Optionally, it filters SNPs by a custom threshold on ppi and then recalculates the weights, to improve accuracy.
Alternatively, if filt_threshold is larger than one, RapidoPGS selects the top filt_threshold SNPs by absolute weight (note: by weight, not by ppi).
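The steps described above can be illustrated with a minimal base R sketch for a single LD block. This is not the package's internal code: the helper names wakefield_labf and block_ppi are hypothetical, the posterior assumes at most one causal SNP per block, and the ppi filter assumes SNPs are kept when their ppi exceeds the threshold. The package itself may differ in these details.

```r
# Hypothetical helper: Wakefield's log approximate Bayes factor (H1 vs H0) per SNP.
wakefield_labf <- function(beta, se, sd.prior = 0.2) {
  z <- beta / se                       # z-score of each SNP
  V <- se^2                            # sampling variance of the effect estimate
  r <- sd.prior^2 / (sd.prior^2 + V)   # shrinkage factor W / (V + W)
  0.5 * (log(1 - r) + r * z^2)
}

# Hypothetical helper: posterior probability of each SNP in one LD block,
# ASSUMING at most one causal SNP per block and prior probability pi_i per SNP.
block_ppi <- function(labf, pi_i = 1e-4) {
  n <- length(labf)
  log_num <- labf + log(pi_i)          # log(prior * ABF) for each SNP
  log_null <- log(1 - n * pi_i)        # log prior mass of "no causal SNP"
  m <- max(c(log_num, log_null))       # subtract the max for numerical stability
  exp(log_num - m) / (sum(exp(log_num - m)) + exp(log_null - m))
}

# Three SNPs treated as one LD block (illustrative values only)
beta <- c(0.012, 0.0153, 0.058)
se   <- c(0.0099, 0.0063, 0.0255)

ppi    <- block_ppi(wakefield_labf(beta, se))
weight <- ppi * beta                   # PGS weight = posterior * effect size

# Filtering, option 1: keep SNPs whose ppi exceeds a threshold below one
keep_by_ppi <- which(ppi > 1e-4)

# Filtering, option 2: keep the top n SNPs by absolute weight (not by ppi)
top_n <- order(abs(weight), decreasing = TRUE)[seq_len(2)]
```

Working on the log scale and subtracting the maximum before exponentiating avoids overflow when a block contains a SNP with a very large Bayes factor.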
The GWAS summary statistics file to compute PGS using our method must contain the following minimum columns, with these exact column names:
- CHR
Chromosome
- BP
Base position (in GRCh37/hg19 or GRCh38/hg38). If using hg38, use build = "hg38" in parameters
- REF
Reference, or non-effect allele
- ALT
Alternative, or effect allele, the one β refers to
- ALT_FREQ
Minor/ALT allele frequency in the tested population, or in a close population from a reference panel. Required for quantitative traits only
- BETA
β (or log(OR)), or effect sizes
- SE
Standard error of β
If a reference is provided, it should have 5 columns: CHR, BP, SNPID, REF, and ALT. Also, it should be in the same build as the summary statistics. In both files, column order does not matter.
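Because the column names must match exactly, it can help to check them before calling the function. The sketch below is one way to do that in base R; missing_columns is a hypothetical helper, not part of RapidoPGS, and the required names are taken from the lists above.

```r
# Hypothetical helper: report which required columns are absent from a dataset.
missing_columns <- function(df, trait = c("cc", "quant")) {
  trait <- match.arg(trait)
  required <- c("CHR", "BP", "REF", "ALT", "BETA", "SE")
  # ALT_FREQ is only needed for quantitative traits
  if (trait == "quant") required <- c(required, "ALT_FREQ")
  setdiff(required, names(df))
}

# One-row toy dataset; column order does not matter
sumstats <- data.frame(SE = 0.0099, BETA = 0.012, ALT = "A", REF = "C",
                       BP = 1479959, CHR = "4")

missing_columns(sumstats, "cc")     # character(0): all columns present
missing_columns(sumstats, "quant")  # "ALT_FREQ"

# The same idea applied to a reference file with its five expected columns
reference <- data.frame(CHR = "4", BP = 1479959, SNPID = "rs139096444",
                        REF = "C", ALT = "A")
reference_missing <- setdiff(c("CHR", "BP", "SNPID", "REF", "ALT"),
                             names(reference))
```

An empty result means the dataset has every column named in the lists above; it does not check the genome build or the allele coding.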
Value
a data.table containing the formatted sumstats dataset with computed PGS weights.
Author(s)
Guillermo Reales, Chris Wallace
Examples
library(data.table)
library(RapidoPGS)

# Toy GWAS summary statistics for a case-control trait
sumstats <- data.table(
  SNPID = c("rs139096444", "rs3843766", "rs61977545", "rs544733737",
            "rs2177641", "rs183491817", "rs72995775", "rs78598863", "rs1411315"),
  CHR = c("4", "20", "14", "2", "4", "6", "6", "21", "13"),
  BP = c(1479959, 13000913, 29107209, 203573414, 57331393, 11003529, 149256398,
         25630085, 79166661),
  REF = c("C", "C", "C", "T", "G", "C", "C", "G", "T"),
  ALT = c("A", "T", "T", "A", "A", "A", "T", "A", "C"),
  BETA = c(0.012, 0.0079, 0.0224, 0.0033, 0.0153, 0.058, 0.0742, 0.001, -0.0131),
  SE = c(0.0099, 0.0066, 0.0203, 0.0171, 0.0063, 0.0255, 0.043, 0.0188, 0.0074)
)

# Compute PGS weights with the default build ("hg19") and priors
PGS <- rapidopgs_single(sumstats, trait = "cc")