
estimate_corln {CPBayes}

R Documentation

Estimate correlation structure of beta-hat vector for multiple overlapping case-control studies using sample-overlap matrices.

Description

Computes an approximate correlation matrix of the estimated beta (log odds ratio) vector for multiple overlapping case-control studies. The computation uses the sample-overlap matrices, which give the number of cases or controls shared between each pair of studies/traits and the number of subjects who are cases in one study/trait but controls in another. For a cohort study, the phenotypic correlation matrix should be a reasonable substitute for this correlation matrix. The approximation is accurate when none of the diseases/traits is associated with either the environmental covariates or the genetic variant.

Usage

estimate_corln(n11, n00, n10)

Arguments

`n11`	An integer square matrix (the number of rows must equal the number of studies/traits) giving the number of cases shared between every pair of studies/traits. The (k,l)-th element of n11 is the number of subjects who are cases in both the k-th and the l-th study/trait. The diagonal elements of n11 are the numbers of cases in the individual studies/traits. If no cases are shared between studies/traits, the off-diagonal elements of n11 are zero. No default is specified.
`n00`	An integer square matrix (the number of rows must equal the number of studies/traits) giving the number of controls shared between every pair of studies/traits. The (k,l)-th element of n00 is the number of subjects who are controls in both the k-th and the l-th study/trait. The diagonal elements of n00 are the numbers of controls in the individual studies/traits. If no controls are shared between studies/traits, the off-diagonal elements of n00 are zero. No default is specified.
`n10`	An integer square matrix (the number of rows must equal the number of studies/traits) giving the number of subjects who are cases in one study/trait and controls in another. The (k,l)-th off-diagonal element of n10 is the number of subjects who are cases in the k-th study/trait and controls in the l-th study/trait. The diagonal elements are necessarily zero. If there is no such overlap, all elements of n10 are zero. No default is specified.
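As an illustration of how the three arguments fit together, the following sketch builds the overlap matrices for two hypothetical studies. All counts are invented for the example and do not come from any real data set. The call to `estimate_corln` is guarded so that the snippet also runs when CPBayes is not installed.

```r
# Hypothetical counts for two overlapping case-control studies:
#   study 1: 1000 cases, 2000 controls
#   study 2:  800 cases, 2000 controls
#   100 subjects are cases in both studies, 500 are controls in both,
#   50 are cases in study 1 and controls in study 2,
#   30 are cases in study 2 and controls in study 1.

# Shared cases: diagonal = cases per study, off-diagonal = shared cases
n11 <- matrix(c(1000L, 100L,
                100L,  800L), nrow = 2, byrow = TRUE)

# Shared controls: diagonal = controls per study, off-diagonal = shared controls
n00 <- matrix(c(2000L, 500L,
                500L,  2000L), nrow = 2, byrow = TRUE)

# (k,l) = cases in study k that are controls in study l; diagonal is zero
n10 <- matrix(c(0L,  50L,
                30L, 0L), nrow = 2, byrow = TRUE)

# Only call the package function if CPBayes is available
if (requireNamespace("CPBayes", quietly = TRUE)) {
  corln <- CPBayes::estimate_corln(n11, n00, n10)
  print(corln)
}
```

Note that n11 and n00 are symmetric by construction, whereas n10 generally is not, because its (k,l)-th and (l,k)-th elements count different groups of subjects.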

Details

***Important note on estimating the correlation structure of the beta-hat vector:*** In general, environmental covariates are expected to be present in a study and to be associated with the phenotypes of interest. A small proportion of genome-wide genetic variants are also expected to be associated with the phenotypes. Hence the approximation of the correlation matrix described above may not be accurate, and we generally recommend an alternative strategy that estimates the correlation matrix from genome-wide summary statistics across traits, as follows.

First, extract all SNPs whose trait-specific univariate association p-value is > 0.1 for every trait. The trait-specific p-values are obtained from the beta-hat and standard error for each trait. Each SNP selected in this way is either weakly associated or not associated with any of the phenotypes (a null SNP).

Next, select a set of independent null SNPs from this initial set using a threshold of r^2 < 0.01, where r is the correlation between the genotypes at a pair of SNPs. If in-sample linkage disequilibrium (LD) information is not available, LD information from a reference panel can be used for this screening.

Finally, compute the correlation matrix of the effect estimates (beta-hat vector) as the sample correlation matrix of the beta-hat vector across all selected independent null SNPs.

This strategy is more general: it applies to a cohort study or to multiple overlapping studies, for binary or quantitative traits with arbitrary distributions. It is also useful when the beta-hat vectors from multiple non-overlapping studies become correlated because of genetically related individuals across studies. Misspecification of the correlation structure can affect the results produced by CPBayes to some extent. Hence, if genome-wide summary statistics across traits are available, we strongly recommend using this alternative strategy to estimate the correlation matrix of the beta-hat vector.
See our paper for more details.
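The alternative strategy can be sketched in base R as follows. This is a minimal illustration, not code from the package: the SNP-by-trait matrices `beta_hat` and `se` are hypothetical inputs, simulated here only so that the snippet runs on its own, and the LD-pruning step is assumed to have been carried out beforehand with external software.

```r
set.seed(1)

# --- Simulated stand-in for genome-wide summary statistics (illustration only) ---
n_snp <- 5000
n_trait <- 3
sim_corr <- matrix(c(1.0, 0.3, 0.1,
                     0.3, 1.0, 0.2,
                     0.1, 0.2, 1.0), nrow = n_trait, byrow = TRUE)
se <- matrix(0.05, nrow = n_snp, ncol = n_trait)          # standard errors
z <- matrix(rnorm(n_snp * n_trait), n_snp, n_trait) %*% chol(sim_corr)
beta_hat <- z * se                                        # effect estimates

# --- Step 1: trait-specific two-sided p-values from beta-hat and standard error ---
pval <- 2 * pnorm(-abs(beta_hat / se))

# --- Step 2: keep SNPs whose p-value is > 0.1 for every trait (null SNPs) ---
is_null <- apply(pval > 0.1, 1, all)

# --- Step 3: LD pruning at r^2 < 0.01 ---
# Assumed to be done with external tools; the simulated SNPs are independent.

# --- Step 4: sample correlation of beta-hat across the selected null SNPs ---
corln_hat <- cor(beta_hat[is_null, , drop = FALSE])
corln_hat
```

With real data, `beta_hat` and `se` would be replaced by the observed summary statistics, with rows restricted to the LD-pruned SNP set before the final step.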

Value

This function returns an approximate correlation matrix of the beta-hat vector for multiple overlapping case-control studies. See the example below.
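Because the returned matrix is an approximation, it can be worth checking that it behaves like a correlation matrix before using it further. The helper below is a hypothetical sketch (the name `is_valid_corln` and the tolerance are assumptions, not part of CPBayes) that tests for symmetry, a unit diagonal and positive definiteness.

```r
# Hypothetical helper, not part of CPBayes: basic sanity checks on a
# candidate correlation matrix.
is_valid_corln <- function(m, tol = 1e-8) {
  is.matrix(m) &&
    nrow(m) == ncol(m) &&
    isSymmetric(unname(m), tol = tol) &&                  # symmetric
    all(abs(diag(m) - 1) < tol) &&                        # unit diagonal
    min(eigen(m, symmetric = TRUE,
              only.values = TRUE)$values) > tol           # positive definite
}

# Example matrices made up for illustration
good <- matrix(c(1, 0.5,
                 0.5, 1), nrow = 2, byrow = TRUE)
bad  <- matrix(c(1, 1.2,
                 1.2, 1), nrow = 2, byrow = TRUE)         # not positive definite

is_valid_corln(good)
is_valid_corln(bad)
```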

References

Majumdar A, Haldar T, Bhattacharya S, Witte JS (2018) An efficient Bayesian meta analysis approach for studying cross-phenotype genetic associations. PLoS Genet 14(2): e1007139.

Examples

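library(CPBayes)
# Load the example sample-overlap matrices shipped with the package
data(SampleOverlapMatrix)
# Number of cases shared between each pair of studies/traits
n11 <- SampleOverlapMatrix$n11
n11
# Number of controls shared between each pair of studies/traits
n00 <- SampleOverlapMatrix$n00
n00
# Number of subjects who are cases in one study/trait and controls in another
n10 <- SampleOverlapMatrix$n10
n10
# Approximate correlation matrix of the beta-hat vector
# (stored as corln to avoid masking stats::cor)
corln <- estimate_corln(n11, n00, n10)
corln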

[Package CPBayes version 1.1.0 Index]