cit.cp {cit} | R Documentation |
Causal Inference Test for a Continuous Outcome
Description
This function implements a formal statistical hypothesis test, resulting in a p-value, to quantify uncertainty in a causal inference pertaining to a measured factor, e.g. a molecular species, which potentially mediates a known causal association between a locus or other instrumental variable and a quantitative trait. If the number of permutations is greater than zero, then the results can be used with fdr.cit to generate permutation-based FDR values (q-values) that are returned with confidence intervals to quantify uncertainty in the estimate. The outcome is continuous, the potential mediator is continuous, and the instrumental variable can be continuous, discrete (such as coding a SNP 0, 1, 2), or binary and is not limited to a single variable but may be a design matrix representing multiple variables.
Usage
cit.cp( L, G, T, C=NULL, n.resampl=50, n.perm=0, perm.index=NULL, rseed=NULL )
Arguments
L |
Vector or nxp design matrix representing the instrumental variable(s). |
G |
Continuous vector representing the potential causal mediator. |
T |
Continuous vector representing the clinical trait or outcome of interest. |
C |
Vector or nxp design matrix representing adjustment covariates. |
n.resampl |
The number of instances of the test statistic for conditional independence (test 4) generated by permutation (Millstein et al. 2009) under the null hypothesis of no mediation (independent effects of L on G and T). These data are used to estimate the parameters of the null distribution. The default is set to 50, which we have found to provide reasonable precision. |
n.perm |
If n.perm is set to an integer greater than 0, then n.perm permutations for each component test will be conducted (randomly permuting the data to generate results under the null). |
perm.index |
This argument matters only when the CIT is conducted multiple times, which creates a multiple testing issue that is addressed by computing FDR with the fdr.cit() function. To accurately account for dependencies among tests when computing FDR confidence intervals, the permutations must be the same for all tests. Thus, if 100 permutations are conducted for 500 CIT test scenarios, each of these 100 permutations is applied to all 500 tests. This is achieved by passing an argument, perm.index, which is an n row by n.perm column dataframe or matrix of permutation indices, where n is the number of observations and n.perm is the number of permutations. Each column of perm.index contains a random permutation of 1:n, and this same perm.index object would be passed to all 500 CIT tests. fdr.cit() would then be used to compute q-values (FDR) and q-value confidence intervals for each test. |
rseed |
If n.perm > 0 and multiple tests (CITs) are being conducted, setting rseed to the same integer for all tests ensures that the permutations will be the same across CITs. This is important for maintaining the observed dependencies among tests for permuted data in order to compute accurate confidence intervals for FDR estimates. |
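As an illustration of the perm.index and rseed descriptions above, the following is a minimal sketch of how a shared permutation matrix might be built and reused. The sample size, number of tests, seeds, simulated data, and the names results_list and Tr are assumptions made for the example, not part of the package. The cit.cp call only runs if the cit package is installed. How rseed is used internally is not shown here; the last lines only demonstrate the general principle that a fixed seed reproduces the same permutation.

```r
# Sketch: one permutation index matrix reused for every test (all values assumed).
set.seed(42)        # arbitrary seed, used only to make this sketch reproducible
n       <- 100      # number of observations (assumed)
n.perm  <- 5        # number of permutations (assumed)
n.tests <- 3        # hypothetical number of CIT scenarios

# Each column is an independent random permutation of 1:n
perm.index <- sapply(seq_len(n.perm), function(j) sample(n))

# Sanity checks: n rows, n.perm columns, every column a permutation of 1:n
stopifnot(all(dim(perm.index) == c(n, n.perm)))
stopifnot(all(apply(perm.index, 2, function(p) identical(sort(p), seq_len(n)))))

# Pass the SAME perm.index to every test (skipped if cit is not installed)
if (requireNamespace("cit", quietly = TRUE)) {
  results_list <- vector("list", n.tests)
  for (k in seq_len(n.tests)) {
    L  <- matrix(rbinom(n, 2, 0.5), ncol = 1)     # simulated instrument
    G  <- matrix(0.3 * L + rnorm(n), ncol = 1)    # simulated mediator
    Tr <- matrix(0.3 * G + rnorm(n), ncol = 1)    # simulated trait
    results_list[[k]] <- cit::cit.cp(L, G, Tr,
                                     perm.index = perm.index, n.perm = n.perm)
  }
  # results_list could then be aggregated and passed to fdr.cit(),
  # as described in the text.
}

# Principle behind a shared seed: the same seed yields the same permutation
set.seed(7); a <- sample(n)
set.seed(7); b <- sample(n)
stopifnot(identical(a, b))
```

Supplying perm.index explicitly makes the shared permutations visible and checkable, whereas a shared rseed relies on each call generating them identically.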
Details
Increasing n.resampl will increase the precision of component test 4, the conditional independence test. This may be useful if a very small p-value is observed and high precision is desired; however, it will also increase run time. The omnibus p-value, p_cit, is the maximum of the component p-values, an intersection-union test, representing the probability of the data if at least one of the component null hypotheses is true. If permutations are conducted by setting n.perm to a value greater than zero, then the results are provided in matrix (dataframe) form, where each row represents an analysis using a unique permutation, except the first row (perm = 0), which has results from the observed (non-permuted) analysis. These results can then be aggregated across multiple cit.cp tests and input to the function fdr.cit to generate component test FDR values (q-values) as well as omnibus q-values with confidence intervals that correspond to the p_cit omnibus p-values.
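The intersection-union rule described above can be sketched in a few lines. The p-values below are made-up numbers chosen for illustration, not output from cit.cp. Only the column names follow the Value section.

```r
# Hypothetical component p-values (illustrative numbers, not real results)
p_components <- c(p_TassocL     = 0.001,
                  p_TassocGgvnL = 0.004,
                  p_GassocLgvnT = 0.0002,
                  p_LindTgvnG   = 0.03)

# Omnibus p-value: the maximum of the component p-values
p_cit <- max(p_components)

# The component that determines p_cit is the weakest link in the causal chain
limiting <- names(which.max(p_components))

p_cit
limiting
```

Because the maximum is taken, p_cit is small only when every component test is small, so a single weak component is enough to keep the omnibus test from rejecting.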
Value
A dataframe which includes the following columns:
perm |
Indicator for permutation results. Zero indicates that the data were not permuted and subsequent rows include an integer greater than zero for each permutation conducted. |
p_cit |
CIT (omnibus) p-value |
p_TassocL |
component p-value for the test of association between T and L. |
p_TassocGgvnL |
component p-value for the test of association between T and G|L. |
p_GassocLgvnT |
component p-value for the test of association between G and L|T. |
p_LindTgvnG |
component p-value for the equivalence test of L independent of T given G (L ind T|G). |
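To make the layout of the returned dataframe concrete, the sketch below builds a mock object with the documented columns and separates the observed row from the permuted rows. All numbers are invented for illustration, and the name res is an assumption. A real result would come from a cit.cp call with n.perm greater than zero.

```r
# Mock result in the documented layout (values are illustrative only)
res <- data.frame(
  perm          = 0:2,
  p_cit         = c(0.03,   0.41, 0.77),
  p_TassocL     = c(0.001,  0.35, 0.77),
  p_TassocGgvnL = c(0.004,  0.41, 0.52),
  p_GassocLgvnT = c(0.0002, 0.12, 0.60),
  p_LindTgvnG   = c(0.03,   0.08, 0.21)
)

# Row with perm == 0 holds the observed (non-permuted) analysis
observed <- res[res$perm == 0, ]

# Remaining rows hold one analysis per permutation
permuted <- res[res$perm > 0, ]

# In this mock, p_cit equals the row-wise maximum of the component p-values
row_max <- pmax(res$p_TassocL, res$p_TassocGgvnL,
                res$p_GassocLgvnT, res$p_LindTgvnG)
stopifnot(all(res$p_cit == row_max))
```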
Author(s)
Joshua Millstein
References
Millstein J, Chen GK, Breton CV. 2016. cit: hypothesis testing software for mediation analysis in genomic applications. Bioinformatics. btw135. PMID: 27153715.
Millstein J, Zhang B, Zhu J, Schadt EE. 2009. Disentangling molecular relationships with a causal inference test. BMC Genetics, 10:23.
Examples
# Sample Size
ss = 100
# Errors
e1 = matrix(rnorm(ss),ncol=1)
e2 = matrix(rnorm(ss),ncol=1)
# Simulate genotypes, gene expression, covariates, and clinical trait matrices
L = matrix(rbinom(ss*3,2,.5),ncol=3)
G = matrix( apply(.3*L, 1, sum) + e1,ncol=1)
T = matrix(.3*G + e2,ncol=1)
C = matrix(rnorm(ss*2),ncol=2)
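# Permutation indices shared across tests: one random permutation of 1:ss per column
n.perm = 5
perm.index = matrix(NA, nrow=ss, ncol=n.perm )
for( j in 1:ncol(perm.index) ) perm.index[, j] = sample( 1:ss )
# Observed data only, no covariates
results = cit.cp(L, G, T)
results
# Permutation results using the shared perm.index
results = cit.cp(L, G, T, perm.index=perm.index, n.perm=n.perm)
results
# Observed data, adjusting for covariates C
results = cit.cp(L, G, T, C)
results
# Covariates plus permutations generated internally
results = cit.cp(L, G, T, C, n.perm=n.perm)
results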