wtest.high {wtest}    R Documentation
W-test for High Order Interaction Analysis
Description
This function performs the W-test to calculate high-order interactions in case-control studies
with categorical data. The test measures the distributional difference of target variables between cases and controls via a
combined log of odds ratios, and its statistic follows a Chi-squared distribution with data-adaptive degrees of freedom.
For high-order interaction calculation, the user has three options: (1) calculate the W-test for a given set of SNPs;
(2) calculate high-order interactions among the variables whose main-effect p-values are smaller than a threshold (input.pval);
(3) calculate high-order interactions exhaustively for all variables.
The output can be filtered by p-value, so that only sets with a p-value smaller than a threshold (output.pval) are returned.
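A rough sketch of how a set of SNPs can be treated as one categorical variable for the high-order test, assuming (as the description of hf.high.order suggests) that k counts the genotype combinations observed in the data. The helper combine_snps() and the toy genotype matrix are hypothetical illustrations, not part of the wtest package.

```r
# Hypothetical helper (not in the wtest package): collapse a set of SNP
# columns into one factor whose levels are the observed genotype combinations.
combine_snps <- function(data, cols) {
  snps <- as.data.frame(data[, cols, drop = FALSE])
  # paste genotypes row by row, e.g. c(0, 2, 1) becomes "0_2_1"
  factor(do.call(paste, c(snps, sep = "_")))
}

# Toy data: 100 subjects, 3 SNPs coded 0/1/2, alternating controls and cases
set.seed(1)
geno <- matrix(sample(0:2, 300, replace = TRUE), ncol = 3,
               dimnames = list(NULL, c("snp1", "snp2", "snp3")))
y <- rep(0:1, length.out = 100)

combo <- combine_snps(geno, 1:3)
tab <- table(y, combo)  # 2 x k table: rows are controls (0) and cases (1)
k <- ncol(tab)          # observed combinations, at most 3^3 = 27
```

Only combinations that actually occur become columns, which is why k for a set of SNPs is data-dependent and why separate h and f values are needed for high-order tables.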
Usage
wtest.high(data, y, w.order = 3, hf1 = "default.hf1",
hf.high.order = "default.high", which.marker = NULL,
output.pval = NULL, sort = TRUE, input.pval = 0.1,
input.poolsize = 10)
Arguments
data
a data frame or matrix containing genotypes in the columns. Genotypes should be coded as (0, 1, 2) or (0, 1).
y
a numeric vector of 0 or 1.
w.order
an integer value indicating the order of the interactions. For example, w.order = 3 calculates three-way interactions.
hf1
h and f values used to calculate main effects, organized as a matrix with columns (k, h, f), where k, the number of genotype categories of a single variable, is 2 or 3.
hf.high.order
h and f values used to calculate high-order interactions, organized as a matrix with columns (k, h, f), where k is the number of genotype combinations of a set of SNPs.
which.marker
a numeric vector giving the column indices of a set of SNPs to test. Default = NULL.
output.pval
a p-value threshold for filtering the output. If NULL, all results are listed; otherwise, only results with p-values smaller than output.pval are returned.
sort
a logical value indicating whether to sort the output by p-value in ascending order. Default = TRUE.
input.pval
a p-value threshold used to select markers for high-order interaction calculation; used only when w.order > 2.
input.poolsize
an integer smaller than the number of input variables. It is an optional filter that caps the number of variables included in the high-order interaction calculation; used only when w.order > 2.
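A small sketch of the (k, h, f) layout described for hf1 and hf.high.order, and of picking the row that matches a table with k columns. The matrix here is filled with the large-sample theoretical values h = (k-1)/k and f = k-1 rather than estimates produced by hf(); the column names and the lookup_hf() helper are assumptions for illustration.

```r
# Hypothetical hf matrix with columns (k, h, f), filled with the theoretical
# values h = (k - 1) / k and f = k - 1 instead of data-driven estimates.
k_values <- 2:3
hf_theory <- cbind(k = k_values, h = (k_values - 1) / k_values, f = k_values - 1)

# Hypothetical helper: fetch the (h, f) pair for a table with k columns.
lookup_hf <- function(hf_mat, k) {
  row <- hf_mat[hf_mat[, "k"] == k, , drop = FALSE]
  if (nrow(row) != 1) stop("no unique (h, f) entry for k = ", k)
  c(h = unname(row[1, "h"]), f = unname(row[1, "f"]))
}

lookup_hf(hf_theory, 3)  # h = 2/3, f = 2
```

Failing loudly when no row matches k seems safer than silently falling back to theoretical values, since the point of estimated h and f is to correct for sparse data.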
Details
The W-test is a model-free statistical test originally proposed to measure main effects or pairwise interactions in case-control studies with categorical variables; the wtest.high() function extends it to high-order interaction detection. Theoretically, the test statistic follows a Chi-squared distribution with f degrees of freedom. The data-adaptive degrees of freedom f and a scalar h in the test statistic allow the W-test to correct for distributional bias due to sparse data and small sample sizes. Let k be the number of columns of the 2 by k contingency table formed by a single variable, a variable pair, or (for high-order tests) a set of variables. When the sample size is large and there is no population stratification, h and f closely approximate their theoretical values h = (k-1)/k and f = k-1. When the sample size is small or there is population stratification, h and f deviate from these values to correct for the distributional bias caused by the data structure.
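The statistic is not written out here, so the following is only a sketch, assuming a form consistent with the description above: each of the k columns is compared with all other columns through a 2 by 2 log odds ratio, the standardized log odds ratios are squared and summed, and the sum is scaled by h, giving W = h * sum_i (log OR_i / SE_i)^2, referred to a Chi-squared distribution with f degrees of freedom. The 0.5 cell correction, the Woolf standard error, and the w_stat() helper are assumptions; consult Wang et al. (2016) for the exact definition used by the package.

```r
# Hypothetical sketch of a W-type statistic (not the wtest package's code).
# tab: 2 x k table, row 1 = controls, row 2 = cases.
# ASSUMPTIONS: column i is compared with all other columns via a 2 x 2 log
# odds ratio; 0.5 is added to each of the four cells; Woolf standard error.
w_stat <- function(tab, h, f) {
  n0 <- sum(tab[1, ]); n1 <- sum(tab[2, ])
  a  <- tab[2, ] + 0.5       # cases in column i
  b  <- n1 - tab[2, ] + 0.5  # cases in the other columns
  c0 <- tab[1, ] + 0.5       # controls in column i
  d  <- n0 - tab[1, ] + 0.5  # controls in the other columns
  z  <- log((a * d) / (b * c0)) / sqrt(1 / a + 1 / b + 1 / c0 + 1 / d)
  w  <- h * sum(z^2)         # combined, h-scaled sum of squared z-scores
  c(W = w, p.value = pchisq(w, df = f, lower.tail = FALSE))
}

# Identical case and control distributions give W = 0 and p = 1
w_stat(rbind(c(20, 30, 50), c(20, 30, 50)), h = 2 / 3, f = 2)

# Null simulation with theoretical h = (k-1)/k: mean of W should be near f = k-1
set.seed(42)
k <- 3
sims <- replicate(1000, {
  g <- sample(0:2, 2000, replace = TRUE, prob = c(0.25, 0.5, 0.25))
  y <- rep(0:1, each = 1000)
  tab <- table(factor(y, levels = 0:1), factor(g, levels = 0:2))
  w_stat(tab, h = (k - 1) / k, f = k - 1)[["W"]]
})
mean(sims)  # expected to be close to 2
```

With k = 2 the two column-versus-rest odds ratios are reciprocals, so W with h = 1/2 reduces to the squared Wald statistic of a single 2 by 2 table, which is Chi-squared with f = 1.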
When w.order > 2, wtest.high() automatically calculates the main effects first and then applies a pre-filter before calculating interactions.
This filtering avoids overloading memory before the data are better understood. The user can specify a smaller input.pval, such as 0.05 or 0.001,
for less output, or set input.pval = 1 or NULL for an exhaustive high-order interaction calculation. Another optional filter is input.poolsize,
which selects the input.poolsize variables with the smallest main-effect p-values for the high-order calculation. When it is used together with input.pval,
the algorithm uses the smaller of the two selected sets.
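A hypothetical sketch of the pre-filter just described: rank variables by main-effect p-value, build the input.pval set and the input.poolsize set, keep the smaller one, and enumerate the w.order-sized sets to test. Treating input.pval = NULL or 1 (and input.poolsize = NULL) as "no filter" is an assumption read from the text; select_pool() and the toy p-values are illustrative, not wtest code or output.

```r
# Hypothetical pre-filter sketch (not the wtest package's implementation).
# main_p: main-effect p-values, one per variable (column index = position).
select_pool <- function(main_p, input.pval = 0.1, input.poolsize = 10) {
  ranked <- order(main_p)  # column indices, smallest p-value first
  # input.pval = NULL or 1 is treated as "no p-value filter"
  use_pval <- !is.null(input.pval) && input.pval < 1
  by_pval <- if (use_pval) ranked[main_p[ranked] < input.pval] else ranked
  by_size <- if (is.null(input.poolsize)) ranked else head(ranked, input.poolsize)
  # keep whichever candidate set is smaller
  if (length(by_pval) <= length(by_size)) by_pval else by_size
}

main_p <- c(0.30, 0.01, 0.20, 0.04, 0.50, 0.002)  # toy p-values
pool <- select_pool(main_p, input.pval = 0.25, input.poolsize = 5)
pool  # 6 2 4 3: four variables pass input.pval, fewer than the pool size of 5

w.order <- 3
stopifnot(length(pool) >= w.order)  # combn() needs at least w.order variables
sets <- combn(sort(pool), w.order)  # each column is one set of variables to test
ncol(sets)  # choose(4, 3) = 4
```

The number of candidate sets grows as choose(pool size, w.order), which is why a tight pool matters far more for w.order = 3 or 4 than for pairwise tests.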
Value
An object "wtest" containing:
order
the w.order specified.
results
When order > 2 and which.marker = NULL, the test results include: (information on the set) [SNP names, W-value, k, p-value]; (information on the first variable in the set) [W-value, k, p-value]; (information on the second variable in the set) [W-value, k, p-value]; and so on for each remaining variable in the set.
hf1
the h and f values used in the main-effect calculation.
hf2
the h and f values used in the high-order interaction calculation.
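A hypothetical sketch of the output.pval and sort post-processing documented above, applied to a toy results table with made-up p-values. The column names and the filter_results() helper are illustrative and may not match the structure returned by wtest.high().

```r
# Toy results table with made-up p-values; column names are hypothetical.
res <- data.frame(
  set     = c("snp1:snp2:snp3", "snp1:snp2:snp4", "snp2:snp3:snp4"),
  p.value = c(0.62, 0.03, 0.21)
)

# Hypothetical helper mirroring the documented output.pval and sort options.
filter_results <- function(res, output.pval = NULL, sort = TRUE) {
  if (!is.null(output.pval)) {
    res <- res[res$p.value < output.pval, , drop = FALSE]  # keep smaller p-values
  }
  if (sort) res <- res[order(res$p.value), , drop = FALSE]  # ascending p-value
  rownames(res) <- NULL
  res
}

filter_results(res, output.pval = 0.5)  # two sets remain, p = 0.03 then 0.21
```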
Author(s)
Rui Sun, Maggie Haitian Wang
References
Maggie Haitian Wang, Rui Sun, Junfeng Guo, Haoyi Weng, Jack Lee, Inchi Hu, Pak Sham and Benny C.Y. Zee (2016). A fast and powerful W-test for pairwise epistasis testing. Nucleic Acids Research. doi:10.1093/nar/gkw347.
Examples
data(diabetes.geno)
data(phenotype1)
## Step 1. HF Calculation
# Please note that parameter B is recommended to be greater than 400 for w.order = 1 or 2.
# For high order interaction analysis (w.order > 2), it is recommended to use default n.sample.
hf1 <- hf(data = diabetes.geno, w.order = 1, B = 100)
hf.high <- hf(data = diabetes.geno, w.order = 3, B = 30, n.marker = 10)
## Step 2. W-test Calculation
# Main effects only
w1 <- wtest.high(diabetes.geno, phenotype1, w.order = 1, hf1 = hf1)
# Three-way interactions among pre-filtered markers; keep sets with p < 0.5
w3 <- wtest.high(diabetes.geno, phenotype1, w.order = 3, input.pval = 0.3,
  input.poolsize = 50, output.pval = 0.5, hf1 = hf1, hf.high.order = hf.high)
# W-test for one specified set of SNPs (columns 10, 13 and 20)
w.set <- wtest.high(diabetes.geno, phenotype1, w.order = 3, which.marker = c(10,13,20),
  hf.high.order = hf.high)