wtest {wtest} | R Documentation |
W-test
Description
This function performs the W-test to calculate main effects or pairwise interactions in case-control studies for categorical data sets. The test measures the distributional difference of the target variables between cases and controls via a combined log of odds ratio. The test statistic follows a Chi-squared probability distribution with data-adaptive degrees of freedom. For pairwise interaction calculation, the user has three options: (1) calculate a single pair's W-value; (2) calculate pairwise interactions for a list of variables whose p-values are smaller than a threshold (input.pval); (3) calculate the pairwise interactions exhaustively for all variables. For both main effect and interaction calculation, the output can be filtered by p-value, such that only sets with a p-value smaller than a threshold (output.pval) are returned. An extension of the W-test for rare variant analysis is available in the zfa package.
Usage
wtest(data, y, w.order = c(1, 2), hf1 = "default.hf1",
      hf2 = "default.hf2", which.marker = NULL, output.pval = NULL,
      sort = TRUE, input.pval = 0.1, input.poolsize = 150)
Arguments
data |
a data frame or matrix containing genotypes in the columns. Genotypes should be coded as (0, 1, 2) or (0, 1). |
y |
a numeric vector of 0 or 1. |
w.order |
an integer value of 1 or 2: 1 for main effect calculation, 2 for pairwise interaction calculation. |
hf1 |
h and f values to calculate main effect, organized as a matrix, with columns (k, h, f), k = 2 to 3. Needed when |
hf2 |
h and f values to calculate interaction associations, organized as a matrix, with columns (k, h, f), k = 2 to 9. Needed when |
which.marker |
a numeric vector, when |
output.pval |
a p-value threshold for filtering the output. If NULL, all the results will be listed; otherwise, the function will only output the results with p-values smaller than the |
sort |
a logical value indicating whether or not to sort the output by p-values in ascending order. Default = TRUE. |
input.pval |
a p-value threshold to select markers for pairwise calculation, used only when |
input.poolsize |
an integer, with value less than the number of input variables. It is an optional filter to control the maximum number of variables to include in pairwise calculation, used only when |
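The output.pval and sort arguments described above amount to a filter-then-order step on the results. A minimal sketch in base R, assuming a hypothetical results data frame with a pval column (the column names and the helper filter_output_sketch are illustrative, not the package's actual output or internals):

```r
# Hypothetical results table; column names are illustrative only.
res <- data.frame(marker = c("snp1", "snp2", "snp3", "snp4"),
                  pval   = c(0.20, 0.004, 0.03, 0.0009))

# Hypothetical helper mimicking the documented behaviour of output.pval and sort.
filter_output_sketch <- function(res, output.pval = NULL, sort = TRUE) {
  # Keep only rows with p-value strictly below the threshold, if one is given
  if (!is.null(output.pval)) {
    res <- res[res$pval < output.pval, , drop = FALSE]
  }
  # Order by p-value, smallest first
  if (sort) {
    res <- res[order(res$pval), , drop = FALSE]
  }
  res
}

out <- filter_output_sketch(res, output.pval = 0.01)
print(out)
```

With output.pval = NULL the sketch returns every row, matching the documented default of listing all results.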
Details
The W-test is a model-free statistical test to measure main effects or pairwise interactions in case-control studies with categorical variables. Theoretically, the test statistic follows a Chi-squared distribution with f degrees of freedom. The data-adaptive degrees of freedom f and a scalar h in the test statistic allow the W-test to correct for distributional bias due to sparse data and small sample size. Let k be the number of columns of the 2 by k contingency table formed by a single variable or a variable pair. When the sample size is large and there is no population stratification, h and f approximate well to the theoretical values h = (k-1)/k and f = k-1. When the sample size is small and there is population stratification, h and f vary to correct for the distributional bias caused by the data structure.
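To make the roles of h, f and k concrete, here is a rough sketch in base R of a W-type statistic on a toy 2 by k table. It is not the package's implementation. It assumes the statistic is h times the sum over columns of the squared log odds ratio divided by its standard error, which is how the 2016 reference presents it, and it uses the large-sample values h = (k-1)/k and f = k-1 unless others are supplied. The helper name w_stat_sketch, the row order (controls first, cases second), and the toy counts are all hypothetical. Zero cells are not handled.

```r
# Hypothetical helper, not part of the wtest package.
# counts: 2 x k matrix of cell counts; row 1 = controls, row 2 = cases (assumed layout).
w_stat_sketch <- function(counts, h = NULL, f = NULL) {
  k <- ncol(counts)
  # Large-sample theoretical values stated in the Details text
  if (is.null(h)) h <- (k - 1) / k
  if (is.null(f)) f <- k - 1

  n0 <- counts[1, ]
  n1 <- counts[2, ]
  N0 <- sum(n0)
  N1 <- sum(n1)

  # Per-column log odds ratio of cases versus controls
  log_or <- log((n1 / (N1 - n1)) / (n0 / (N0 - n0)))
  # Assumed standard error of each log odds ratio
  se <- sqrt(1 / n1 + 1 / (N1 - n1) + 1 / n0 + 1 / (N0 - n0))

  w <- h * sum((log_or / se)^2)
  list(w = w, h = h, f = f,
       p.value = pchisq(w, df = f, lower.tail = FALSE))
}

# Toy genotype table with k = 3 columns (made-up counts)
toy <- matrix(c(40, 35, 25,
                25, 35, 40), nrow = 2, byrow = TRUE)
res <- w_stat_sketch(toy)
print(res)
```

In practice the h and f arguments would come from the values estimated for the data (hf1 or hf2) rather than from the large-sample defaults used here.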
When w.order = 2, wtest() automatically calculates the main effects first and then applies a pre-filter before calculating interactions. This filtering avoids overloading the memory before the user has a better understanding of the data. Users can specify a smaller input.pval, such as 0.05 or 0.001, for less output, or input.pval = 1 or NULL for exhaustive pairwise calculation. Another optional filter is input.poolsize. It takes the input.poolsize variables with the smallest p-values and calculates their pairwise effects exhaustively; when used together with input.pval, the smaller set is passed to the pairwise calculation.
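The pre-filter logic can be sketched in base R as follows. This is an illustration of the behaviour described in the text, not the package's internal code: the helper select_markers_sketch and the vector of main-effect p-values are hypothetical, and the strict "smaller than" comparison is taken from the wording of the argument description.

```r
# Hypothetical helper, not part of the wtest package.
# main.pval: main-effect p-values, one per marker (made-up input).
select_markers_sketch <- function(main.pval, input.pval = 0.1,
                                  input.poolsize = 150) {
  idx <- seq_along(main.pval)
  # Filter 1: markers with a main-effect p-value below input.pval
  by.pval <- if (is.null(input.pval)) idx else idx[main.pval < input.pval]
  # Filter 2: the input.poolsize markers with the smallest p-values
  by.pool <- if (is.null(input.poolsize)) idx else
    head(order(main.pval), input.poolsize)
  # The smaller of the two sets goes on to the pairwise calculation
  chosen <- if (length(by.pval) <= length(by.pool)) by.pval else by.pool
  sort(chosen)
}

p <- c(0.50, 0.01, 0.20, 0.03, 0.08, 0.90)
sel.a <- select_markers_sketch(p, input.pval = 0.1, input.poolsize = 2)
sel.b <- select_markers_sketch(p, input.pval = 0.1, input.poolsize = 5)
print(sel.a)  # markers 2 and 4: the pool size is the tighter filter
print(sel.b)  # markers 2, 4 and 5: the p-value threshold is the tighter filter

# Number of pairs tested when m markers survive the filter
choose(length(sel.b), 2)
```

The number of pairs grows quadratically with the number of retained markers: a pool of 150 markers already gives choose(150, 2) = 11175 pairs, which is why the filter matters for memory.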
Value
An object "wtest" containing:
order |
the "w.order" specified. |
results |
When |
hf1 |
The h and f values used in main effect calculation. |
hf2 |
The h and f values used in pairwise interaction calculation. |
Author(s)
Rui Sun, Maggie Haitian Wang
References
Maggie Haitian Wang, Rui Sun, Junfeng Guo, Haoyi Weng, Jack Lee, Inchi Hu, Pak Sham and Benny C.Y. Zee (2016). A fast and powerful W-test for pairwise epistasis testing. Nucleic Acids Research. doi:10.1093/nar/gkw347.
Maggie Haitian Wang, Haoyi Weng, Rui Sun, Jack Lee, William K.K. Wu, Ka Chun Chong and Benny C.Y. Zee (2017). A Zoom-Focus algorithm (ZFA) to locate the optimal testing region for rare variant association tests. Bioinformatics, 33(15), 2330-2336.
See Also
Examples
library(wtest)
data(diabetes.geno)
data(phenotype1)
## Step 1. HF Calculation
# Please note that parameter B is recommended to be greater than 400.
hf1 <- hf(data = diabetes.geno, w.order = 1, B = 100)
hf2 <- hf(data = diabetes.geno, w.order = 2, B = 50)
## Step 2. W-test Calculation
w1 <- wtest(diabetes.geno, phenotype1, w.order = 1, hf1 = hf1)
w2 <- wtest(diabetes.geno, phenotype1, w.order = 2, input.pval = 0.3,
            input.poolsize = 50, output.pval = 0.01, hf1 = hf1, hf2 = hf2)
w.pair <- wtest(diabetes.geno, phenotype1, w.order = 2,
                which.marker = c(10, 13), hf2 = hf2)