HWMissing {HardyWeinberg} | R Documentation |
Test a bi-allelic marker for Hardy-Weinberg equilibrium in the presence of missing genotype information.
Description
Function HWMissing imputes missing genotype data with a
multinomial logit model that uses information from allele intensities
and/or neighbouring markers. Multiple imputation algorithms
implemented in the mice package are used to obtain imputed data sets.
Inference for HWE is carried out by estimating the inbreeding
coefficient or exact p-values for each imputed data set, and
by combining all estimates using Rubin's pooling rules.
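The pooling step mentioned above can be sketched in base R. This is only an illustration of Rubin's rules for a scalar estimate, not the code HWMissing runs internally; the helper name pool_rubin and the input values are made up for the example.

```r
# Hypothetical helper (not part of the HardyWeinberg package): pool m point
# estimates and their within-imputation variances with Rubin's rules.
pool_rubin <- function(estimates, variances, alpha = 0.05) {
  m <- length(estimates)
  stopifnot(m >= 2, length(variances) == m)
  q_bar <- mean(estimates)          # pooled point estimate
  u_bar <- mean(variances)          # average within-imputation variance
  b     <- var(estimates)           # between-imputation variance
  total <- u_bar + (1 + 1 / m) * b  # total variance of the pooled estimate
  # Rubin's degrees of freedom; infinite when the estimates do not vary
  df <- if (b > 0) (m - 1) * (1 + u_bar / ((1 + 1 / m) * b))^2 else Inf
  half <- qt(1 - alpha / 2, df) * sqrt(total)
  c(estimate = q_bar, variance = total, df = df,
    lower = q_bar - half, upper = q_bar + half)
}

# Made-up inbreeding coefficients from m = 5 imputed data sets of n = 100,
# each with variance 1/n
f_hat  <- c(0.02, 0.05, 0.01, 0.04, 0.03)
pooled <- pool_rubin(f_hat, rep(1 / 100, 5))
```

The total variance adds the between-imputation spread to the average within-imputation variance, so the pooled interval is wider than one that treats the imputed genotypes as observed.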
Usage
HWMissing(X, imputecolumn = 1, m = 50, coding = c(0,1,2), verbose = FALSE,
          alpha = 0.05, varest = "oneovern", statistic = "chisquare",
          alternative = "two.sided", ...)
Arguments
X |
An input data frame. By default, the first column should contain the SNP with missing values. |
imputecolumn |
Indicates which column of the supplied data frame
is to be imputed (by default, the first column). |
m |
The number of imputations (50 by default) |
coding |
Indicates how the genotype data is coded (e.g. 0 for AA, 1 for AB, and 2 for BB). |
verbose |
|
alpha |
significance level (0.05 by default) used when computing confidence intervals |
varest |
Estimator for the variance of the inbreeding
coefficient. |
statistic |
If |
alternative |
|
... |
additional options for function |
Details
The function HWMissing tests one genetic marker (e.g. a SNP)
with missing values for HWE. By default, this marker is supposed to be
the first column of data frame X. The other columns of X
contain covariates to be used in the imputation model. Covariates
will typically be other, correlated markers or allele intensities of
the SNP to be imputed. Covariate markers should be coded as factor
variables whereas allele intensities should be numerical
variables. By default, a polytomous regression model will be used to
impute the missing values. If the covariates also contain missing
values, an imputation method for each column of X can be specified by
using the method argument of mice (see example below).
If there are no covariates, missing values can be imputed under the
MCAR assumption. In that case, they are imputed by taking a random
sample from the observed data. This is what HWMissing will do
if no covariates are supplied, X being a single factor variable.
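The idea of imputing under MCAR by resampling the observed genotypes can be sketched in base R as follows. The helper impute_mcar and the toy genotype vector are assumptions for illustration only; HWMissing itself relies on the mice package for this step.

```r
# Hypothetical helper (not part of the package): fill the missing entries of
# a genotype factor by sampling, with replacement, from the observed entries.
impute_mcar <- function(g) {
  miss <- is.na(g)
  obs  <- g[!miss]
  stopifnot(length(obs) > 0)
  # Sample indices rather than values, which is safe for any vector type
  idx <- sample.int(length(obs), sum(miss), replace = TRUE)
  g[miss] <- obs[idx]
  g
}

set.seed(1)
# Toy marker with two missing genotypes
g <- factor(c("AA", "AB", NA, "BB", "AB", NA, "AA"),
            levels = c("AA", "AB", "BB"))
# Three imputed copies of the marker, one per imputation
imputed <- replicate(3, impute_mcar(g), simplify = FALSE)
```

Each imputed copy keeps the observed genotypes unchanged and differs only in the values drawn for the missing positions.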
Several estimators for the variance of the inbreeding coefficient
have been described in the literature. The asymptotic variance of the
inbreeding coefficient under the null hypothesis is 1/n, and is used
if varest = "oneovern". This is the recommended option. Alternatively,
the approximation described in Weir (p. 66) can be used
with varest = "bailey".
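As a rough illustration of the "oneovern" option, the sketch below computes the usual sample estimate of the inbreeding coefficient from genotype counts and a normal-approximation confidence interval based on the variance 1/n. The helper inbreeding_coef and the example counts are assumptions made for this sketch; the interval reported by HWMissing additionally pools over the imputed data sets.

```r
# Hypothetical helper (not part of the package); counts are c(AA, AB, BB).
inbreeding_coef <- function(counts, alpha = 0.05) {
  stopifnot(length(counts) == 3, all(counts >= 0))
  nAA <- counts[[1]]; nAB <- counts[[2]]; nBB <- counts[[3]]
  n  <- nAA + nAB + nBB   # number of individuals
  nA <- 2 * nAA + nAB     # number of A alleles
  nB <- 2 * nBB + nAB     # number of B alleles
  stopifnot(nA > 0, nB > 0)
  # Equivalent to 1 - (observed heterozygotes) / (expected under HWE)
  f  <- (4 * nAA * nBB - nAB^2) / (nA * nB)
  se <- sqrt(1 / n)       # asymptotic standard error under the null
  z  <- qnorm(1 - alpha / 2)
  c(f = f, se = se, lower = f - z * se, upper = f + z * se)
}

# Illustrative genotype counts
res <- inbreeding_coef(c(AA = 298, AB = 489, BB = 213))
```

A value of f near zero is consistent with Hardy-Weinberg proportions; positive values indicate a deficit of heterozygotes and negative values an excess.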
Value
Res |
A vector with the inbreeding coefficient, a confidence interval for the inbreeding coefficient, a p-value for a HWE test and missing data statistics. |
Xmat |
A matrix with the genotypic composition of each of the
|
Author(s)
Jan Graffelman jan.graffelman@upc.edu
References
Little, R. J. A. and Rubin, D. B. (2002) Statistical analysis with missing data. Second edition, New York, John Wiley & Sons.
Graffelman, J., Sánchez, M., Cook, S. and Moreno, V. (2013) Statistical inference for Hardy-Weinberg proportions in the presence of missing genotype information. PLoS ONE 8(12): e83316. doi:10.1371/journal.pone.0083316
Graffelman, J. (2015) Exploring Diallelic Genetic Markers: The HardyWeinberg Package. Journal of Statistical Software 64(3): 1-23. doi:10.18637/jss.v064.i03.
See Also
Examples
data(Markers)
## Not run:
set.seed(123)
Results <- HWMissing(Markers[,1],m=50,verbose=TRUE)$Res # no covariates, imputation assuming MCAR.
set.seed(123)
Results <- HWMissing(Markers[,1:3],m=50,verbose=TRUE)$Res # impute with two allele intensities.
set.seed(123)
Results <- HWMissing(Markers[,c(1,4,5)],m=50,verbose=TRUE)$Res # impute with two covariate SNPs
## End(Not run)