HWMissing {HardyWeinberg}R Documentation

Test a bi-allelic marker for Hardy-Weinberg equilibrium in the presence of missing genotype information.

Description

Function HWMissing imputes missing genotype data with a multinomial logit model that uses information from allele intensities and/or neighbouring markers. Multiple imputation algorithms implemented in the Mice package are used to obtain imputed data sets. Inference for HWE is carried out by estimating the inbreeding coefficient or exact p-values for each imputed data set, and by combining all estimates using Rubin's pooling rules.

Usage

HWMissing(X, imputecolumn = 1, m = 50, coding = c(0,1,2), verbose = FALSE, alpha = 0.05,
    varest = "oneovern", statistic = "chisquare",  alternative =
"two.sided", ...)

Arguments

X

An input data frame. By default, the first column should contain the SNP with missing values.

imputecolumn

Indicates which column of the supplied data frame is to be imputed (by default, the first colum, imputecolumn=1

m

The number of imputations (50 by default)

coding

Indicates how the genotype data is coded (e.g. 0 for AA, 1 for AB, and 2 for BB).

verbose

verbose = TRUE prints results, verbose = FALSE is silent.

alpha

significance level (0.05 by default) used when computing confidence intervals

varest

Estimator for the variance of the inbreeding coefficient. varest="oneovern" is the default and sets the variance under the null (1/n). varest="bailey" uses an approximation (see details).

statistic

If statistic = "chisquare" then inbreeding coefficients (equivalent to chisquare statistics) will be computed for each imputed data set and then combined. If statistic = "exact" then one-sided exact tests will be computed for each imputed data set and the resulting p-values will be combined.

alternative

two.sided (default) will perform a two-sided test where both an excess and a dearth of heterozygotes count as evidence against HWE. less is a one-sided test where only dearth of heterozygotes counts a evidence against HWE, greater is a one-sided test where only excess of heterozygotes counts as evidence against HWE.

...

additional options for function mice of the Mice package

Details

The function HWMissing tests one genetic marker (e.g. a SNP) with missings for HWE. By default, this marker is supposed to be the first column of dataframe X. The other columns of X contain covariates to be used in the imputation model. Covariates will typically be other, correlated markers or allele intensities of the SNP to be imputed. Covariate markers should be coded as factor variables whereas allele intensities should be numerical variables. By default, a polytomous regression model will be used to impute the missings. If the covariates also contain missings, an imputation method for each column of X can be specified by using the method of mice (see example below).

If there are no covariates, missings can be imputed under the MCAR assumption. In that case, missings are imputed by taking a random sample from the observed data. This is what HWMissing will do if no covariates are supplied, X being a single factor variable.

Several estimators for the variance of the inbreeding coefficient have been described in the literature. The asymptotic variance of the inbreeding coefficient under the null hypothesis is 1/n, and is used if varest = "oneovern" is used. This is the recommended option. Alternatively, the approximation described in Weir (p. 66) can be used with varest = "bailey".

Value

Res

A vector with the inbreeding coefficient, a confidence interval for the inbreeding coefficient, a p-value for a HWE test and missing data statistics.

Xmat

A matrix with the genotypic composition of each of the m imputed data sets.

Author(s)

Jan Graffelman jan.graffelman@upc.edu

References

Little, R. J. A. and Rubin, D. B. (2002) Statistical analysis with missing data. Second edition, New York, John Wiley & sons.

Graffelman, J., S\'anchez, M., Cook, S. and Moreno, V. (2013) Statistical inference for Hardy-Weinberg proportions in the presence of missing genotype information. PLoS ONE 8(12): e83316. doi:10.1371/journal.pone.0083316

Graffelman, J. (2015) Exploring Diallelic Genetic Markers: The HardyWeinberg Package. Journal of Statistical Software 64(3): 1-23. doi:10.18637/jss.v064.i03.

See Also

HWChisq

Examples

data(Markers)
## Not run: 
set.seed(123)
Results <- HWMissing(Markers[,1],m=50,verbose=TRUE)$Res # no covariates, imputation assuming MCAR.
set.seed(123)
Results <- HWMissing(Markers[,1:3],m=50,verbose=TRUE)$Res # impute with two allele intensities.
set.seed(123)
Results <- HWMissing(Markers[,c(1,4,5)],m=50,verbose=TRUE)$Res # impute with two covariate SNPs

## End(Not run)

[Package HardyWeinberg version 1.7.8 Index]