R: A function to compute Tukey's biweight mean vector and...

biwt.est {biwt}

R Documentation

A function to compute Tukey's biweight mean vector and covariance matrix

Description

Compute a multivariate location and scale estimate based on Tukey's biweight weight function.

Usage

biwt.est(x, r=.2, med.init=covMcd(x))

Arguments

`x`	a `2 x n` matrix or data frame (`n` is the number of measurements)
`r`	breakdown (`k/n` where `k` is the largest number of measurements that can be replaced with arbitrarily large values while keeping the estimates bounded). Default is r=.2.
`med.init`	a (robust) initial estimate of the center and shape of the data. The format is a list with components center and cov (as in the output of covMcd from the rrcov library). Default is the minimum covariance determinant (MCD) on the data.

Details

A robust measure of center and shape is computed using Tukey's biweight M-estimator. The biweight estimates are essentially weighted means and covariances where the weights are calculated based on the distance of each measurement to the data center with respect to the shape of the data. The estimates should be computed pair-by-pair because the weights should depend only on the pairwise relationship at hand and not the relationship between all the observations globally.

Value

A list with components:

`biwt.mu`	the final estimate of center
`biwt.sig`	the final estimate of shape

Note

If there is too much missing data or if the initialization is not accurate, the function will compute the MCD for a given pair of observations before computing the biweight correlation (regardless of the initial settings given in the call to the function).

Author(s)

Jo Hardin jo.hardin@pomona.edu

References

Hardin, J., Mitani, A., Hicks, L., VanKoten, B.; A Robust Measure of Correlation Between Two Genes on a Microarray, BMC Bioinformatics, 8:220; 2007.

Examples



samp.data <- t(mvrnorm(30,mu=c(0,0),Sigma=matrix(c(1,.75,.75,1),ncol=2)))

samp.bw <- biwt.est(samp.data)
samp.bw

samp.bw.var1 <- samp.bw$biwt.sig[1,1]
samp.bw.var2 <- samp.bw$biwt.sig[2,2]
samp.bw.cov <- samp.bw$biwt.sig[1,2]

samp.bw.cor <- samp.bw.cov / sqrt(samp.bw.var1 * samp.bw.var2)
samp.bw.cor

# or:

samp.bw.cor <- samp.bw$biwt.sig[1,2] / 
	sqrt(samp.bw$biwt.sig[1,1]*samp.bw$biwt.sig[2,2])
samp.bw.cor

##############
# to speed up the calculations, use the median/mad for the initialization:
##############

samp.init <- list()
	samp.init$cov <- diag(apply(samp.data,1,mad,na.rm=TRUE))
	samp.init$center <- apply(samp.data,1,median,na.rm=TRUE)

samp.init

samp.bw <- biwt.est(samp.data,med.init = samp.init)
samp.bw.cor <- samp.bw$biwt.sig[1,2] / 
	sqrt(samp.bw$biwt.sig[1,1]*samp.bw$biwt.sig[2,2])
samp.bw.cor

[Package biwt version 1.0.1 Index]