dcovustat {steadyICA}R Documentation

Calculate distance covariance via U-statistics

Description

Calculates the square of the U-statistic formulation of distance covariance. This is faster than the function 'dcov' in the R package 'energy' and requires less memory. Note that negative values are possible in this version.

Usage

dcovustat(x,y,alpha=1)

Arguments

x

A vector or matrix.

y

A vector or matrix with the same number of observations as x, though the number of columns of x and y may differ

alpha

A scaling parameter in the interval (0,2] used for calculating distances.

Value

Returns the distance covariance U-statistic.

Note

The value returned by dcovustat is equal to the square of the value returned by energy::dcov in the limit.

In dcovustat, a vector of length n is stored; in energy::dcov, an n x n matrix is stored. Thus, dcovustat requires far less memory and works for very large datasets.

Even though dcovustat converges to the square of the distance covariance of the random variables x and y, it can be negative.

Author(s)

David Matteson

References

Matteson, D. S. & Tsay, R. Independent component analysis via U-Statistics. <http://www.stat.cornell.edu/~matteson/#ICA>

Szekely, G., Rizzo, M. & Bakirov, N. Measuring and testing dependence by correlation of distances. (2007) The Annals of Statistics, 35, 2769-2794.

See Also

multidcov, energy::dcov

Examples

x = rnorm(5000)
y = rbinom(5000,1,0.5)
y = y - 1*(y==0)
z = y*exp(-x) #some non-linear dependence

dcovustat(x[1:1000],y[1:1000]) #close to zero

a = Sys.time()
dcovustat(x[1:1000],z[1:1000]) #greater than zero
a = Sys.time() - a

#measures of linear dependence close to zero:
cov(x,z)
cor(rank(x),rank(z))


## Not run: 
#dcovustat differs from energy::dcov but are equal in the limit
library(energy)
b = Sys.time()
(dcov(x[1:1000],z[1:1000]))^2
b = Sys.time() - b
as.double(b)/as.double(a) #dcovustat is much faster

## energy::dcov and dcovustat become approximately equal as n increases:
c = Sys.time()
dcovustat(x,z)
c = difftime(Sys.time(), c, sec)
d = Sys.time()
(dcov(x,z)^2)
d = difftime(Sys.time(), d, sec)
as.double(d)/as.double(c) 

## End(Not run)

[Package steadyICA version 1.0 Index]