Gstat {entropy}R Documentation

G Statistic and Chi-Squared Statistic

Description

Gstat computes the G statistic.

chi2stat computes the Pearson chi-squared statistic.

Gstatindep computes the G statistic between the empirical observed joint distribution and the product distribution obtained from its marginals.

chi2statindep computes the Pearson chi-squared statistic of independence.

Usage

Gstat(y, freqs, unit=c("log", "log2", "log10"))
chi2stat(y, freqs, unit=c("log", "log2", "log10"))
Gstatindep(y2d, unit=c("log", "log2", "log10"))
chi2statindep(y2d, unit=c("log", "log2", "log10"))

Arguments

y

observed vector of counts.

freqs

vector of expected frequencies (probability mass function). Alternatively, counts may be provided.

y2d

matrix of counts.

unit

the unit in which entropy is measured. The default is "nats" (natural units). For computing entropy in "bits" set unit="log2".

Details

The observed counts in y and y2d are used to determine the total sample size.

The G statistic equals two times the sample size times the KL divergence between empirical observed frequencies and expected frequencies.

The Pearson chi-squared statistic equals sample size times chi-squared divergence between empirical observed frequencies and expected frequencies. It is a quadratic approximation of the G statistic.

The G statistic between the empirical observed joint distribution and the product distribution obtained from its marginals is equal to two times the sample size times mutual information.

The Pearson chi-squared statistic of independence equals the Pearson chi-squared statistic between the empirical observed joint distribution and the product distribution obtained from its marginals. It is a quadratic approximation of the corresponding G statistic.

The G statistic and the Pearson chi-squared statistic are asymptotically chi-squared distributed which allows to compute corresponding p-values.

Value

A list containing the test statistic stat, the degree of freedom df used to calculate the p-value pval.

Author(s)

Korbinian Strimmer (https://strimmerlab.github.io).

See Also

KL.plugin, chi2.plugin, mi.plugin, chi2indep.plugin.

Examples

# load entropy library 
library("entropy")

## one discrete random variable

# observed counts in each class
y = c(4, 2, 3, 1, 6, 4)
n = sum(y) # 20

# expected frequencies and counts
freqs.expected = c(0.10, 0.15, 0.35, 0.05, 0.20, 0.15)
y.expected = n*freqs.expected


# G statistic (with p-value) 
Gstat(y, freqs.expected) # from expected frequencies
Gstat(y, y.expected) # alternatively from expected counts

# G statistic computed from empirical KL divergence
2*n*KL.empirical(y, y.expected)


## Pearson chi-squared statistic (with p-value) 
# this can be viewed an approximation of the G statistic
chi2stat(y, freqs.expected) # from expected frequencies
chi2stat(y, y.expected) # alternatively from expected counts

# computed from empirical chi-squared divergence
n*chi2.empirical(y, y.expected)

# compare with built-in function
chisq.test(y, p = freqs.expected) 


## joint distribution of two discrete random variables

# contingency table with counts
y.mat = matrix(c(4, 5, 1, 2, 4, 4), ncol = 2)  # 3x2 example matrix of counts
n.mat = sum(y.mat) # 20


# G statistic between empirical observed joint distribution and product distribution
Gstatindep( y.mat )

# computed from empirical mutual information
2*n.mat*mi.empirical(y.mat)


# Pearson chi-squared statistic of independence
chi2statindep( y.mat )

# computed from empirical chi-square divergence
n.mat*chi2indep.empirical(y.mat)

# compare with built-in function
chisq.test(y.mat) 


[Package entropy version 1.3.1 Index]