summ_distance {pdqr}    R Documentation

Summarize pair of distributions with distance

Description

This function computes the distance between two distributions represented by pdqr-functions. Here "distance" is used in a broad sense: a single non-negative number representing how much two distributions differ from one another. Bigger values indicate a bigger difference. A zero value means that the input distributions are equivalent based on the method used (except for method "avgdist", which almost always returns a positive value). The notion of "distance" is useful for doing statistical inference about the similarity of two groups of numbers.

Usage

summ_distance(f, g, method = "KS")

Arguments

f

A pdqr-function of any type and class.

g

A pdqr-function of any type and class.

method

Method for computing distance. Should be one of "KS", "totvar", "compare", "wass", "cramer", "align", "avgdist", "entropy".

Details

Methods can be separated into three categories: probability based, metric based, and entropy based.

Probability based methods ("KS", "totvar", "compare") return a number between 0 and 1 that is computed in a way mostly based on probability.
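The probability flavour of these methods can be illustrated with a rough base R sketch. It assumes that "KS" refers to the Kolmogorov-Smirnov distance, i.e. the supremum of |F - G| over the real line (the wording used in the Examples section), and approximates it on a grid. It is not how pdqr computes the value; the helper names, grid bounds, and step are arbitrary choices made for illustration.

```r
# Cumulative distribution functions of the two example distributions
p_unif <- function(x) punif(x, min = 0, max = 2)
p_norm <- function(x) pnorm(x, mean = 1, sd = 1)

# Grid approximation of sup |F - G| (assumed meaning of "KS")
grid <- seq(-5, 7, by = 0.001)
ks_approx <- max(abs(p_unif(grid) - p_norm(grid)))

# The result is a difference of probabilities, so it lies in [0, 1]
ks_approx
```

Because the output is a difference of two probabilities, it cannot exceed 1 regardless of the scale of the underlying values.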

Metric based methods ("wass", "cramer", "align", "avgdist") compute "how far" two distributions are apart on the real line.
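To give a feel for a distance measured on the real line, here is a hedged base R sketch. It assumes that "wass" stands for the 1-Wasserstein distance, which for distributions on the real line equals the integral of |F - G|. The integral is approximated with a simple Riemann sum; the grid and the helper names are illustrative assumptions, not part of pdqr.

```r
# Cumulative distribution functions of the two example distributions
p_unif <- function(x) punif(x, min = 0, max = 2)
p_norm <- function(x) pnorm(x, mean = 1, sd = 1)

# Riemann sum approximation of the integral of |F - G| over the real line
# (both CDFs are practically equal outside of [-10, 12])
step <- 0.001
grid <- seq(-10, 12, by = step)
wass_approx <- sum(abs(p_unif(grid) - p_norm(grid))) * step

# The result is expressed in the units of the underlying values
wass_approx
```

Unlike a probability based value, this quantity has no upper bound: rescaling both distributions by a constant rescales the distance by the same constant.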

Entropy based methods ("entropy") compute output based on entropy characteristics.
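One common entropy based dissimilarity is the symmetrized Kullback-Leibler divergence, KL(p, q) + KL(q, p). The sketch below computes it for two discrete distributions on a shared support. Treating this as the idea behind method "entropy" is an assumption; the helper `kl_div()` and the probability vectors are hypothetical and not part of pdqr.

```r
# Kullback-Leibler divergence between two discrete distributions that share
# the same support (terms with p == 0 contribute nothing)
kl_div <- function(p, q) {
  sum(ifelse(p > 0, p * log(p / q), 0))
}

# Two hypothetical probability vectors on the same three values
p <- c(0.5, 0.3, 0.2)
q <- c(0.2, 0.3, 0.5)

# Symmetrized divergence: non-negative, and zero when p and q coincide
sym_kl <- kl_div(p, q) + kl_div(q, p)
sym_kl
```

The symmetrization makes the result independent of argument order, which a plain Kullback-Leibler divergence is not.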

Value

A single non-negative number representing the distance between a pair of distributions. For methods "KS", "totvar", and "compare" it is not bigger than 1. For method "avgdist" it is almost always bigger than 0.
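The remark about "avgdist" can be made concrete with a small simulation. It assumes that "avgdist" means the expected absolute difference E|X - Y| between independent draws X and Y from the two distributions. Under that assumption the distance from a non-degenerate distribution to itself is positive, as this base R sketch (not pdqr code) suggests for two standard normal samples.

```r
set.seed(101)

# Independent samples from the same distribution
n <- 1e5
x <- rnorm(n)
y <- rnorm(n)

# Monte Carlo estimate of E|X - Y| (assumed meaning of "avgdist")
avgdist_approx <- mean(abs(x - y))

# For two independent standard normals the exact value is 2 / sqrt(pi)
c(estimate = avgdist_approx, theory = 2 / sqrt(pi))
```

The estimate stays well away from zero even though both samples come from the same distribution, since two independent draws rarely coincide.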

See Also

summ_separation() for computation of optimal threshold separating pair of distributions.

Other summary functions: summ_center(), summ_classmetric(), summ_entropy(), summ_hdr(), summ_interval(), summ_moment(), summ_order(), summ_prob_true(), summ_pval(), summ_quantile(), summ_roc(), summ_separation(), summ_spread()

Examples

d_unif <- as_d(dunif, max = 2)
d_norm <- as_d(dnorm, mean = 1)

vapply(
  c(
    "KS", "totvar", "compare",
    "wass", "cramer", "align", "avgdist",
    "entropy"
  ),
  function(meth) {
    summ_distance(d_unif, d_norm, method = meth)
  },
  numeric(1)
)

# "Supremum" quality of "KS" distance
d_dis <- new_d(2, "discrete")
## Distance is 1, which is the limit of |F - G| as points tend to 2 from the
## left
summ_distance(d_dis, d_unif, method = "KS")

[Package pdqr version 0.3.1 Index]