R: 1st & 2nd Moments of the Pitman-Yor / Dirichlet Processes

G_moments {IMIFA}

R Documentation

1st & 2nd Moments of the Pitman-Yor / Dirichlet Processes

Description

Calculate the a priori expected number of clusters (G_expected) or the variance of the number of clusters (G_variance) under a PYP or DP prior for a sample of size N at given values of the concentration parameter alpha and optionally also the Pitman-Yor discount parameter. Useful for soliciting sensible priors (or fixed values) for alpha or discount under the "IMFA" and "IMIFA" methods for mcmc_IMIFA. Additionally, for a given sample size N and given expected number of clusters EG, G_calibrate elicits a value for the concentration parameter alpha or the discount parameter.

Usage

G_expected(N,
           alpha,
           discount = 0,
           MPFR = TRUE)

G_variance(N,
           alpha,
           discount = 0,
           MPFR = TRUE)

G_calibrate(N,
            EG,
            alpha = NULL,
            discount = 0,
            MPFR = TRUE,
            ...)

Arguments

`N`	The sample size.
`alpha`	The concentration parameter. Must be specified (though not for `G_calibrate`) and must be strictly greater than `-discount`. The case `alpha=0` is accommodated. When `discount` is negative `alpha` must be a positive integer multiple of `abs(discount)`. See Details for behaviour for `G_calibrate`.
`discount`	The discount parameter for the Pitman-Yor process. Must be less than 1, but typically lies in the interval [0, 1). Defaults to 0 (i.e. the Dirichlet process). When `discount` is negative `alpha` must be a positive integer multiple of `abs(discount)`. See Details for behaviour for `G_calibrate`.
`MPFR`	Logical indicating whether the high-precision libraries `Rmpfr` and `gmp` are invoked, at the expense of run-time. Defaults to `TRUE` and must be `TRUE` for `G_expected` when `alpha=0` or `G_variance` when `discount` is non-zero. For `G_calibrate`, it is strongly recommended to use `MPFR=TRUE` when `discount` is non-zero and strictly necessary when `alpha=0` is supplied. See `Note`.
`EG`	The prior expected number of clusters. Must exceed `1` and be less than `N`.
`...`	Additional arguments passed to `uniroot`, e.g. `maxiter`.

Details

All arguments are vectorised. Users can also consult G_priorDensity in order to solicit sensible priors.

For G_calibrate, only one of alpha or discount can be supplied, and the function elicits a value for the opposing parameter which achieves the desired expected number of clusters EG for the given sample size N. By default, a value for alpha subject to discount=0 (i.e. the Dirichlet process) is elicited. Note that alpha may not be a positive integer multiple of discount as it should be if discount is negative. See Examples below.

Value

The expected number of clusters under the specified prior conditions (G_expected), or the variance of the number of clusters (G_variance), or the concentration parameter alpha or discount parameter achieving a particular expected number of clusters (G_calibrate).

Note

G_variance requires use of the Rmpfr and gmp libraries for non-zero discount values. G_expected requires these libraries only for the alpha=0 case. These libraries are strongly recommended (but they are not required) for G_calbirate when discount is non-zero, but they are required when alpha=0 is supplied. Despite the high precision arithmetic used, the functions can still be unstable for large N and/or extreme values of alpha and/or discount. See the argument MPFR.

Author(s)

Keefe Murphy - <keefe.murphy@mu.ie>

References

De Blasi, P., Favaro, S., Lijoi, A., Mena, R. H., Prunster, I., and Ruggiero, M. (2015) Are Gibbs-type priors the most natural generalization of the Dirichlet process?, IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(2): 212-229.

Yamato, H. and Shibuya, M. (2000) Moments of some statistics of Pitman sampling formula, Bulletin of Informatics and Cybernetics, 32(1): 1-10.

Examples

# Certain examples require the use of the Rmpfr library
suppressMessages(require("Rmpfr"))

G_expected(N=50, alpha=19.23356, MPFR=FALSE)
G_variance(N=50, alpha=19.23356, MPFR=FALSE)

G_expected(N=50, alpha=c(19.23356, 12.21619, 1),
           discount=c(0, 0.25, 0.7300045), MPFR=FALSE)
G_variance(N=50, alpha=c(19.23356, 12.21619, 1),
           discount=c(0, 0.25, 0.7300045), MPFR=c(FALSE, TRUE, TRUE))

# Examine the growth rate of the DP
DP   <- sapply(c(1, 5, 10), function(i) G_expected(1:200, alpha=i, MPFR=FALSE))
matplot(DP, type="l", xlab="N", ylab="G")

# Examine the growth rate of the PYP
PY <- sapply(c(0.25, 0.5, 0.75), function(i) G_expected(1:200, alpha=1, discount=i))
matplot(PY, type="l", xlab="N", ylab="G")

# Other special cases of the PYP are also facilitated
G_expected(N=50, alpha=c(27.1401, 0), discount=c(-27.1401/100, 0.8054448))
G_variance(N=50, alpha=c(27.1401, 0), discount=c(-27.1401/100, 0.8054448))

# Elicit values for alpha under a DP prior
G_calibrate(N=50, EG=25)

# Elicit values for alpha under a PYP prior
# G_calibrate(N=50, EG=25, discount=c(-27.1401/100, 0.25, 0.7300045))
# Elicit values for discount under a PYP prior
# G_calibrate(N=50, EG=25, alpha=c(12.21619, 1, 0), maxiter=2000)

[Package IMIFA version 2.2.0 Index]