negLLsquash {openEBGM}

R Documentation

Likelihood with data squashing and no zero counts

Description

negLLsquash computes the negative log-likelihood based on the conditional marginal distribution of the counts, N, given that N >= N*, where N* is the smallest count used for estimating the hyperparameters. Minimize this function to estimate the hyperparameters of the prior distribution. Use it when zero counts are excluded and the data have been squashed as described by DuMouchel and Pregibon (2001). In most cases, this is the likelihood function to choose.

Usage

negLLsquash(theta, ni, ei, wi, N_star = 1)

Arguments

`theta`	A numeric vector of hyperparameters ordered as: `\alpha_1, \beta_1, \alpha_2, \beta_2, P`.
`ni`	A whole number vector of squashed actual counts from `squashData`.
`ei`	A numeric vector of squashed expected counts from `squashData`.
`wi`	A whole number vector of bin weights from `squashData`.
`N_star`	A scalar whole number giving the smallest count used for hyperparameter estimation (N* in the Description).

Details

The conditional marginal distribution for the counts, N, given that N >= N*, is based on a mixture of two negative binomial distributions. The hyperparameters of the prior distribution (a mixture of two gammas) are estimated by minimizing the negative log-likelihood derived from this conditional marginal distribution. Use N_star = 1 when practical.
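Written out, one literal reading of this description is the following. It is a sketch, not the package's verified formula: it assumes each gamma component is parameterized by shape \alpha and rate \beta as in DuMouchel and Pregibon (2001), that E is the expected count, and that each squashed point contributes its log-probability multiplied by its bin weight w_i. The package source may arrange the truncation differently.

```latex
\begin{aligned}
f(n;\alpha,\beta,E) &= \frac{\Gamma(\alpha+n)}{\Gamma(\alpha)\,n!}
  \left(\frac{\beta}{\beta+E}\right)^{\alpha}
  \left(\frac{E}{\beta+E}\right)^{n} \\
g(n;E) &= P\,f(n;\alpha_1,\beta_1,E) + (1-P)\,f(n;\alpha_2,\beta_2,E) \\
\Pr(N=n \mid N \ge N^*) &= \frac{g(n;E)}{1-\sum_{k=0}^{N^*-1} g(k;E)}, \qquad n \ge N^* \\
-\ell(\theta) &= -\sum_i w_i \,\log \Pr(N=n_i \mid N \ge N^*;\, E=e_i)
\end{aligned}
```

Here f is a negative binomial probability mass function with size \alpha and success probability \beta / (\beta + E).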

The hyperparameters are:

\alpha_1, \beta_1: Parameters of the first gamma component of the prior distribution, which also determine the first negative binomial component of the marginal distribution of the counts
\alpha_2, \beta_2: Parameters of the second component
P: Mixture fraction (weight of the first component)
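As a rough illustration of how these five values enter the calculation, the sketch below unpacks theta and evaluates a weighted negative log-likelihood in base R. The function name neg_ll_sketch and the toy inputs are hypothetical, and the truncation is applied to the mixture as a whole, which is a literal reading of the Description and may not match the package source line for line.

```r
# Hypothetical sketch, not part of openEBGM: weighted negative log-likelihood
# of a two-component negative binomial mixture, conditional on N >= N_star.
neg_ll_sketch <- function(theta, ni, ei, wi, N_star = 1) {
  a1 <- theta[1]; b1 <- theta[2]   # first gamma component (shape, rate)
  a2 <- theta[3]; b2 <- theta[4]   # second gamma component (shape, rate)
  P  <- theta[5]                   # weight of the first component

  # A gamma-Poisson marginal is negative binomial with
  # size = alpha and prob = beta / (beta + E).
  p1 <- b1 / (b1 + ei)
  p2 <- b2 / (b2 + ei)

  mix_pmf <- P * stats::dnbinom(ni, size = a1, prob = p1) +
    (1 - P) * stats::dnbinom(ni, size = a2, prob = p2)

  # Probability mass below N_star, removed by the conditioning (assumption:
  # the mixture as a whole is truncated).
  mix_below <- P * stats::pnbinom(N_star - 1, size = a1, prob = p1) +
    (1 - P) * stats::pnbinom(N_star - 1, size = a2, prob = p2)

  -sum(wi * log(mix_pmf / (1 - mix_below)))
}

# Toy inputs (made up for illustration only)
theta_toy <- c(0.2, 0.1, 2, 4, 1 / 3)
neg_ll_sketch(theta_toy, ni = c(1, 2, 5), ei = c(0.5, 1, 2), wi = c(10, 3, 1))
```

Because each point's contribution is multiplied by its weight, doubling every weight doubles the returned value.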

You do not need to call this function directly when using exploreHypers or autoHyper.

Value

A scalar negative log-likelihood value

Warnings

Make sure N_star matches the smallest actual count in ni before using this function. Filter ni, ei, and wi if needed.

Make sure the data were actually squashed (see squashData) before using this function.
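The first warning can be handled with a simple filter before the likelihood is evaluated. The sketch below uses a small made-up data frame whose column names (N, E, weight) follow the Examples section; the values themselves are hypothetical.

```r
# Hypothetical squashed data; real input would come from squashData()
squashed_toy <- data.frame(
  N      = c(1, 1, 2, 3, 5),
  E      = c(0.5, 1.2, 0.8, 2.1, 1.7),
  weight = c(40, 25, 10, 1, 1)
)

N_star <- 2

# Keep only rows whose count is at least N_star, filtering all three
# vectors with the same index so they stay aligned
keep <- squashed_toy$N >= N_star
ni <- squashed_toy$N[keep]
ei <- squashed_toy$E[keep]
wi <- squashed_toy$weight[keep]

# Confirm that N_star matches the smallest remaining count
stopifnot(min(ni) == N_star)
```

The filtered ni, ei, and wi can then be passed to negLLsquash together with the same N_star.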

References

DuMouchel W, Pregibon D (2001). "Empirical Bayes Screening for Multi-item Associations." In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '01, pp. 67-76. ACM, New York, NY, USA. ISBN 1-58113-391-X.

Examples

library(openEBGM)
data.table::setDTthreads(2)  #only needed for CRAN checks
theta_init <- c(1, 1, 3, 3, .2)  #initial guess: alpha1, beta1, alpha2, beta2, P
data(caers)
proc <- processRaw(caers)
#Squash in two passes: first the default count (N = 1), then N = 2
squashed <- squashData(proc, bin_size = 300, keep_pts = 10)
squashed <- squashData(squashed, count = 2, bin_size = 13, keep_pts = 10)
#Negative log-likelihood at the initial guess (N_star defaults to 1)
negLLsquash(theta = theta_init, ni = squashed$N, ei = squashed$E,
            wi = squashed$weight)
#For hyperparameter estimation, minimize over theta
stats::nlminb(start = theta_init, objective = negLLsquash, ni = squashed$N,
              ei = squashed$E, wi = squashed$weight)

[Package openEBGM version 0.9.1]