
fit_two_distr {epiphy}

R Documentation

Maximum likelihood fitting of two distributions and goodness-of-fit comparison.

Description

The distributions to fit depend on the kind of data provided. By default, the Poisson and negative binomial distributions are fitted to count data, whereas the binomial and beta-binomial distributions are fitted to incidence data. The first distribution of each pair (Poisson or binomial) corresponds to an assumption of randomness, and the second (negative binomial or beta-binomial) to an assumption of aggregation. Both distributions are fitted by maximum likelihood, and their goodness of fit is then compared using a log-likelihood ratio test.

Usage

fit_two_distr(data, ...)

## Default S3 method:
fit_two_distr(data, random, aggregated, ...)

## S3 method for class 'count'
fit_two_distr(
  data,
  random = smle_pois,
  aggregated = smle_nbinom,
  n_est = c(random = 1, aggregated = 2),
  ...
)

## S3 method for class 'incidence'
fit_two_distr(
  data,
  random = smle_binom,
  aggregated = smle_betabinom,
  n_est = c(random = 1, aggregated = 2),
  ...
)

Arguments

`data`	An `intensity` object.
`...`	Additional arguments to be passed to other methods.
`random`	Distribution to describe random patterns.
`aggregated`	Distribution to describe aggregated patterns.
`n_est`	Named numeric vector giving the number of estimated parameters for each distribution (elements `random` and `aggregated`).

Details

Under the hood, fit_two_distr relies on the smle utility, which is a wrapper around the optim procedure.
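
To illustrate the general idea of maximum likelihood fitting with optim, here is a minimal base R sketch. It is not the code of smle: the simulated data, the function name nll_pois and the log-scale parameterization are assumptions made for this example only.

```r
# Minimal sketch (not epiphy's actual smle code): maximum likelihood
# estimation of a Poisson mean by minimizing the negative log-likelihood
# with optim().
set.seed(1)
x <- rpois(200, lambda = 3)   # simulated counts (illustrative data)

# Negative log-likelihood. The parameter is optimized on the log scale so
# that lambda stays positive. nll_pois is a hypothetical helper name.
nll_pois <- function(par, x) -sum(dpois(x, lambda = exp(par), log = TRUE))

fit <- optim(par = 0, fn = nll_pois, x = x, method = "BFGS")
lambda_hat <- exp(fit$par)

# For the Poisson distribution, the MLE is the sample mean, which gives a
# quick sanity check of the numerical result.
c(optim = lambda_hat, mean = mean(x))
```

The same pattern extends to distributions without a closed-form estimator, such as the negative binomial or beta-binomial, by supplying the corresponding negative log-likelihood.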

Note that warnings about the chi-squared goodness-of-fit tests may appear if any expected count is less than 5 (Cochran's rule of thumb).
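
As a rough illustration of Cochran's rule of thumb, the base R sketch below computes expected frequencies under a fitted Poisson distribution and flags the categories whose expected count falls below 5. The simulated data and the pooling of the upper tail into the last category are assumptions of this example and do not necessarily match how fit_two_distr builds its frequency table.

```r
# Illustrative check of Cochran's rule of thumb on simulated counts.
set.seed(1)
x <- rpois(100, lambda = 2)
lambda_hat <- mean(x)          # Poisson MLE of the mean

# Expected frequencies for the categories 0, 1, ..., max(x), with the last
# category pooled so that it also covers the upper tail.
k <- 0:max(x)
p <- dpois(k, lambda_hat)
p[length(p)] <- 1 - sum(p[-length(p)])
expected <- length(x) * p

# Categories flagged here have an expected count below 5.
data.frame(category = k, expected = round(expected, 2), small = expected < 5)
```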

Value

An object of class fit_two_distr, which is a list containing at least the following components:

`call`	The function `call`.
`name`	The names of both distributions.
`model`	The outputs of the fitting process for both distributions.
`llr`	The result of the log-likelihood ratio test.

Other components can be present such as:

`param`	A numeric matrix of estimated parameters (that can be printed using `printCoefmat`).
`freq`	A data frame or a matrix with the observed and expected frequencies for both distributions for the different categories.
`gof`	Goodness-of-fit tests for both distributions (which are typically chi-squared goodness-of-fit tests).
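
The kind of comparison reported in `llr` can be sketched by hand in base R. The example below is an assumption-laden illustration, not the package's internal code: it uses simulated overdispersed counts, a hypothetical helper nll_nb, and takes the degrees of freedom to be the difference between the two `n_est` values.

```r
# Sketch of a log-likelihood ratio test between a Poisson and a negative
# binomial fit, in base R (simulated data, not epiphy's internal code).
set.seed(42)
x <- rnbinom(200, size = 1.5, mu = 4)   # overdispersed counts

# Poisson: the MLE of the mean is the sample mean.
ll_pois <- sum(dpois(x, lambda = mean(x), log = TRUE))

# Negative binomial: optimize log(size) and log(mu) to keep both positive.
# nll_nb is a hypothetical helper name.
nll_nb <- function(par, x) {
  -sum(dnbinom(x, size = exp(par[1]), mu = exp(par[2]), log = TRUE))
}
fit_nb <- optim(c(0, log(mean(x))), nll_nb, x = x)
ll_nb  <- -fit_nb$value

# Test statistic and p-value. The degrees of freedom are assumed to be the
# difference in the number of estimated parameters (2 - 1 = 1 by default).
n_est   <- c(random = 1, aggregated = 2)
lr_stat <- 2 * (ll_nb - ll_pois)
df      <- unname(n_est["aggregated"] - n_est["random"])
p_value <- pchisq(lr_stat, df = df, lower.tail = FALSE)
c(statistic = lr_stat, df = df, p.value = p_value)
```

A small p-value favors the aggregated distribution over the random one for these data.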

References

Madden LV, Hughes G. 1995. Plant disease incidence: Distributions, heterogeneity, and temporal analysis. Annual Review of Phytopathology 33(1): 529–564. doi:10.1146/annurev.py.33.090195.002525

Examples

# Simple workflow for count data:
my_data <- count(arthropods)
my_data <- split(my_data, by = "t")[[3]]
my_res  <- fit_two_distr(my_data)
summary(my_res)
plot(my_res)

# Simple workflow for incidence data:
my_data <- incidence(tobacco_viruses)
my_res  <- fit_two_distr(my_data)
summary(my_res)
plot(my_res)

# Note that there are other methods to fit some common distributions.
# For example for the Poisson distribution, one can use glm:
my_arthropods <- arthropods[arthropods$t == 3, ]
my_model <- glm(my_arthropods$i ~ 1, family = poisson)
lambda <- exp(coef(my_model)[[1]]) # unique(my_model$fitted.values) works also.
lambda
# ... or the fitdistr function in MASS package:
require(MASS)
fitdistr(my_arthropods$i, "poisson")

# For the binomial distribution, glm still works:
my_model <- with(tobacco_viruses, glm(i/n ~ 1, family = binomial, weights = n))
prob <- logit(coef(my_model)[[1]], rev = TRUE)
prob
# ... but the binomial distribution is not yet recognized by MASS::fitdistr.

# Examples featured in Madden et al. (2007).
# p. 242-243
my_data <- incidence(dogwood_anthracnose)
my_data <- split(my_data, by = "t")
my_fit_two_distr <- lapply(my_data, fit_two_distr)
lapply(my_fit_two_distr, function(x) x$param$aggregated[c("prob", "theta"), ])
lapply(my_fit_two_distr, plot)

my_agg_index <- lapply(my_data, agg_index)
lapply(my_agg_index, function(x) x$index)
lapply(my_agg_index, chisq.test)

[Package epiphy version 0.5.0 Index]