fit_two_distr {epiphy} | R Documentation |
Maximum likelihood fitting of two distributions and goodness-of-fit comparison.
Description
The distributions fitted depend on the kind of data provided. By default, the Poisson and negative binomial distributions are fitted to count data, whereas the binomial and beta-binomial distributions are fitted to incidence data. One distribution corresponds to an assumption of randomness (Poisson or binomial) and the other to an assumption of aggregation (negative binomial or beta-binomial). Both are fitted, and their goodness of fit is then compared using a log-likelihood ratio test.
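For incidence data, the random-versus-aggregated comparison can be illustrated with base R alone. The block below is a minimal sketch, not the package's implementation: the data are simulated, the helper `dbetabinom_log` is hypothetical, and the beta-binomial is assumed to be parameterised by `prob` and `theta` with beta shape parameters `prob / theta` and `(1 - prob) / theta`.

```r
set.seed(1)

# Simulated incidence data (hypothetical): 75 sampling units of 40 individuals,
# with a unit-level disease probability that varies between units.
n_units <- 75
n <- rep(40, n_units)
p_unit <- rbeta(n_units, shape1 = 2, shape2 = 6)
i <- rbinom(n_units, size = n, prob = p_unit)

# Hypothetical helper: log-probability mass of the beta-binomial distribution,
# assuming shape parameters a = prob / theta and b = (1 - prob) / theta.
dbetabinom_log <- function(i, n, prob, theta) {
  a <- prob / theta
  b <- (1 - prob) / theta
  lchoose(n, i) + lbeta(i + a, n - i + b) - lbeta(a, b)
}

# Negative log-likelihoods on unconstrained scales (logit for prob, log for theta).
nll_binom <- function(par) {
  -sum(dbinom(i, size = n, prob = plogis(par[1]), log = TRUE))
}
nll_betabinom <- function(par) {
  -sum(dbetabinom_log(i, n, prob = plogis(par[1]), theta = exp(par[2])))
}

start <- qlogis(sum(i) / sum(n))
fit_binom <- optim(start, nll_binom, method = "BFGS")
fit_betabinom <- optim(c(start, log(0.1)), nll_betabinom, method = "BFGS")

# Back-transform the estimates to their natural scales.
prob_binom <- plogis(fit_binom$par[1])
prob_betabinom <- plogis(fit_betabinom$par[1])
theta_betabinom <- exp(fit_betabinom$par[2])

# Maximised log-likelihoods of the random and aggregated models.
c(random = -fit_binom$value, aggregated = -fit_betabinom$value)
```

With equal unit sizes, the binomial estimate of `prob` is simply the overall proportion `sum(i) / sum(n)`, which gives a quick check on the optimiser.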
Usage
fit_two_distr(data, ...)
## Default S3 method:
fit_two_distr(data, random, aggregated, ...)
## S3 method for class 'count'
fit_two_distr(
data,
random = smle_pois,
aggregated = smle_nbinom,
n_est = c(random = 1, aggregated = 2),
...
)
## S3 method for class 'incidence'
fit_two_distr(
data,
random = smle_binom,
aggregated = smle_betabinom,
n_est = c(random = 1, aggregated = 2),
...
)
Arguments
data |
An |
... |
Additional arguments to be passed to other methods. |
random |
Distribution to describe random patterns. |
aggregated |
Distribution to describe aggregated patterns. |
n_est |
Number of estimated parameters for both distributions. |
Details
Under the hood, fit_two_distr relies on the smle utility, which is a wrapper around the optim procedure.
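To make the idea concrete, here is a minimal base R sketch of maximum likelihood fitting with optim followed by a log-likelihood ratio test, for count data. It is an illustration under stated assumptions rather than the code used by the package: the data are simulated, the names `nll_pois`, `nll_nbinom`, `llr_stat` and `p_value` are hypothetical, and the test is assumed to use a chi-squared reference distribution with degrees of freedom equal to the difference in the number of estimated parameters (2 - 1 = 1, matching the n_est defaults).

```r
set.seed(42)

# Simulated overdispersed counts (hypothetical data, not taken from the package).
x <- rnbinom(200, size = 1.5, mu = 4)

# Negative log-likelihoods. Parameters are optimised on the log scale so that
# optim can search an unconstrained space.
nll_pois <- function(par, x) {
  -sum(dpois(x, lambda = exp(par[1]), log = TRUE))
}
nll_nbinom <- function(par, x) {
  -sum(dnbinom(x, size = exp(par[1]), mu = exp(par[2]), log = TRUE))
}

# Random pattern: Poisson (1 parameter). Aggregated pattern: negative binomial (2).
fit_pois <- optim(log(mean(x)), nll_pois, x = x, method = "BFGS")
fit_nbinom <- optim(c(0, log(mean(x))), nll_nbinom, x = x, method = "BFGS")

# Log-likelihood ratio statistic: twice the gain in maximised log-likelihood.
llr_stat <- 2 * (fit_pois$value - fit_nbinom$value)

# Assumed reference distribution: chi-squared with 2 - 1 = 1 degree of freedom.
p_value <- pchisq(llr_stat, df = 1, lower.tail = FALSE)

c(lambda = exp(fit_pois$par[1]),
  size = exp(fit_nbinom$par[1]),
  mu = exp(fit_nbinom$par[2]))
```

A small p-value here favours the aggregated (negative binomial) description over the random (Poisson) one.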
Note that warnings about chi-squared goodness-of-fit tests may appear if any expected count is less than 5 (Cochran's rule of thumb).
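The same kind of warning can be reproduced with base R's chisq.test, which is used here only as an illustration; whether the package calls chisq.test internally is not stated in this page. The observed frequencies and probabilities below are made up for the example.

```r
# Hypothetical observed frequencies and model probabilities for four categories.
observed <- c(12, 9, 3, 1)
prob <- c(0.50, 0.30, 0.15, 0.05)

# Expected counts under the model; two of them fall below 5.
expected <- sum(observed) * prob
expected

# Run the goodness-of-fit test and record whether a warning was raised.
warned <- FALSE
gof <- withCallingHandlers(
  chisq.test(x = observed, p = prob),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)

any(expected < 5)  # Cochran's rule of thumb is violated
warned             # chisq.test flagged the chi-squared approximation
```

When this happens, pooling sparse categories before testing is a common remedy, at the cost of some resolution in the tails.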
Value
An object of class fit_two_distr, which is a list containing at least the following components:
call | The function call. |
name | The names of both distributions. |
model | The outputs of fitting process for both distributions. |
llr | The result of the log-likelihood ratio test. |
Other components can be present such as:
param | A numeric matrix of estimated parameters (that can be printed using printCoefmat). |
freq | A data frame or a matrix with the observed and expected frequencies for both distributions for the different categories. |
gof | Goodness-of-fit tests for both distributions (which are typically chi-squared goodness-of-fit tests). |
References
Madden LV, Hughes G. 1995. Plant disease incidence: Distributions, heterogeneity, and temporal analysis. Annual Review of Phytopathology 33(1): 529–564. doi:10.1146/annurev.py.33.090195.002525
Examples
# Simple workflow for count data:
my_data <- count(arthropods)
my_data <- split(my_data, by = "t")[[3]]
my_res <- fit_two_distr(my_data)
summary(my_res)
plot(my_res)
# Simple workflow for incidence data:
my_data <- incidence(tobacco_viruses)
my_res <- fit_two_distr(my_data)
summary(my_res)
plot(my_res)
# Note that there are other methods to fit some common distributions.
# For example for the Poisson distribution, one can use glm:
my_arthropods <- arthropods[arthropods$t == 3, ]
my_model <- glm(my_arthropods$i ~ 1, family = poisson)
lambda <- exp(coef(my_model)[[1]]) # unique(my_model$fitted.values) also works.
lambda
# ... or the fitdistr function in MASS package:
require(MASS)
fitdistr(my_arthropods$i, "poisson")
# For the binomial distribution, glm still works:
my_model <- with(tobacco_viruses, glm(i/n ~ 1, family = binomial, weights = n))
prob <- logit(coef(my_model)[[1]], rev = TRUE)
prob
# ... but the binomial distribution is not yet recognized by MASS::fitdistr.
# Examples featured in Madden et al. (2007).
# p. 242-243
my_data <- incidence(dogwood_anthracnose)
my_data <- split(my_data, by = "t")
my_fit_two_distr <- lapply(my_data, fit_two_distr)
lapply(my_fit_two_distr, function(x) x$param$aggregated[c("prob", "theta"), ])
lapply(my_fit_two_distr, plot)
my_agg_index <- lapply(my_data, agg_index)
lapply(my_agg_index, function(x) x$index)
lapply(my_agg_index, chisq.test)