ezmlnorm {EnvStats}

R Documentation

Estimate Parameters of a Zero-Modified Lognormal (Delta) Distribution

Description

Estimate the parameters of a zero-modified lognormal distribution or a zero-modified lognormal distribution (alternative parameterization), and optionally construct a confidence interval for the mean.

Usage

  ezmlnorm(x, method = "mvue", ci = FALSE, ci.type = "two-sided", 
    ci.method = "normal.approx", conf.level = 0.95)

  ezmlnormAlt(x, method = "mvue", ci = FALSE, ci.type = "two-sided", 
    ci.method = "normal.approx", conf.level = 0.95)

Arguments

`x`	numeric vector of observations. Missing (`NA`), undefined (`NaN`), and infinite (`Inf`, `-Inf`) values are allowed but will be removed.
`method`	character string specifying the method of estimation. The only possible value is `"mvue"` (minimum variance unbiased; the default). See the DETAILS section for more information on this estimation method.
`ci`	logical scalar indicating whether to compute a confidence interval for the mean. The default value is `FALSE`. If `ci=TRUE` and there are fewer than three non-missing observations in `x`, or if all observations are zeros, a warning will be issued and no confidence interval will be computed.
`ci.type`	character string indicating what kind of confidence interval to compute. The possible values are `"two-sided"` (the default), `"lower"`, and `"upper"`. This argument is ignored if `ci=FALSE`.
`ci.method`	character string indicating what method to use to construct the confidence interval for the mean. The only possible value is `"normal.approx"` (the default). See the DETAILS section for more information. This argument is ignored if `ci=FALSE`.
`conf.level`	a scalar between 0 and 1 indicating the confidence level of the confidence interval. The default value is `conf.level=0.95`. This argument is ignored if `ci=FALSE`.

Details

If x contains any missing (NA), undefined (NaN) or infinite (Inf, -Inf) values, they will be removed prior to performing the estimation.

Let \underline{x} = (x_1, x_2, \ldots, x_n) be a vector of n observations from a zero-modified lognormal distribution with parameters meanlog=\mu, sdlog=\sigma, and p.zero=p. Alternatively, let \underline{x} = (x_1, x_2, \ldots, x_n) be a vector of n observations from a zero-modified lognormal distribution (alternative parameterization) with parameters mean=\theta, cv=\tau, and p.zero=p.

Let r denote the number of observations in \underline{x} that are equal to 0, and order the observations so that x_1, x_2, \ldots, x_r denote the r zero observations and x_{r+1}, x_{r+2}, \ldots, x_n denote the n-r non-zero observations.

Note that \theta is not the mean of the zero-modified lognormal distribution; it is the mean of the lognormal part of the distribution. Similarly, \tau is not the coefficient of variation of the zero-modified lognormal distribution; it is the coefficient of variation of the lognormal part of the distribution.

Let \gamma, \delta, and \phi denote the mean, standard deviation, and coefficient of variation of the overall zero-modified lognormal (delta) distribution. Let \eta denote the standard deviation of the lognormal part of the distribution, so that \eta = \theta \tau. Aitchison (1955) shows that:

\gamma = (1 - p) \theta \;\;\;\; (1)

\delta^2 = (1 - p) \eta^2 + p (1 - p) \theta^2 \;\;\;\; (2)

so that

\phi = \frac{\delta}{\gamma} = \frac{\sqrt{\tau^2 + p}}{\sqrt{1-p}} \;\;\;\; (3)
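
As a quick numerical check of equations (1) to (3), the following base R sketch computes the overall mean, standard deviation, and coefficient of variation from \theta, \tau, and p. The helper name `zm_overall_moments` is hypothetical and is not part of EnvStats.

```r
# Hypothetical helper (not an EnvStats function): overall moments of a
# zero-modified lognormal (delta) distribution from the parameters of the
# lognormal part (theta = mean, tau = cv) and the probability of a zero (p).
zm_overall_moments <- function(theta, tau, p) {
  eta   <- theta * tau                                    # sd of the lognormal part
  gamma <- (1 - p) * theta                                # equation (1)
  delta <- sqrt((1 - p) * eta^2 + p * (1 - p) * theta^2)  # equation (2)
  c(mean = gamma, sd = delta, cv = delta / gamma)         # cv is equation (3)
}

# With theta = 2, tau = 1, p = 0.5 the overall mean is 1 and both the
# overall sd and the overall cv equal sqrt(3).
zm_overall_moments(theta = 2, tau = 1, p = 0.5)
```

The closed form in equation (3), sqrt(\tau^2 + p) / sqrt(1 - p), gives the same cv without computing \delta and \gamma separately.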

Estimation

Minimum Variance Unbiased Estimation (method="mvue")
Aitchison (1955) shows that the minimum variance unbiased estimators (mvue's) of \gamma and \delta^2 are:

\hat{\gamma}_{mvue} = \begin{cases} (1-\frac{r}{n}) e^{\bar{y}} g_{n-r-1}(\frac{s^2}{2}) & \text{if } r < n - 1, \\ x_n / n & \text{if } r = n - 1, \\ 0 & \text{if } r = n \end{cases} \;\;\;\; (4)

\hat{\delta}^2_{mvue} = \begin{cases} (1-\frac{r}{n}) e^{2\bar{y}} \{g_{n-r-1}(2s^2) - \frac{n-r-1}{n-1} g_{n-r-1}[\frac{(n-r-2)s^2}{n-r-1}] \} & \text{if } r < n - 1, \\ x_n^2 / n & \text{if } r = n - 1, \\ 0 & \text{if } r = n \end{cases} \;\;\;\; (5)

where

y_i = log(x_i), \; i = r+1, r+2, \ldots, n \;\;\;\; (6)

\bar{y} = \frac{1}{n-r} \sum_{i=r+1}^n y_i \;\;\;\; (7)

s^2 = \frac{1}{n-r-1} \sum_{i=r+1}^n (y_i - \bar{y})^2 \;\;\;\; (8)

g_m(z) = \sum_{i=0}^\infty \frac{m^i (m+2i)}{m(m+2) \cdots (m+2i)} (\frac{m}{m+1})^i (\frac{z^i}{i!}) \;\;\;\; (9)

Note that when r=n-1 or r=n, the estimator of \gamma is simply the sample mean for all observations (including zero values), and the estimator for \delta^2 is simply the sample variance for all observations.
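
The estimators in equations (4) to (9) can be sketched in base R as follows. This is an illustrative transcription of the formulas above, not the EnvStats source code; the function names `finney_g` and `zm_mvue`, the convergence tolerance, and the term limit are assumptions made for this example. The series in equation (9) is summed using the recurrence between consecutive terms, and the input is assumed to contain no negative values.

```r
# Hypothetical helper: series g_m(z) from equation (9), summed term by term.
# Consecutive terms satisfy
#   term_i = term_{i-1} * m / (m + 2 * (i - 1)) * m / (m + 1) * z / i
finney_g <- function(m, z, tol = 1e-12, max_terms = 500L) {
  term  <- 1   # the i = 0 term
  total <- 1
  for (i in seq_len(max_terms)) {
    term  <- term * (m / (m + 2 * (i - 1))) * (m / (m + 1)) * z / i
    total <- total + term
    if (abs(term) <= tol * abs(total)) break   # stop once terms are negligible
  }
  total
}

# Hypothetical helper: mvue's of the overall mean (gamma) and variance
# (delta^2), following equations (4) and (5). Assumes x has no negative values.
zm_mvue <- function(x) {
  x <- x[is.finite(x)]          # drop NA, NaN, Inf, -Inf
  n <- length(x)
  r <- sum(x == 0)              # number of zero observations
  if (r == n) return(c(mean = 0, var = 0))
  if (r == n - 1) {
    x_n <- x[x != 0]            # the single non-zero observation
    return(c(mean = x_n / n, var = x_n^2 / n))
  }
  y     <- log(x[x != 0])       # equation (6)
  m     <- n - r - 1
  y_bar <- mean(y)              # equation (7)
  s2    <- var(y)               # equation (8), divisor n - r - 1
  gamma_hat  <- (1 - r / n) * exp(y_bar) * finney_g(m, s2 / 2)
  delta2_hat <- (1 - r / n) * exp(2 * y_bar) *
    (finney_g(m, 2 * s2) - (m / (n - 1)) * finney_g(m, (m - 1) * s2 / m))
  c(mean = gamma_hat, var = delta2_hat)
}

# With a single non-zero value the estimates reduce to the sample mean and
# sample variance of all observations: 5/4 and 25/4.
zm_mvue(c(0, 0, 0, 5))
```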

The expected value and asymptotic variance of the mvue of \gamma are (Aitchison and Brown, 1957, p.99; Owen and DeRouen, 1980):

E(\hat{\gamma}_{mvue}) = \gamma \;\;\;\; (10)

AVar(\hat{\gamma}_{mvue}) = \frac{1}{n} exp(2\mu + \sigma^2) (1-p) (p + \frac{2\sigma^2 + \sigma^4}{2}) \;\;\;\; (11)

Confidence Intervals

Based on Normal Approximation (ci.method="normal.approx")
An approximate (1-\alpha)100\% confidence interval for \gamma is constructed based on the assumption that the estimator of \gamma is approximately normally distributed. Thus, an approximate two-sided (1-\alpha)100\% confidence interval for \gamma is constructed as:

[ \hat{\gamma}_{mvue} - t_{n-2, 1-\alpha/2} \hat{\sigma}_{\hat{\gamma}}, \; \hat{\gamma}_{mvue} + t_{n-2, 1-\alpha/2} \hat{\sigma}_{\hat{\gamma}} ] \;\;\;\; (12)

where t_{\nu, p} is the p'th quantile of Student's t-distribution with \nu degrees of freedom, and the quantity \hat{\sigma}_{\hat{\gamma}} is the estimated standard deviation of the mvue of \gamma, and is computed by replacing the values of \mu, \sigma, and p in equation (11) above with their estimated values and taking the square root.

Note that there must be at least 3 non-missing observations (n \ge 3) and at least one observation must be non-zero (r \le n-1) in order to construct a confidence interval.

One-sided confidence intervals are computed in a similar fashion.
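
A minimal base R sketch of the two-sided interval in equation (12) follows. It takes the point estimate and plug-in values as arguments; which estimates of \mu, \sigma, and p are plugged into equation (11) is left to the caller (for example \bar{y}, s, and r/n, which is an assumption here rather than a statement about the EnvStats internals). The function name `zm_mean_ci` is hypothetical.

```r
# Hypothetical helper: two-sided normal-approximation confidence interval
# for the overall mean gamma, following equations (11) and (12).
zm_mean_ci <- function(gamma_hat, mu_hat, sigma_hat, p_hat, n,
                       conf_level = 0.95) {
  stopifnot(n >= 3, p_hat < 1)   # need n >= 3 and at least one non-zero value
  # Equation (11) with estimates substituted for mu, sigma, and p
  avar <- (1 / n) * exp(2 * mu_hat + sigma_hat^2) * (1 - p_hat) *
    (p_hat + (2 * sigma_hat^2 + sigma_hat^4) / 2)
  se     <- sqrt(avar)
  alpha  <- 1 - conf_level
  t_crit <- qt(1 - alpha / 2, df = n - 2)   # t quantile with n - 2 df
  c(LCL = gamma_hat - t_crit * se, UCL = gamma_hat + t_crit * se)
}

# Example call with made-up inputs (not estimates from real data)
zm_mean_ci(gamma_hat = 1, mu_hat = 0, sigma_hat = 1, p_hat = 0.5, n = 100)
```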

Value

a list of class "estimate" containing the estimated parameters and other information. See estimate.object for details.

For the function ezmlnorm, the component called parameters is a numeric vector with the following estimated parameters:

Parameter Name	Explanation
`meanlog`	mean of the log of the lognormal part of the distribution.
`sdlog`	standard deviation of the log of the lognormal part of the distribution.
`p.zero`	probability that an observation will be 0.
`mean.zmlnorm`	mean of the overall zero-modified lognormal (delta) distribution.
`sd.zmlnorm`	standard deviation of the overall zero-modified lognormal (delta) distribution.

For the function ezmlnormAlt, the component called parameters is a numeric vector with the following estimated parameters:

Parameter Name	Explanation
`mean`	mean of the lognormal part of the distribution.
`cv`	coefficient of variation of the lognormal part of the distribution.
`p.zero`	probability that an observation will be 0.
`mean.zmlnorm`	mean of the overall zero-modified lognormal (delta) distribution.
`cv.zmlnorm`	coefficient of variation of the overall zero-modified lognormal (delta) distribution.
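
The estimates can also be pulled out of the returned object programmatically rather than read from the printed summary. The sketch below assumes the EnvStats package is installed and that the `parameters` vector carries the names shown in the table above; it does nothing if the package is unavailable.

```r
# Sketch: extract individual estimates from the "estimate" object.
# Assumes EnvStats is installed; otherwise the block is skipped.
if (requireNamespace("EnvStats", quietly = TRUE)) {
  set.seed(250)
  dat <- EnvStats::rzmlnormAlt(100, mean = 2, cv = 1, p.zero = 0.5)
  fit <- EnvStats::ezmlnormAlt(dat)

  est <- fit$parameters                      # named numeric vector
  print(est[c("p.zero", "mean.zmlnorm")])    # proportion of zeros, overall mean
}
```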

Note

The zero-modified lognormal (delta) distribution is sometimes used to model chemical concentrations for which some observations are reported as “Below Detection Limit” (the nondetects are assumed equal to 0). See, for example, Gilliom and Helsel (1986), Owen and DeRouen (1980), and Gibbons et al. (2009, Chapter 12). USEPA (2009, Chapter 15) recommends this strategy only in specific situations, and Helsel (2012, Chapter 1) strongly discourages this approach to dealing with nondetects.

A variation of the zero-modified lognormal (delta) distribution is the zero-modified normal distribution, in which a normal distribution is mixed with a positive probability mass at 0.

One way to try to assess whether a zero-modified lognormal (delta), zero-modified normal, censored normal, or censored lognormal is the best model for the data is to construct both censored and detects-only probability plots (see qqPlotCensored).

Author(s)

Steven P. Millard (EnvStats@ProbStatInfo.com)

References

Aitchison, J. (1955). On the Distribution of a Positive Random Variable Having a Discrete Probability Mass at the Origin. Journal of the American Statistical Association 50, 901–908.

Aitchison, J., and J.A.C. Brown (1957). The Lognormal Distribution (with special reference to its uses in economics). Cambridge University Press, London, pp.94–99.

Crow, E.L., and K. Shimizu. (1988). Lognormal Distributions: Theory and Applications. Marcel Dekker, New York, pp.47–51.

Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring. Second Edition. John Wiley and Sons, Hoboken, NJ.

Gilliom, R.J., and D.R. Helsel. (1986). Estimation of Distributional Parameters for Censored Trace Level Water Quality Data: 1. Estimation Techniques. Water Resources Research 22, 135–146.

Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R. Second Edition. John Wiley and Sons, Hoboken, NJ, Chapter 1.

Johnson, N. L., S. Kotz, and A.W. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, p.312.

Owen, W., and T. DeRouen. (1980). Estimation of the Mean for Lognormal Data Containing Zeros and Left-Censored Values, with Applications to the Measurement of Worker Exposure to Air Contaminants. Biometrics 36, 707–719.

USEPA (1992c). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Addendum to Interim Final Guidance. Office of Solid Waste, Permits and State Programs Division, US Environmental Protection Agency, Washington, D.C.

USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.

Examples

  # Generate 100 observations from a zero-modified lognormal (delta) 
  # distribution with mean=2, cv=1, and p.zero=0.5, then estimate the 
  # parameters. According to equations (1) and (3) above, the overall mean 
  # is mean.zmlnorm=1 and the overall cv is cv.zmlnorm=sqrt(3). 
  # (Note: the call to set.seed simply allows you to reproduce this example.)

  set.seed(250) 
  dat <- rzmlnormAlt(100, mean = 2, cv = 1, p.zero = 0.5) 
  ezmlnormAlt(dat, ci = TRUE) 

  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Zero-Modified Lognormal (Delta)
  #
  #Estimated Parameter(s):          mean         = 1.9604561
  #                                 cv           = 0.9169411
  #                                 p.zero       = 0.4500000
  #                                 mean.zmlnorm = 1.0782508
  #                                 cv.zmlnorm   = 1.5307175
  #
  #Estimation Method:               mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     100
  #
  #Confidence Interval for:         mean.zmlnorm
  #
  #Confidence Interval Method:      Normal Approximation
  #                                 (t Distribution)
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 0.748134
  #                                 UCL = 1.408368

  #----------

  # Clean up
  rm(dat)

[Package EnvStats version 2.8.1 Index]