ezmlnorm {EnvStats}  R Documentation 
Estimate Parameters of a Zero-Modified Lognormal (Delta) Distribution
Description
Estimate the parameters of a zero-modified lognormal distribution or a zero-modified lognormal distribution (alternative parameterization), and optionally construct a confidence interval for the mean.
Usage
ezmlnorm(x, method = "mvue", ci = FALSE, ci.type = "two-sided",
    ci.method = "normal.approx", conf.level = 0.95)
ezmlnormAlt(x, method = "mvue", ci = FALSE, ci.type = "two-sided",
    ci.method = "normal.approx", conf.level = 0.95)
Arguments
x 
numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.
method 
character string specifying the method of estimation. The only possible value is "mvue" (minimum variance unbiased; the default).
ci 
logical scalar indicating whether to compute a confidence interval for the mean. The default value is ci=FALSE.
ci.type 
character string indicating what kind of confidence interval to compute. The possible values are "two-sided" (the default), "lower", and "upper".
ci.method 
character string indicating what method to use to construct the confidence interval for the mean. The only possible value is "normal.approx" (the default).
conf.level 
a scalar between 0 and 1 indicating the confidence level of the confidence interval. The default value is conf.level=0.95.
Details
If x contains any missing (NA), undefined (NaN) or infinite (Inf, -Inf) values, they will be removed prior to performing the estimation.
Let $\underline{x} = (x_1, x_2, \ldots, x_n)$ be a vector of $n$ observations from a zero-modified lognormal distribution with parameters meanlog=$\mu$, sdlog=$\sigma$, and p.zero=$p$. Alternatively, let $\underline{x} = (x_1, x_2, \ldots, x_n)$ be a vector of $n$ observations from a zero-modified lognormal distribution (alternative parameterization) with parameters mean=$\theta$, cv=$\tau$, and p.zero=$p$.
Let $r$ denote the number of observations in $\underline{x}$ that are equal to 0, and order the observations so that $x_1, x_2, \ldots, x_r$ denote the $r$ zero observations and $x_{r+1}, x_{r+2}, \ldots, x_n$ denote the $n-r$ nonzero observations.
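The preprocessing described above (dropping non-finite values, then counting the zeros) can be sketched in base R. The helper name `split_zeros` is hypothetical and is not an EnvStats function; the sketch also assumes the data are nonnegative.

```r
# Hypothetical helper (not part of EnvStats): drop NA/NaN/Inf values,
# then separate the zero observations from the nonzero ones.
split_zeros <- function(x) {
  x <- x[is.finite(x)]          # removes NA, NaN, Inf, and -Inf
  list(n = length(x),           # number of usable observations
       r = sum(x == 0),         # number of zero observations
       nonzero = x[x != 0])     # the n - r nonzero observations
}

parts <- split_zeros(c(0, 1.2, NA, 0, 3.4, Inf, 0.7))
```

Here `parts$n` counts the observations kept, `parts$r` the zeros among them, and `parts$nonzero` holds the values that enter the lognormal part of the estimation.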
Note that $\theta$ is not the mean of the zero-modified lognormal distribution; it is the mean of the lognormal part of the distribution. Similarly, $\tau$ is not the coefficient of variation of the zero-modified lognormal distribution; it is the coefficient of variation of the lognormal part of the distribution.
Let $\gamma$, $\delta$, and $\phi$ denote the mean, standard deviation, and coefficient of variation of the overall zero-modified lognormal (delta) distribution. Let $\eta$ denote the standard deviation of the lognormal part of the distribution, so that $\eta = \theta \tau$. Aitchison (1955) shows that:
$$\gamma = (1 - p) \theta \;\;\;\; (1)$$
$$\delta^2 = (1 - p) \eta^2 + p (1 - p) \theta^2 \;\;\;\; (2)$$
so that
$$\phi = \frac{\delta}{\gamma} = \frac{\sqrt{\tau^2 + p}}{\sqrt{1 - p}} \;\;\;\; (3)$$
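As a quick numerical check of equations (1) to (3), the following base R sketch computes the overall mean, standard deviation, and coefficient of variation from the parameters of the lognormal part. The function name `zmlnorm_moments` is hypothetical and not part of EnvStats.

```r
# Hypothetical helper: overall moments of the delta distribution from the
# lognormal-part mean (theta), lognormal-part cv (tau), and P(X = 0) (p).
zmlnorm_moments <- function(theta, tau, p) {
  eta      <- theta * tau                                    # sd of lognormal part
  ovr_mean <- (1 - p) * theta                                # equation (1)
  ovr_sd   <- sqrt((1 - p) * eta^2 + p * (1 - p) * theta^2)  # equation (2)
  c(mean = ovr_mean, sd = ovr_sd, cv = ovr_sd / ovr_mean)    # equation (3)
}

m <- zmlnorm_moments(theta = 2, tau = 1, p = 0.5)
```

With theta = 2, tau = 1, and p = 0.5, the overall mean is 1 and the overall cv is sqrt(3), which agrees with evaluating the closed form in equation (3) directly.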
Estimation
Minimum Variance Unbiased Estimation (method="mvue")
Aitchison (1955) shows that the minimum variance unbiased estimators (mvue's) of $\gamma$ and $\delta^2$ are:
$$\hat{\gamma}_{mvue} = \begin{cases} (1 - \frac{r}{n}) \, e^{\bar{y}} \, g_{n-r-1}(\frac{s^2}{2}) & \text{if } r < n - 1, \\ x_n / n & \text{if } r = n - 1, \\ 0 & \text{if } r = n \end{cases} \;\;\;\; (4)$$
$$\hat{\delta}^2_{mvue} = \begin{cases} (1 - \frac{r}{n}) \, e^{2\bar{y}} \, \{ g_{n-r-1}(2s^2) - \frac{n-r-1}{n-1} \, g_{n-r-1}[\frac{(n-r-2)s^2}{n-r-1}] \} & \text{if } r < n - 1, \\ x_n^2 / n & \text{if } r = n - 1, \\ 0 & \text{if } r = n \end{cases} \;\;\;\; (5)$$
where
$$y_i = \log(x_i), \; i = r+1, r+2, \ldots, n \;\;\;\; (6)$$
$$\bar{y} = \frac{1}{n-r} \sum_{i=r+1}^n y_i \;\;\;\; (7)$$
$$s^2 = \frac{1}{n-r-1} \sum_{i=r+1}^n (y_i - \bar{y})^2 \;\;\;\; (8)$$
$$g_m(z) = \sum_{i=0}^\infty \frac{m^i (m+2i)}{m(m+2) \cdots (m+2i)} (\frac{m}{m+1})^i (\frac{z^i}{i!}) \;\;\;\; (9)$$
Note that when $r = n-1$ or $r = n$, the estimator of $\gamma$ is simply the sample mean for all observations (including zero values), and the estimator for $\delta^2$ is simply the sample variance for all observations.
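Equations (4) to (9) can be sketched in base R as follows. This is an illustration under stated assumptions, not the EnvStats implementation: the names `g_fun` and `zml_mvue` are hypothetical, the series in equation (9) is truncated once its terms become negligible, and the data are assumed to be nonnegative.

```r
# Hypothetical helpers; a sketch of equations (4)-(9), not EnvStats code.

# g_m(z) from equation (9), summed term by term (requires m >= 1).
g_fun <- function(m, z, tol = 1e-12, max_terms = 1000) {
  term  <- 1   # the i = 0 term of the series equals 1
  total <- 1
  for (i in seq_len(max_terms)) {
    # ratio of the i-th term to the (i-1)-th term in equation (9)
    term  <- term * (m / (m + 2 * (i - 1))) * (m / (m + 1)) * z / i
    total <- total + term
    if (abs(term) <= tol * abs(total)) break
  }
  total
}

# Estimators of the overall mean and variance, equations (4) and (5).
zml_mvue <- function(x) {
  x <- x[is.finite(x)]
  n <- length(x)
  r <- sum(x == 0)
  if (r >= n - 1) {
    # r = n - 1 or r = n: sample mean and variance of all observations
    return(c(mean = mean(x), var = var(x)))
  }
  y    <- log(x[x > 0])
  m    <- n - r - 1
  ybar <- mean(y)                      # equation (7)
  s2   <- var(y)                       # equation (8), divisor n - r - 1
  est_mean <- (1 - r / n) * exp(ybar) * g_fun(m, s2 / 2)
  est_var  <- (1 - r / n) * exp(2 * ybar) *
    (g_fun(m, 2 * s2) - (m / (n - 1)) * g_fun(m, (m - 1) * s2 / m))
  c(mean = est_mean, var = est_var)
}

one_nonzero <- zml_mvue(c(0, 0, 0, 5))
```

For the sample c(0, 0, 0, 5) there is a single nonzero value, so the sketch falls back to the sample mean 5/4 and the sample variance 25/4, matching the $x_n / n$ and $x_n^2 / n$ cases of equations (4) and (5).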
The expected value and asymptotic variance of the mvue of $\gamma$ are (Aitchison and Brown, 1957, p. 99; Owen and DeRouen, 1980):
$$E(\hat{\gamma}_{mvue}) = \gamma \;\;\;\; (10)$$
$$AVar(\hat{\gamma}_{mvue}) = \frac{1}{n} \exp(2\mu + \sigma^2) \, (1 - p) \, (p + \frac{2\sigma^2 + \sigma^4}{2}) \;\;\;\; (11)$$
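Equation (11) is simple to evaluate directly. The base R sketch below does so for given parameter values; `avar_mean` is a hypothetical name, not an EnvStats function.

```r
# Hypothetical helper: asymptotic variance of the mvue of the overall mean,
# equation (11), given meanlog (mu), sdlog (sigma), p.zero (p), and n.
avar_mean <- function(mu, sigma, p, n) {
  exp(2 * mu + sigma^2) * (1 - p) * (p + (2 * sigma^2 + sigma^4) / 2) / n
}

v <- avar_mean(mu = 0, sigma = 1, p = 0.5, n = 100)
```

Setting p = 0 in equation (11) leaves $\exp(2\mu + \sigma^2)(\sigma^2 + \sigma^4/2)/n$, the case with no probability mass at zero.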
Confidence Intervals
Based on Normal Approximation (ci.method="normal.approx")
An approximate $(1-\alpha)100\%$ confidence interval for $\gamma$ is constructed based on the assumption that the estimator of $\gamma$ is approximately normally distributed. Thus, an approximate two-sided $(1-\alpha)100\%$ confidence interval for $\gamma$ is constructed as:
$$[ \hat{\gamma}_{mvue} - t_{n-2, 1-\alpha/2} \hat{\sigma}_{\hat{\gamma}}, \; \hat{\gamma}_{mvue} + t_{n-2, 1-\alpha/2} \hat{\sigma}_{\hat{\gamma}} ] \;\;\;\; (12)$$
where $t_{\nu, p}$ is the $p$'th quantile of Student's t-distribution with $\nu$ degrees of freedom, and the quantity $\hat{\sigma}_{\hat{\gamma}}$ is the estimated standard deviation of the mvue of $\gamma$, and is computed by replacing the values of $\mu$, $\sigma$, and $p$ in equation (11) above with their estimated values and taking the square root.
Note that there must be at least 3 nonmissing observations ($n \ge 3$) and at least one observation must be nonzero ($r \le n-1$) in order to construct a confidence interval.
One-sided confidence intervals are computed in a similar fashion.
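The two-sided interval in equation (12) can be sketched in base R once an estimate of the overall mean and its estimated standard deviation are available. The function `ci_mean` and its argument names are hypothetical; the input values in the call are made up purely for illustration.

```r
# Hypothetical helper: two-sided interval of equation (12), given an estimate
# of the overall mean, its estimated standard deviation, and the sample size.
ci_mean <- function(gamma_hat, se, n, conf.level = 0.95) {
  stopifnot(n >= 3)                          # equation (12) needs n - 2 >= 1 df
  alpha <- 1 - conf.level
  half  <- qt(1 - alpha / 2, df = n - 2) * se
  c(LCL = gamma_hat - half, UCL = gamma_hat + half)
}

ci <- ci_mean(gamma_hat = 1, se = 0.1, n = 100)
```

The interval is symmetric about the estimate, and its half-width is the t quantile with n - 2 degrees of freedom times the estimated standard deviation.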
Value
a list of class "estimate" containing the estimated parameters and other information. See estimate.object for details.
For the function ezmlnorm, the component called parameters is a numeric vector with the following estimated parameters:
Parameter Name   Explanation
meanlog          mean of the log of the lognormal part of the distribution.
sdlog            standard deviation of the log of the lognormal part of the distribution.
p.zero           probability that an observation will be 0.
mean.zmlnorm     mean of the overall zero-modified lognormal (delta) distribution.
sd.zmlnorm       standard deviation of the overall zero-modified lognormal (delta) distribution.
For the function ezmlnormAlt, the component called parameters is a numeric vector with the following estimated parameters:
Parameter Name   Explanation
mean             mean of the lognormal part of the distribution.
cv               coefficient of variation of the lognormal part of the distribution.
p.zero           probability that an observation will be 0.
mean.zmlnorm     mean of the overall zero-modified lognormal (delta) distribution.
sd.zmlnorm       standard deviation of the overall zero-modified lognormal (delta) distribution.
Note
The zero-modified lognormal (delta) distribution is sometimes used to model chemical concentrations for which some observations are reported as “Below Detection Limit” (the nondetects are assumed equal to 0). See, for example, Gilliom and Helsel (1986), Owen and DeRouen (1980), and Gibbons et al. (2009, Chapter 12). USEPA (2009, Chapter 15) recommends this strategy only in specific situations, and Helsel (2012, Chapter 1) strongly discourages this approach to dealing with nondetects.
A variation of the zero-modified lognormal (delta) distribution is the zero-modified normal distribution, in which a normal distribution is mixed with a positive probability mass at 0.
One way to try to assess whether a zero-modified lognormal (delta), zero-modified normal, censored normal, or censored lognormal is the best model for the data is to construct both censored and detects-only probability plots (see qqPlotCensored).
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Aitchison, J. (1955). On the Distribution of a Positive Random Variable Having a Discrete Probability Mass at the Origin. Journal of the American Statistical Association 50, 901–908.
Aitchison, J., and J.A.C. Brown. (1957). The Lognormal Distribution (with special reference to its uses in economics). Cambridge University Press, London, pp. 94–99.
Crow, E.L., and K. Shimizu. (1988). Lognormal Distributions: Theory and Applications. Marcel Dekker, New York, pp. 47–51.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring. Second Edition. John Wiley and Sons, Hoboken, NJ.
Gilliom, R.J., and D.R. Helsel. (1986). Estimation of Distributional Parameters for Censored Trace Level Water Quality Data: 1. Estimation Techniques. Water Resources Research 22, 135–146.
Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R. Second Edition. John Wiley and Sons, Hoboken, NJ, Chapter 1.
Johnson, N.L., S. Kotz, and A.W. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, p. 312.
Owen, W., and T. DeRouen. (1980). Estimation of the Mean for Lognormal Data Containing Zeros and Left-Censored Values, with Applications to the Measurement of Worker Exposure to Air Contaminants. Biometrics 36, 707–719.
USEPA. (1992c). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Addendum to Interim Final Guidance. Office of Solid Waste, Permits and State Programs Division, US Environmental Protection Agency, Washington, D.C.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
See Also
Zero-Modified Lognormal, Zero-Modified Normal, Lognormal.
Examples
# Generate 100 observations from a zero-modified lognormal (delta)
# distribution with mean=2, cv=1, and p.zero=0.5, then estimate the
# parameters. According to equations (1) and (3) above, the overall mean
# is mean.zmlnorm=1 and the overall cv is cv.zmlnorm=sqrt(3).
# (Note: the call to set.seed simply allows you to reproduce this example.)
set.seed(250)
dat <- rzmlnormAlt(100, mean = 2, cv = 1, p.zero = 0.5)
ezmlnormAlt(dat, ci = TRUE)
#Results of Distribution Parameter Estimation
#--------------------------------------------
#
#Assumed Distribution:            Zero-Modified Lognormal (Delta)
#
#Estimated Parameter(s):          mean         = 1.9604561
#                                 cv           = 0.9169411
#                                 p.zero       = 0.4500000
#                                 mean.zmlnorm = 1.0782508
#                                 cv.zmlnorm   = 1.5307175
#
#Estimation Method:               mvue
#
#Data:                            dat
#
#Sample Size:                     100
#
#Confidence Interval for:         mean.zmlnorm
#
#Confidence Interval Method:      Normal Approximation
#                                 (t Distribution)
#
#Confidence Interval Type:        two-sided
#
#Confidence Level:                95%
#
#Confidence Interval:             LCL = 0.748134
#                                 UCL = 1.408368
#
# Clean up
rm(dat)