R: Estimate Parameters of a Zero-Modified Normal Distribution

ezmnorm {EnvStats}

R Documentation

Estimate Parameters of a Zero-Modified Normal Distribution

Description

Estimate the mean and standard deviation of a zero-modified normal distribution, and optionally construct a confidence interval for the mean.

Usage

  ezmnorm(x, method = "mvue", ci = FALSE, ci.type = "two-sided", 
    ci.method = "normal.approx", conf.level = 0.95)

Arguments

`x`	numeric vector of observations.
`method`	character string specifying the method of estimation. Currently, the only possible value is `"mvue"` (minimum variance unbiased; the default). See the DETAILS section for more information.
`ci`	logical scalar indicating whether to compute a confidence interval for the mean. The default value is `FALSE`.
`ci.type`	character string indicating what kind of confidence interval to compute. The possible values are `"two-sided"` (the default), `"lower"`, and `"upper"`. This argument is ignored if `ci=FALSE`.
`ci.method`	character string indicating what method to use to construct the confidence interval for the mean. Currently the only possible value is `"normal.approx"` (the default). See the DETAILS section for more information.
`conf.level`	a scalar between 0 and 1 indicating the confidence level of the confidence interval. The default value is `conf.level=0.95`. This argument is ignored if `ci=FALSE`.

Details

If x contains any missing (NA), undefined (NaN) or infinite (Inf, -Inf) values, they will be removed prior to performing the estimation.

Let \underline{x} = (x_1, x_2, \ldots, x_n) be a vector of n observations from a zero-modified normal distribution with parameters mean=\mu, sd=\sigma, and p.zero=p. Let r denote the number of observations in \underline{x} that are equal to 0, and order the observations so that x_1, x_2, \ldots, x_r denote the r zero observations, and x_{r+1}, x_{r+2}, \ldots, x_n denote the n-r non-zero observations.

Note that \mu is not the mean of the zero-modified normal distribution; it is the mean of the normal part of the distribution. Similarly, \sigma is not the standard deviation of the zero-modified normal distribution; it is the standard deviation of the normal part of the distribution.

Let \gamma and \delta denote the mean and standard deviation of the overall zero-modified normal distribution. Aitchison (1955) shows that:

\gamma = (1 - p) \mu \;\;\;\; (1)

\delta^2 = (1 - p) \sigma^2 + p (1 - p) \mu^2 \;\;\;\; (2)
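
As a quick illustration of equations (1) and (2), the following base R sketch converts the parameters of the normal part into the moments of the overall distribution. The function name `zmnorm_moments` is hypothetical and is not part of EnvStats; it only restates the two formulas above.

```r
# Hypothetical helper (not part of EnvStats): overall mean and standard
# deviation of a zero-modified normal distribution, per equations (1)-(2).
zmnorm_moments <- function(mean, sd, p.zero) {
  stopifnot(sd > 0, p.zero >= 0, p.zero <= 1)
  gamma  <- (1 - p.zero) * mean                                   # equation (1)
  delta2 <- (1 - p.zero) * sd^2 + p.zero * (1 - p.zero) * mean^2  # equation (2)
  c(mean.zmnorm = gamma, sd.zmnorm = sqrt(delta2))
}

zmnorm_moments(mean = 4, sd = 2, p.zero = 0.5)
```

With mean=4, sd=2, and p.zero=0.5 this gives an overall mean of 2 and an overall standard deviation of sqrt(6).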

Estimation

Minimum Variance Unbiased Estimation (method="mvue")
Aitchison (1955) shows that the minimum variance unbiased estimators (mvue's) of \gamma and \delta^2 are:

\hat{\gamma}_{mvue} = \bar{x} \;\;\;\; (3)

`\hat{\delta}^2_{mvue} =`	`\frac{n-r-1}{n-1} (s^*)^2 + \frac{r}{n} (\frac{n-r}{n-1}) (\bar{x}^*)^2`	if `r < n - 1`,
	`x_n^2 / n`	if `r = n - 1`,
	`0`	if `r = n \;\;\;\; (4)`

where

\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\; (5)

\bar{x}^* = \frac{1}{n-r} \sum_{i=r+1}^n x_i \;\;\;\; (6)

(s^*)^2 = \frac{1}{n-r-1} \sum_{i=r+1}^n (x_i - \bar{x}^*)^2 \;\;\;\; (7)

Note that the quantity in equation (5) is the sample mean of all observations (including 0 values), the quantity in equation (6) is the sample mean of all non-zero observations, and the quantity in equation (7) is the sample variance of all non-zero observations. Also note that for r=n-1 or r=n, the estimator of \delta^2 is the sample variance for all observations (including 0 values).
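
The estimators in equations (3) through (7) can be sketched in base R as follows. This is an illustrative reimplementation under the definitions above, not the EnvStats source code; the name `zmnorm_mvue` is hypothetical, and the requirement of at least two finite observations is an assumption made for this sketch.

```r
# Hypothetical helper (not the EnvStats source): estimates of the overall
# mean and standard deviation following equations (3)-(7).
zmnorm_mvue <- function(x) {
  x <- x[is.finite(x)]              # drop NA, NaN, Inf, -Inf
  stopifnot(length(x) >= 2)         # assumption made for this sketch
  n  <- length(x)
  r  <- sum(x == 0)                 # number of zero observations
  nz <- x[x != 0]                   # the n - r non-zero observations
  gamma.hat <- mean(x)              # equation (3), using equation (5)
  delta2.hat <- if (r < n - 1) {
    # equation (4), first case; mean(nz) is (6) and var(nz) is (7)
    (n - r - 1) / (n - 1) * var(nz) +
      (r / n) * ((n - r) / (n - 1)) * mean(nz)^2
  } else if (r == n - 1) {
    nz^2 / n                        # single non-zero observation
  } else {
    0                               # all observations are zero
  }
  c(mean.zmnorm = gamma.hat, sd.zmnorm = sqrt(delta2.hat))
}

zmnorm_mvue(c(0, 0, 1, 3))
zmnorm_mvue(c(0, 0, 0, 4))   # r = n - 1: matches sd() of all observations
```

For the vector c(0, 0, 1, 3) the non-zero mean is 2 and the non-zero variance is 2, so equation (4) gives (1/3)(2) + (2/4)(2/3)(4) = 2 for the estimated overall variance.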

Confidence Intervals

Based on Normal Approximation (ci.method="normal.approx")
An approximate (1-\alpha)100\% confidence interval for \gamma is constructed based on the assumption that the estimator of \gamma is approximately normally distributed. Aitchison (1955) shows that

Var(\hat{\gamma}_{mvue}) = Var(\bar{x}) = \frac{\delta^2}{n} \;\;\;\; (8)

Thus, an approximate two-sided (1-\alpha)100\% confidence interval for \gamma is constructed as:

[ \hat{\gamma}_{mvue} - t_{n-2, 1-\alpha/2} \frac{\hat{\delta}_{mvue}}{\sqrt{n}}, \; \hat{\gamma}_{mvue} + t_{n-2, 1-\alpha/2} \frac{\hat{\delta}_{mvue}}{\sqrt{n}} ] \;\;\;\; (9)

where t_{\nu, p} is the p'th quantile of Student's t-distribution with \nu degrees of freedom.

One-sided confidence intervals are computed in a similar fashion.
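
Equation (9) can be sketched in base R as shown below. The function name `zmnorm_ci` is hypothetical and not part of EnvStats. The one-sided forms are an assumption about what "in a similar fashion" means: the sketch uses the 1-\alpha quantile in place of the 1-\alpha/2 quantile and leaves the other limit unbounded.

```r
# Hypothetical helper (not part of EnvStats): confidence interval for the
# overall mean based on equation (9), with n - 2 degrees of freedom.
zmnorm_ci <- function(gamma.hat, delta.hat, n, conf.level = 0.95,
                      ci.type = c("two-sided", "lower", "upper")) {
  ci.type <- match.arg(ci.type)
  alpha <- 1 - conf.level
  se <- delta.hat / sqrt(n)         # square root of equation (8), estimated
  df <- n - 2
  switch(ci.type,
    "two-sided" = c(LCL = gamma.hat - qt(1 - alpha / 2, df) * se,
                    UCL = gamma.hat + qt(1 - alpha / 2, df) * se),
    # One-sided limits below are an assumed convention for this sketch.
    "lower"     = c(LCL = gamma.hat - qt(1 - alpha, df) * se, UCL = Inf),
    "upper"     = c(LCL = -Inf, UCL = gamma.hat + qt(1 - alpha, df) * se))
}

zmnorm_ci(gamma.hat = 2.220753, delta.hat = 2.465829, n = 100)
```

The inputs in the call are the mean.zmnorm and sd.zmnorm estimates printed in the first example of the Examples section; the resulting limits are approximately 1.7314 and 2.7101.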

Value

a list of class "estimate" containing the estimated parameters and other information. See estimate.object for details.

The component called parameters is a numeric vector with the following estimated parameters:

Parameter Name	Explanation
`mean`	mean of the normal (Gaussian) part of the distribution.
`sd`	standard deviation of the normal (Gaussian) part of the distribution.
`p.zero`	probability that an observation will be 0.
`mean.zmnorm`	mean of the overall zero-modified normal distribution.
`sd.zmnorm`	standard deviation of the overall zero-modified normal distribution.

Note

The zero-modified normal distribution is sometimes used to model chemical concentrations for which some observations are reported as “Below Detection Limit”. See, for example, USEPA (1992c, pp. 27-34). In most cases, however, the zero-modified lognormal (delta) distribution will be more appropriate, since chemical concentrations are bounded below at 0 (e.g., Gilliom and Helsel, 1986; Owen and DeRouen, 1980).

Once you estimate the parameters of the zero-modified normal distribution, it is often useful to characterize the uncertainty in the estimate of the mean. This is done with a confidence interval.

One way to assess which of the zero-modified lognormal (delta), zero-modified normal, censored normal, or censored lognormal distributions is the best model for the data is to construct both censored and detects-only probability plots (see qqPlotCensored).

Author(s)

Steven P. Millard (EnvStats@ProbStatInfo.com)

References

Aitchison, J. (1955). On the Distribution of a Positive Random Variable Having a Discrete Probability Mass at the Origin. Journal of the American Statistical Association 50, 901–908.

Gilliom, R.J., and D.R. Helsel. (1986). Estimation of Distributional Parameters for Censored Trace Level Water Quality Data: 1. Estimation Techniques. Water Resources Research 22, 135–146.

Owen, W., and T. DeRouen. (1980). Estimation of the Mean for Lognormal Data Containing Zeros and Left-Censored Values, with Applications to the Measurement of Worker Exposure to Air Contaminants. Biometrics 36, 707–719.

USEPA (1992c). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Addendum to Interim Final Guidance. Office of Solid Waste, Permits and State Programs Division, US Environmental Protection Agency, Washington, D.C.

Examples

  # Generate 100 observations from a zero-modified normal distribution 
  # with mean=4, sd=2, and p.zero=0.5, then estimate the parameters.  
  # According to equations (1) and (2) above, the overall mean is 
  # mean.zmnorm=2 and the overall standard deviation is sd.zmnorm=sqrt(6).  
  # (Note: the call to set.seed simply allows you to reproduce this example.)

  set.seed(250) 
  dat <- rzmnorm(100, mean = 4, sd = 2, p.zero = 0.5) 
  ezmnorm(dat, ci = TRUE) 

  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Zero-Modified Normal
  #
  #Estimated Parameter(s):          mean        = 4.037732
  #                                 sd          = 1.917004
  #                                 p.zero      = 0.450000
  #                                 mean.zmnorm = 2.220753
  #                                 sd.zmnorm   = 2.465829
  #
  #Estimation Method:               mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     100
  #
  #Confidence Interval for:         mean.zmnorm
  #
  #Confidence Interval Method:      Normal Approximation
  #                                 (t Distribution)
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 1.731417
  #                                 UCL = 2.710088

  #----------

  # Following Example 9 on page 34 of USEPA (1992c), compute an 
  # estimate of the mean of the zinc data, assuming a 
  # zero-modified normal distribution. The data are stored in 
  # EPA.92c.zinc.df.

  head(EPA.92c.zinc.df) 
  #  Zinc.orig  Zinc Censored Sample Well
  #1        <7  7.00     TRUE      1    1
  #2     11.41 11.41    FALSE      2    1
  #3        <7  7.00     TRUE      3    1
  #4        <7  7.00     TRUE      4    1
  #5        <7  7.00     TRUE      5    1
  #6     10.00 10.00    FALSE      6    1

  New.Zinc <- EPA.92c.zinc.df$Zinc 
  New.Zinc[EPA.92c.zinc.df$Censored] <- 0 
  ezmnorm(New.Zinc, ci = TRUE) 

  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Zero-Modified Normal
  #
  #Estimated Parameter(s):          mean        = 11.891000
  #                                 sd          =  1.594523
  #                                 p.zero      =  0.500000
  #                                 mean.zmnorm =  5.945500
  #                                 sd.zmnorm   =  6.123235
  #
  #Estimation Method:               mvue
  #
  #Data:                            New.Zinc
  #
  #Sample Size:                     40
  #
  #Confidence Interval for:         mean.zmnorm
  #
  #Confidence Interval Method:      Normal Approximation
  #                                 (t Distribution)
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 3.985545
  #                                 UCL = 7.905455

  #----------

  # Clean up
  rm(dat, New.Zinc)

[Package EnvStats version 2.8.1 Index]