R: Estimate Parameters of a Normal (Gaussian) Distribution

enorm {EnvStats}

R Documentation

Estimate Parameters of a Normal (Gaussian) Distribution

Description

Estimate the mean and standard deviation parameters of a normal (Gaussian) distribution, and optionally construct a confidence interval for the mean or the variance.

Usage

  enorm(x, method = "mvue", ci = FALSE, ci.type = "two-sided", 
    ci.method = "exact", conf.level = 0.95, ci.param = "mean")

Arguments

`x`	numeric vector of observations.
`method`	character string specifying the method of estimation. Possible values are `"mvue"` (minimum variance unbiased; the default), and `"mle/mme"` (maximum likelihood/method of moments). See the DETAILS section for more information on these estimation methods.
`ci`	logical scalar indicating whether to compute a confidence interval for the mean or variance. The default value is `FALSE`.
`ci.type`	character string indicating what kind of confidence interval to compute. The possible values are `"two-sided"` (the default), `"lower"`, and `"upper"`. This argument is ignored if `ci=FALSE`.
`ci.method`	character string indicating what method to use to construct the confidence interval for the mean or variance. The only possible value is `"exact"` (the default). See the DETAILS section for more information. This argument is ignored if `ci=FALSE`.
`conf.level`	a scalar between 0 and 1 indicating the confidence level of the confidence interval. The default value is `conf.level=0.95`. This argument is ignored if `ci=FALSE`.
`ci.param`	character string indicating which parameter to create a confidence interval for. The possible values are `ci.param="mean"` (the default) and `ci.param="variance"`. This argument is ignored if `ci=FALSE`.

Details

If x contains any missing (NA), undefined (NaN) or infinite (Inf, -Inf) values, they will be removed prior to performing the estimation.

Let \underline{x} = (x_1, x_2, \ldots, x_n) be a vector of n observations from an normal (Gaussian) distribution with parameters mean=\mu and sd=\sigma.

Estimation

Minimum Variance Unbiased Estimation (method="mvue")
The minimum variance unbiased estimators (mvue's) of the mean and variance are:

\hat{\mu}_{mvue} = \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\; (1)

\hat{\sigma}^2_{mvue} = s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\; (2)

(Johnson et al., 1994; Forbes et al., 2011). Note that when method="mvue", the estimated standard deviation is the square root of the mvue of the variance, but is not itself an mvue.

Maximum Likelihood/Method of Moments Estimation (method="mle/mme")
The maximum likelihood estimator (mle) and method of moments estimator (mme) of the mean are both the same as the mvue of the mean given in equation (1) above. The mle and mme of the variance is given by:

\hat{\sigma}^2_{mle} = s^2_m = \frac{n-1}{n} s^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\; (3)

When method="mle/mme", the estimated standard deviation is the square root of the mle of the variance, and is itself an mle.

Confidence Intervals

Confidence Interval for the Mean (ci.param="mean")
When ci=TRUE and ci.param = "mean", the usual confidence interval for \mu is constructed as follows. If ci.type="two-sided", a the (1-\alpha)100% confidence interval for \mu is given by:

[\hat{\mu} - t(n-1, 1-\alpha/2) \frac{\hat{\sigma}}{\sqrt{n}}, \, \hat{\mu} + t(n-1, 1-\alpha/2) \frac{\hat{\sigma}}{\sqrt{n}}] \;\;\;\; (4)

where t(\nu, p) is the p'th quantile of Student's t-distribution with \nu degrees of freedom (Zar, 2010; Gilbert, 1987; Ott, 1995; Helsel and Hirsch, 1992).

If ci.type="lower", the (1-\alpha)100% confidence interval for \mu is given by:

[\hat{\mu} - t(n-1, 1-\alpha) \frac{\hat{\sigma}}{\sqrt{n}}, \, \infty] \;\;\;\; (5)

and if ci.type="upper", the confidence interval is given by:

[-\infty, \, \hat{\mu} + t(n-1, 1-\alpha/2) \frac{\hat{\sigma}}{\sqrt{n}}] \;\;\;\; (6)

Confidence Interval for the Variance (ci.param="variance")
When ci=TRUE and ci.param = "variance", the usual confidence interval for \sigma^2 is constructed as follows. A two-sided (1-\alpha)100% confidence interval for \sigma^2 is given by:

[ \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}, \, \frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} ] \;\;\;\; (7)

Similarly, a one-sided upper (1-\alpha)100% confidence interval for the population variance is given by:

[ 0, \, \frac{(n-1)s^2}{\chi^2_{n-1,\alpha}} ] \;\;\;\; (8)

and a one-sided lower (1-\alpha)100% confidence interval for the population variance is given by:

[ \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha}}, \, \infty ] \;\;\;\; (9)

(van Belle et al., 2004; Zar, 2010).

Value

a list of class "estimate" containing the estimated parameters and other information. See
estimate.object for details.

Note

The normal and lognormal distribution are probably the two most frequently used distributions to model environmental data. In order to make any kind of probability statement about a normally-distributed population (of chemical concentrations for example), you have to first estimate the mean and standard deviation (the population parameters) of the distribution. Once you estimate these parameters, it is often useful to characterize the uncertainty in the estimate of the mean or variance. This is done with confidence intervals.

Author(s)

Steven P. Millard (EnvStats@ProbStatInfo.com)

References

Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Second Edition. Lewis Publishers, Boca Raton, FL.

Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.

Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY.

Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, Chapter 7.

Johnson, N. L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York.

Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL.

Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL.

USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.

van Belle, G., L.D. Fisher, Heagerty, P.J., and Lumley, T. (2004). Biostatistics: A Methodology for the Health Sciences, 2nd Edition. John Wiley & Sons, New York.

Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.

Examples

  # Generate 20 observations from a N(3, 2) distribution, then estimate 
  # the parameters and create a 95% confidence interval for the mean. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)

  set.seed(250) 
  dat <- rnorm(20, mean = 3, sd = 2) 
  enorm(dat, ci = TRUE) 

  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Normal
  #
  #Estimated Parameter(s):          mean = 2.861160
  #                                 sd   = 1.180226
  #
  #Estimation Method:               mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Confidence Interval for:         mean
  #
  #Confidence Interval Method:      Exact
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 2.308798
  #                                 UCL = 3.413523

  #----------

  # Using the same data, construct an upper 90% confidence interval for
  # the variance.

  enorm(dat, ci = TRUE, ci.type = "upper", ci.param = "variance")$interval

  #Confidence Interval for:         variance
  #
  #Confidence Interval Method:      Exact
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 0.000000
  #                                 UCL = 2.615963

  #----------

  # Clean up
  #---------
  rm(dat)

  #----------

  # Using the Reference area TcCB data in the data frame EPA.94b.tccb.df, 
  # estimate the mean and standard deviation of the log-transformed data, 
  # and construct a 95% confidence interval for the mean.

  with(EPA.94b.tccb.df, enorm(log(TcCB[Area == "Reference"]), ci = TRUE))  

  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Normal
  #
  #Estimated Parameter(s):          mean = -0.6195712
  #                                 sd   =  0.4679530
  #
  #Estimation Method:               mvue
  #
  #Data:                            log(TcCB[Area == "Reference"])
  #
  #Sample Size:                     47
  #
  #Confidence Interval for:         mean
  #
  #Confidence Interval Method:      Exact
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = -0.7569673
  #                                 UCL = -0.4821751

[Package EnvStats version 2.8.1 Index]