R: Half-Width of Confidence Interval for Normal Distribution...

ciNormHalfWidth {EnvStats}

R Documentation

Half-Width of Confidence Interval for Normal Distribution Mean or Difference Between Two Means

Description

Compute the half-width of a confidence interval for the mean of a normal distribution or the difference between two means, given the sample size(s), estimated standard deviation, and confidence level.

Usage

  ciNormHalfWidth(n.or.n1, n2 = n.or.n1, 
    sigma.hat = 1, conf.level = 0.95, 
    sample.type = ifelse(missing(n2), "one.sample", "two.sample"))

Arguments

`n.or.n1`	numeric vector of sample sizes. When `sample.type="one.sample"`, this argument denotes `n`, the number of observations in the single sample. When `sample.type="two.sample"`, this argument denotes `n_1`, the number of observations from group 1. Missing (`NA`), undefined (`NaN`), and infinite (`Inf`, `-Inf`) values are not allowed.
`n2`	numeric vector of sample sizes for group 2. The default value is the value of `n.or.n1`. This argument is ignored when `sample.type="one.sample"`. Missing (`NA`), undefined (`NaN`), and infinite (`Inf`, `-Inf`) values are not allowed.
`sigma.hat`	numeric vector specifying the value(s) of the estimated standard deviation(s).
`conf.level`	numeric vector of numbers between 0 and 1 indicating the confidence level associated with the confidence interval(s). The default value is `conf.level=0.95`.
`sample.type`	character string indicating whether this is a one-sample (`sample.type="one.sample"`) or two-sample (`sample.type="two.sample"`) confidence interval. When `sample.type="one.sample"`, the computed half-width is based on a confidence interval for a single mean. When `sample.type="two.sample"`, the computed half-width is based on a confidence interval for the difference between two means. The default value is `sample.type="one.sample"` unless the argument `n2` is supplied.

Details

If the arguments n.or.n1, n2, sigma.hat, and conf.level are not all the same length, they are replicated to be the same length as the length of the longest argument.

One-Sample Case (sample.type="one.sample")
Let \underline{x} = x_1, x_2, \ldots, x_n denote a vector of n observations from a normal distribution with mean \mu and standard deviation \sigma. A two-sided (1-\alpha)100\% confidence interval for \mu is given by:

[\hat{\mu} - t(n-1, 1-\alpha/2) \frac{\hat{\sigma}}{\sqrt{n}}, \, \hat{\mu} + t(n-1, 1-\alpha/2) \frac{\hat{\sigma}}{\sqrt{n}}] \;\;\;\;\;\; (1)

where

\hat{\mu} = \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\;\;\; (2)

\hat{\sigma}^2 = s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\;\;\; (3)

and t(\nu, p) is the p'th quantile of Student's t-distribution with \nu degrees of freedom (Zar, 2010; Gilbert, 1987; Ott, 1995; Helsel and Hirsch, 1992). Thus, the half-width of this confidence interval is given by:

HW = t(n-1, 1-\alpha/2) \frac{\hat{\sigma}}{\sqrt{n}} \;\;\;\;\;\; (4)

Two-Sample Case (sample.type="two.sample")
Let \underline{x}_1 = x_{11}, x_{12}, \ldots, x_{1n_1} denote a vector of n_1 observations from a normal distribution with mean \mu_1 and standard deviation \sigma, and let \underline{x}_2 = x_{21}, x_{22}, \ldots, x_{2n_2} denote a vector of n_2 observations from a normal distribution with mean \mu_2 and standard deviation \sigma. A two-sided (1-\alpha)100\% confidence interval for \mu_1 - \mu_2 is given by:

[(\hat{\mu}_1 - \hat{\mu}_2) - t(n_1 + n_2 - 2, 1-\alpha/2) \hat{\sigma} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \, (\hat{\mu}_1 - \hat{\mu}_2) + t(n_1 + n_2 - 2, 1-\alpha/2) \hat{\sigma} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}] \;\;\;\;\;\; (5)

where

\hat{\mu}_1 = \bar{x}_1 = \frac{1}{n_1} \sum_{i=1}^{n_1} x_{1i} \;\;\;\;\;\; (6)

\hat{\mu}_2 = \bar{x}_2 = \frac{1}{n_2} \sum_{i=1}^{n_2} x_{2i} \;\;\;\;\;\; (7)

\hat{\sigma}^2 = s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} \;\;\;\;\;\; (8)

s_1^2 = \frac{1}{n_1 - 1} \sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1)^2 \;\;\;\;\;\; (9)

s_2^2 = \frac{1}{n_2 - 1} \sum_{i=1}^{n_2} (x_{2i} - \bar{x}_2)^2 \;\;\;\;\;\; (10)

(Zar, 2010, p.142; Helsel and Hirsch, 1992, p.135, Berthouex and Brown, 2002, pp.157–158). Thus, the half-width of this confidence interval is given by:

HW = t(n_1 + n_2 - 2, 1-\alpha/2) \hat{\sigma} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \;\;\;\;\;\; (11)

Note that for the two-sample case, the function ciNormHalfWidth assumes the two populations have the same standard deviation.

Value

a numeric vector of half-widths.

Note

The normal distribution and lognormal distribution are probably the two most frequently used distributions to model environmental data. In order to make any kind of probability statement about a normally-distributed population (of chemical concentrations for example), you have to first estimate the mean and standard deviation (the population parameters) of the distribution. Once you estimate these parameters, it is often useful to characterize the uncertainty in the estimate of the mean. This is done with confidence intervals.

In the course of designing a sampling program, an environmental scientist may wish to determine the relationship between sample size, confidence level, and half-width if one of the objectives of the sampling program is to produce confidence intervals. The functions ciNormHalfWidth, ciNormN, and plotCiNormDesign can be used to investigate these relationships for the case of normally-distributed observations.

Author(s)

Steven P. Millard (EnvStats@ProbStatInfo.com)

References

Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Second Edition. Lewis Publishers, Boca Raton, FL.

Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY.

Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, Chapter 7.

Millard, S.P., and N. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL.

Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL.

USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.21-3.

Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ, Chapters 7 and 8.

Examples

  # Look at how the half-width of a one-sample confidence interval 
  # decreases with increasing sample size:

  seq(5, 30, by = 5) 
  #[1] 5 10 15 20 25 30 

  hw <- ciNormHalfWidth(n.or.n1 = seq(5, 30, by = 5)) 

  round(hw, 2) 
  #[1] 1.24 0.72 0.55 0.47 0.41 0.37

  #----------------------------------------------------------------

  # Look at how the half-width of a one-sample confidence interval 
  # increases with increasing estimated standard deviation:

  seq(0.5, 2, by = 0.5) 
  #[1] 0.5 1.0 1.5 2.0 

  hw <- ciNormHalfWidth(n.or.n1 = 20, sigma.hat = seq(0.5, 2, by = 0.5)) 

  round(hw, 2) 
  #[1] 0.23 0.47 0.70 0.94

  #----------------------------------------------------------------

  # Look at how the half-width of a one-sample confidence interval 
  # increases with increasing confidence level:

  seq(0.5, 0.9, by = 0.1) 
  #[1] 0.5 0.6 0.7 0.8 0.9 

  hw <- ciNormHalfWidth(n.or.n1 = 20, conf.level = seq(0.5, 0.9, by = 0.1)) 

  round(hw, 2) 
  #[1] 0.15 0.19 0.24 0.30 0.39

  #==========

  # Modifying the example on pages 21-4 to 21-5 of USEPA (2009), 
  # determine how adding another four months of observations to 
  # increase the sample size from 4 to 8 will affect the half-width 
  # of a two-sided 95% confidence interval for the Aldicarb level at 
  # the first compliance well.
  #  
  # Use the estimated standard deviation from the first four months 
  # of data.  (The data are stored in EPA.09.Ex.21.1.aldicarb.df.) 
  # Note that the half-width changes from 34% of the observed mean to 
  # 18% of the observed mean by increasing the sample size from 
  # 4 to 8.

  EPA.09.Ex.21.1.aldicarb.df
  #   Month   Well Aldicarb.ppb
  #1      1 Well.1         19.9
  #2      2 Well.1         29.6
  #3      3 Well.1         18.7
  #4      4 Well.1         24.2
  #...

  mu.hat <- with(EPA.09.Ex.21.1.aldicarb.df, 
    mean(Aldicarb.ppb[Well=="Well.1"]))

  mu.hat 
  #[1] 23.1 

  sigma.hat <- with(EPA.09.Ex.21.1.aldicarb.df, 
    sd(Aldicarb.ppb[Well=="Well.1"]))

  sigma.hat 
  #[1] 4.93491 

  hw.4 <- ciNormHalfWidth(n.or.n1 = 4, sigma.hat = sigma.hat) 

  hw.4 
  #[1] 7.852543 

  hw.8 <- ciNormHalfWidth(n.or.n1 = 8, sigma.hat = sigma.hat) 

  hw.8 
  #[1] 4.125688 

  100 * hw.4/mu.hat 
  #[1] 33.99369 

  100 * hw.8/mu.hat 
  #[1] 17.86012

  #==========

  # Clean up
  #---------
  rm(hw, mu.hat, sigma.hat, hw.4, hw.8)

[Package EnvStats version 2.8.1 Index]