R: One-Sample or Paired-Sample Sign Test on a Median

signTest {EnvStats}

R Documentation

One-Sample or Paired-Sample Sign Test on a Median

Description

Estimate the median, test the null hypothesis that the median is equal to a user-specified value based on the sign test, and create a confidence interval for the median.

Usage

  signTest(x, y = NULL, alternative = "two.sided", mu = 0, paired = FALSE, 
    conf.level = 0.95)

Arguments

`x`	numeric vector of observations. Missing (`NA`), undefined (`NaN`), and infinite (`Inf`, `-Inf`) values are allowed but will be removed.
`y`	optional numeric vector of observations that are paired with the observations in `x`. The length of `y` must be the same as the length of `x`. This argument is ignored if `paired=FALSE`, and must be supplied if `paired=TRUE`. The default value is `y=NULL`. Missing (`NA`), undefined (`NaN`), and infinite (`Inf`, `-Inf`) values are allowed but will be removed.
`alternative`	character string indicating the kind of alternative hypothesis. The possible values are `"two.sided"` (the default), `"greater"`, and `"less"`.
`mu`	numeric scalar indicating the hypothesized value of the median. The default value is `mu=0`.
`paired`	logical scalar indicating whether to perform a paired or one-sample sign test. The possible values are `paired=FALSE` (the default; indicates a one-sample sign test) and `paired=TRUE`.
`conf.level`	numeric scalar between 0 and 1 indicating the confidence level associated with the confidence interval for the population median. The default value is `conf.level=0.95`.

Details

One-Sample Case (paired=FALSE)
Let \underline{x} = x_1, x_2, \ldots, x_n be a vector of n independent observations from one or more distributions that all have the same median \mu.

Consider the test of the null hypothesis:

H_0: \mu = \mu_0 \;\;\;\;\;\; (1)

The three possible alternative hypotheses are the upper one-sided alternative (alternative="greater")

H_a: \mu > \mu_0 \;\;\;\;\;\; (2)

the lower one-sided alternative (alternative="less")

H_a: \mu < \mu_0 \;\;\;\;\;\; (3)

and the two-sided alternative (alternative="two.sided")

H_a: \mu \ne \mu_0 \;\;\;\;\;\; (4)

To test the null hypothesis (1) against any of the three alternatives (2)-(4), the sign test uses the test statistic T, which is simply the number of observations that are greater than \mu_0 (Conover, 1980, p. 122; van Belle et al., 2004, p. 256; Hollander and Wolfe, 1999, p. 60; Lehmann, 1975, p. 120; Sheskin, 2011; Zar, 2010, p. 537). Under the null hypothesis, T is a binomial random variable with parameters size=n and prob=0.5. Usually, however, observations that are equal to \mu_0 are discarded, so T is taken to be binomial with parameters size=r and prob=0.5, where r denotes the number of observations not equal to \mu_0. The sign test requires only that the observations are independent and that they come from one or more distributions (not necessarily the same one) that all have the same population median.
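
As a rough illustration of how T and r are counted, the following base R sketch uses made-up data. The vector `x`, the value `mu0`, and the names `x.ok` and `T.stat` are assumptions for this example only and are not part of signTest.

```r
# Hypothetical data and hypothesized median, for illustration only.
x   <- c(3.2, 5.0, 7.1, NA, 6.4, 5.0, 9.8, Inf, 4.1, 8.3)
mu0 <- 5

# Drop missing, undefined, and infinite values.
x.ok <- x[is.finite(x)]

# r: number of observations not tied with mu0.
r <- sum(x.ok != mu0)

# T: number of observations greater than mu0
# (named T.stat to avoid masking R's built-in T).
T.stat <- sum(x.ok > mu0)

c(n = length(x.ok), r = r, T = T.stat)
```

Here two observations equal `mu0`, so they count toward n but not toward r.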

For a two-sided alternative hypothesis (Equation (4)), the p-value is computed as:

p = Pr(X_{r,0.5} \le r-m) + Pr(X_{r,0.5} \ge m) \;\;\;\;\;\; (5)

where X_{r,p} denotes a binomial random variable with parameters size=r and prob=p, and m is defined by:

m = max(T, r-T) \;\;\;\;\;\; (6)

For a one-sided lower alternative hypothesis (Equation (3)), the p-value is computed as:

p = Pr(X_{r,0.5} \le T) \;\;\;\;\;\; (7)

and for a one-sided upper alternative hypothesis (Equation (2)), the p-value is computed as:

p = Pr(X_{r,0.5} \ge T) \;\;\;\;\;\; (8)
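
The three p-value formulas can be sketched in base R with `pbinom`. This is a minimal sketch, not the code signTest runs. It uses a binomial with size r (the number of untied observations) and takes the upper tail in the two-sided case as Pr(X >= m), which is the form that reproduces the p-value 0.02148438 reported in the first example. The function name `sign_test_p` is hypothetical, and capping the two-sided p-value at 1 is an added guard that this help file does not state.

```r
# Hypothetical helper: p-value of the sign test from the counts T and r.
sign_test_p <- function(T.stat, r,
                        alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  switch(alternative,
    two.sided = {
      m <- max(T.stat, r - T.stat)
      # Pr(X <= r - m) + Pr(X >= m), with X ~ Binomial(r, 0.5)
      p <- pbinom(r - m, r, 0.5) +
        pbinom(m - 1, r, 0.5, lower.tail = FALSE)
      min(1, p)  # assumed guard: the tails overlap when T = r / 2
    },
    # Pr(X <= T)
    less = pbinom(T.stat, r, 0.5),
    # Pr(X >= T) = Pr(X > T - 1)
    greater = pbinom(T.stat - 1, r, 0.5, lower.tail = FALSE)
  )
}

sign_test_p(9, 10)                           # first example: T = 9, r = 10
sign_test_p(8, 15, alternative = "greater")  # chromium example: T = 8, r = 15
```

The two calls use the counts shown in the Examples section, so the results can be compared with the p-values printed there.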

The sign test is therefore simply a special case of the binomial test with p=0.5.
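
One way to check this equivalence is to pass the counts T and r to `binom.test` from the stats package. The sketch below uses the counts reported in the Examples section of this help file (9 of 10 observations above 5, and 8 of 15 above 100). The variable names `p.two` and `p.upper` are chosen here for illustration.

```r
# Two-sided sign test from the first example: T = 9 successes in r = 10 trials.
p.two <- binom.test(9, 10, p = 0.5, alternative = "two.sided")$p.value

# Upper one-sided sign test from the chromium example: T = 8, r = 15.
p.upper <- binom.test(8, 15, p = 0.5, alternative = "greater")$p.value

c(two.sided = p.two, greater = p.upper)
```

Both values should agree with the p-values printed in the Examples section.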

Computing Confidence Intervals
Based on the relationship between hypothesis tests and confidence intervals, we can construct a confidence interval for the population median based on the sign test (e.g., Hollander and Wolfe, 1999, p. 72; Lehmann, 1975, p. 182). It turns out that this is equivalent to using the formulas for a nonparametric confidence interval for the 0.5 quantile (see eqnpar).
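
The confidence levels reported in the Examples section can be checked from the confidence limit ranks alone. The sketch below assumes a continuous distribution and uses the standard order-statistic result that the number of observations below the median is binomial with size n and prob 0.5. The function names `ci_coverage` and `lower_coverage` are hypothetical, and the sketch does not show how signTest or eqnpar choose the ranks.

```r
# Coverage of the two-sided interval [x_(l), x_(u)] for the median,
# where x_(k) is the k-th order statistic of n observations.
ci_coverage <- function(n, lower.rank, upper.rank) {
  pbinom(upper.rank - 1, n, 0.5) - pbinom(lower.rank - 1, n, 0.5)
}

# Coverage of the one-sided lower interval [x_(l), Inf).
lower_coverage <- function(n, lower.rank) {
  pbinom(lower.rank - 1, n, 0.5, lower.tail = FALSE)
}

ci_coverage(10, 3, 9)   # ranks 3 and 9 with n = 10 (first example)
lower_coverage(15, 5)   # rank 5 with n = 15 (chromium example)
```

Because the achievable coverage moves in discrete steps, the reported confidence level (for example 93.45703%) generally differs from the requested conf.level.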

Paired-Sample Case (paired=TRUE)
When the argument paired=TRUE, the arguments x and y are assumed to have the same length, and the n differences d_i = x_i - y_i, \;\; i = 1, 2, \ldots, n are assumed to be independent observations from distributions with the same median \mu. The sign test can then be applied to the differences.
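
A minimal base R sketch of the paired case follows, using made-up paired measurements. The data and the names `d`, `T.stat`, and `p.paired` are assumptions for illustration. The sketch applies the one-sample procedure to the differences with \mu_0 = 0 and uses `binom.test` for the p-value rather than calling signTest itself.

```r
# Hypothetical paired observations (for illustration only).
x <- c(12.1, 9.8, 11.4, 10.2, 13.5, 8.9, 10.7, 12.8)
y <- c(10.3, 10.1, 9.9, 10.2, 11.0, 9.4, 9.5, 11.1)

# Differences d_i = x_i - y_i.
d <- x - y

# Counts for the sign test on the differences with mu0 = 0.
r      <- sum(d != 0)  # differences not equal to 0
T.stat <- sum(d > 0)   # positive differences

# Two-sided p-value via the equivalent binomial test.
p.paired <- binom.test(T.stat, r, p = 0.5)$p.value

c(r = r, T = T.stat, p.value = p.paired)
```

With the EnvStats package attached, the corresponding call would be `signTest(x, y, paired = TRUE)`.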

Value

A list of class "htest" containing the results of the hypothesis test. See the help file for htest.object for details.

Note

A frequent question in environmental statistics is “Is the concentration of chemical X greater than Y units?”. For example, in groundwater assessment (compliance) monitoring at hazardous and solid waste sites, the concentration of a chemical in the groundwater at a downgradient well must be compared to a groundwater protection standard (GWPS). If the concentration is “above” the GWPS, then the site enters corrective action monitoring. As another example, soil screening at a Superfund site involves comparing the concentration of a chemical in the soil with a pre-determined soil screening level (SSL). If the concentration is “above” the SSL, then further investigation and possible remedial action is required. Determining what it means for the chemical concentration to be “above” a GWPS or an SSL is a policy decision: the average of the distribution of the chemical concentration must be above the GWPS or SSL, or the median must be above the GWPS or SSL, or the 95th percentile must be above the GWPS or SSL, or something else. Often, the first interpretation is used.

Hypothesis tests you can use to perform tests of location include: Student's t-test, Fisher's randomization test, the Wilcoxon signed rank test, Chen's modified t-test, the sign test, and a test based on a bootstrap confidence interval. For a discussion comparing the performance of these tests, see Millard and Neerchal (2001, pp.408-409).

Author(s)

Steven P. Millard (EnvStats@ProbStatInfo.com)

References

Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York, p.122.

Hollander, M., and D.A. Wolfe. (1999). Nonparametric Statistical Methods. Second Edition. John Wiley and Sons, New York, p.60.

Lehmann, E.L. (1975). Nonparametrics: Statistical Methods Based on Ranks. Holden-Day, Oakland, CA, p.120.

Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL, pp.404–406.

Sheskin, D.J. (2011). Handbook of Parametric and Nonparametric Statistical Procedures. Fifth Edition. CRC Press, Boca Raton, FL.

van Belle, G., L.D. Fisher, P.J. Heagerty, and T. Lumley. (2004). Biostatistics: A Methodology for the Health Sciences. Second Edition. John Wiley and Sons, New York.

Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.

Examples

  # Generate 10 observations from a lognormal distribution with parameters 
  # meanlog=2 and sdlog=1.  The median of this distribution is e^2 (about 7.4). 
  # Test the null hypothesis that the true median is equal to 5 against the 
  # alternative that the true median is not equal to 5. 
  # (Note: the call to set.seed allows you to reproduce this example).

  set.seed(23) 
  dat <- rlnorm(10, meanlog = 2, sdlog = 1) 
  signTest(dat, mu = 5) 

  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 median = 5
  #
  #Alternative Hypothesis:          True median is not equal to 5
  #
  #Test Name:                       Sign test
  #
  #Estimated Parameter(s):          median = 19.21717
  #
  #Data:                            dat
  #
  #Test Statistic:                  # Obs > median = 9
  #
  #P-value:                         0.02148438
  #
  #Confidence Interval for:         median
  #
  #Confidence Interval Method:      exact
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                93.45703%
  #
  #Confidence Limit Rank(s):        3 9 
  #
  #Confidence Interval:             LCL =  7.732538
  #                                 UCL = 35.722459

  # Clean up
  #---------
  rm(dat)

  #==========

  # The guidance document "Supplemental Guidance to RAGS: Calculating the 
  # Concentration Term" (USEPA, 1992d) contains an example of 15 observations 
  # of chromium concentrations (mg/kg) which are assumed to come from a 
  # lognormal distribution.  These data are stored in the vector 
  # EPA.92d.chromium.vec.  Here, we will use the sign test to test the null 
  # hypothesis that the median chromium concentration is less than or equal to 
  # 100 mg/kg vs. the alternative that it is greater than 100 mg/kg.  The 
  # estimated median is 110 mg/kg.  There are 8 out of 15 observations greater 
  # than 100 mg/kg, the p-value is equal to 0.5, and the lower 94% confidence 
  # limit is 41 mg/kg.

  signTest(EPA.92d.chromium.vec, mu = 100, alternative = "greater") 

  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 median = 100
  #
  #Alternative Hypothesis:          True median is greater than 100
  #
  #Test Name:                       Sign test
  #
  #Estimated Parameter(s):          median = 110
  #
  #Data:                            EPA.92d.chromium.vec
  #
  #Test Statistic:                  # Obs > median = 8
  #
  #P-value:                         0.5
  #
  #Confidence Interval for:         median
  #
  #Confidence Interval Method:      exact
  #
  #Confidence Interval Type:        lower
  #
  #Confidence Level:                94.07654%
  #
  #Confidence Limit Rank(s):        5 
  #
  #Confidence Interval:             LCL =  41
  #                                 UCL = Inf

[Package EnvStats version 2.8.1 Index]