kendallTrendTest {EnvStats}  R Documentation 
Perform a nonparametric test for a monotonic trend based on Kendall's tau statistic, and optionally compute a confidence interval for the slope.
kendallTrendTest(y, ...)
## S3 method for class 'formula'
kendallTrendTest(y, data = NULL, subset,
na.action = na.pass, ...)
## Default S3 method:
kendallTrendTest(y, x = seq(along = y),
alternative = "two.sided", correct = TRUE, ci.slope = TRUE,
conf.level = 0.95, warn = TRUE, data.name = NULL, data.name.x = NULL,
parent.of.data = NULL, subset.expression = NULL, ...)
y 
an object containing data for the trend test. In the default method,
the argument 
data 
specifies an optional data frame, list or environment (or object coercible by

subset 
specifies an optional vector specifying a subset of observations to be used. 
na.action 
specifies a function which indicates what should happen when the data contain 
x 
numeric vector of "predictor" values. The length of 
alternative 
character string indicating the kind of alternative hypothesis. The
possible values are 
correct 
logical scalar indicating whether to use the correction for continuity in
computing the 
ci.slope 
logical scalar indicating whether to compute a confidence interval for the
slope. The default value is 
conf.level 
numeric scalar between 0 and 1 indicating the confidence level associated
with the confidence interval for the slope. The default value is

warn 
logical scalar indicating whether to print a warning message when

data.name 
character string indicating the name of the data used for the trend test.
The default value is 
data.name.x 
character string indicating the name of the data used for the predictor variable x.
If 
parent.of.data 
character string indicating the source of the data used for the trend test. 
subset.expression 
character string indicating the expression used to subset the data. 
... 
additional arguments affecting the test for trend. 
kendallTrendTest
performs Kendall's nonparametric test for a monotonic trend,
which is a special case of the test for independence based on Kendall's tau statistic
(see cor.test
). The slope is estimated using the method of Theil (1950) and
Sen (1968). When ci.slope=TRUE
, the confidence interval for the slope is
computed using Gilbert's (1987) Modification of the Theil/Sen Method.
Kendall's test for a monotonic trend is a special case of the test for independence
based on Kendall's tau statistic. The first section below explains the general case
of testing for independence. The second section explains the special case of
testing for monotonic trend. The last section explains how a simple linear
regression model is a special case of a monotonic trend and how the slope may be
estimated.
The General Case of Testing for Independence
Definition of Kendall's Tau
Let X
and Y
denote two continuous random variables with some joint
(bivariate) distribution. Let (X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)
denote a set of n
bivariate observations from this distribution, and assume
these bivariate observations are mutually independent. Kendall (1938, 1975) proposed
a test for the hypothesis that the X
and Y
random variables are
independent based on estimating the following quantity:
\tau = \{ 2 Pr[(X_2  X_1)(Y_2  Y_1) > 0] \}  1 \;\;\;\;\;\; (1)
The quantity in Equation (1) is called Kendall's tau, although this term is more
often applied to the estimate of \tau
(see Equation (2) below).
If X
and Y
are independent, then \tau=0
. Furthermore, for most
distributions of interest, if \tau=0
then the random variables X
and
Y
are independent. (It can be shown that there exist some distributions for
which \tau=0
and the random variables X
and Y
are not independent;
see Hollander and Wolfe (1999, p.364)).
Note that Kendall's tau is similar to a correlation coefficient in that
1 \le \tau \le 1
. If X
and Y
always vary in the same direction,
that is if X_1 < X_2
always implies Y_1 < Y_2
, then \tau = 1
.
If X
and Y
always vary in the opposite direction, that is if
X_1 < X_2
always implies Y_1 > Y_2
, then \tau = 1
. If
\tau > 0
, this indicates X
and Y
are positively associated.
If \tau < 0
, this indicates X
and Y
are negatively associated.
Estimating Kendall's Tau
The quantity in Equation (1) can be estimated by:
\hat{\tau} = \frac{2S}{n(n1)} \;\;\;\;\;\; (2)
where
S = \sum_{i=1}^{n1} \sum_{j=i+1}^{n} sign[(X_j  X_i)(Y_j  Y_i)] \;\;\;\;\;\; (3)
and sign()
denotes the sign
function:
1  x < 0 

sign(x) =  0  x = 0 \;\;\;\;\;\; (4) 
1  x > 0

(Hollander and Wolfe, 1999, Chapter 8; Conover, 1980, pp.256–260; Gilbert, 1987, Chapter 16; Helsel and Hirsch, 1992, pp.212–216; Gibbons et al., 2009, Chapter 11). The quantity defined in Equation (2) is called Kendall's rank correlation coefficient or more often Kendall's tau.
Note that the quantity S
defined in Equation (3) is equal to the number of
concordant pairs minus the number of discordant pairs. Hollander and Wolfe
(1999, p.364) use the notation K
instead of S
, and Conover (1980, p.257)
uses the notation T
.
Testing the Null Hypothesis of Independence
The null hypothesis H_0: \tau = 0
, can be tested using the statistic S
defined in Equation (3) above. Tables of the distribution of S
for small
samples are given in Hollander and Wolfe (1999), Conover (1980, pp.458–459),
Gilbert (1987, p.272), Helsel and Hirsch (1992, p.469), and Gibbons (2009, p.210).
The function kendallTrendTest
uses the large sample approximation to the
distribution of S
under the null hypothesis, which is given by:
z = \frac{S  E(S)}{\sqrt{Var(S)}} \;\;\;\;\;\; (5)
where
E(S) = 0 \;\;\;\;\;\; (6)
Var(S) = \frac{n(n1)(2n+5)}{18} \;\;\;\;\;\; (7)
Under the null hypothesis, the quantity z
defined in Equation (5) is
approximately distributed as a standard normal random variable.
Both Kendall (1975) and Mann (1945) show that the normal approximation is excellent
even for samples as small as n=10
, provided that the following continuity
correction is used:
z = \frac{S  sign(S)}{\sqrt{Var(S)}} \;\;\;\;\;\; (8)
The function kendallTrendTest
performs the usual onesample ztest using
the statistic computed in Equation (8) or Equation (5). The argument
correct
determines which equation is used to compute the zstatistic.
By default, correct=TRUE
so Equation (8) is used.
In the case of tied observations in either the observed X
's and/or observed
Y
's, the formula for the variance of S
given in Equation (7) must be
modified as follows:
Var(S) =  \frac{n(n1)(2n+5)}{18}  
\frac{\sum_{i=1}^{g} t_i(t_i1)(2t_i+5)}{18}  

\frac{\sum_{j=1}^{h} u_j(u_j1)(2u_j+5)}{18} + 

\frac{[\sum_{i=1}^{g} t_i(t_i1)(t_i2)][\sum_{j=1}^{h} u_j(u_j1)(u_j2)]}{9n(n1)(n2)} + 

\frac{[\sum_{i=1}^{g} t_i(t_i1)][\sum_{j=1}^{h} u_j(u_j1)]}{2n(n1)} \;\;\;\;\;\; (9)

where g
is the number of tied groups in the X
observations,
t_i
is the size of the i
'th tied group in the X
observations,
h
is the number of tied groups in the Y
observations, and
u_j
is the size of the j
'th tied group in the Y
observations.
In the case of no ties in either the X
or Y
observations, Equation (9)
reduces to Equation (7).
The Special Case of Testing for Monotonic Trend
Often in environmental sampling, observations are taken periodically over time
(Hirsch et al., 1982; van Belle and Hughes, 1984; Hirsch and Slack, 1984). In
this case, the random variables Y_1, Y_2, \ldots, Y_n
can be thought of as
representing the observations, and the variables X_1, X_2, \ldots, X_n
are no longer random but represent the time at which the i
'th observation
was taken. If the observations are equally spaced over time, then it is useful to
make the simplification X_i = i
for i = 1, 2, \ldots, n
. This is in
fact the default value of the argument x
for the function
kendallTrendTest
.
In the case where the X
's represent time and are all distinct, the test for
independence between X
and Y
is equivalent to testing for a monotonic
trend in Y
, and the test statistic S
simplifies to:
S = \sum_{i=1}^{n1} \sum_{j=i+1}^{n} sign(Y_j  Y_i) \;\;\;\;\;\; (10)
Also, the formula for the variance of S
in the presence of ties (under the
null hypothesis H_0: \tau = 0
) simplifies to:
Var(S) = \frac{n(n1)(2n+5)}{18}  \frac{\sum_{j=1}^{h} u_j(u_j1)(2u_j+5)}{18} \;\;\;\;\;\; (11)
A form of the test statistic S
in Equation (10) was introduced by Mann (1945).
The Special Case of a Simple Linear Model: Estimating the Slope
Consider the simple linear regression model
Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \;\;\;\;\;\; (12)
where \beta_0
denotes the intercept, \beta_1
denotes the slope,
i = 1, 2, \ldots, n
, and the \epsilon
's are assumed to be
independent and identically distributed random variables from the same distribution.
This is a special case of dependence between the X
's and the Y
's, and
the null hypothesis of a zero slope can be tested using Kendall's test statistic
S
(Equation (3) or (10) above) and the associated zstatistic
(Equation (5) or (8) above) (Hollander and Wolfe, 1999, pp.415–420).
Theil (1950) proposed the following nonparametric estimator of the slope:
\hat{\beta}_1 = Median(\frac{Y_j  Y_i}{X_j  X_i}); \;\; i < j \;\;\;\;\;\; (13)
Note that the computation of the estimated slope involves computing
N = {n \choose 2} = \frac{n(n1)}{2} \;\;\;\;\;\; (14)
“twopoint” estimated slopes (assuming no tied X
values), and taking
the median of these N values.
Sen (1968) generalized this estimator to the case where there are possibly tied
observations in the X
's. In this case, Sen simply ignores the twopoint
estimated slopes where the X
's are tied and computes the median based on the
remaining N'
twopoint estimated slopes. That is, Sen's estimator is given by:
\hat{\beta}_1 = Median(\frac{Y_j  Y_i}{X_j  X_i}); \;\; i < j, X_i \ne X_j \;\;\;\;\;\; (15)
(Hollander and Wolfe, 1999, pp.421–422).
Conover (1980, p. 267) suggests the following estimator for the intercept:
\hat{\beta}_0 = Y_{0.5}  \hat{\beta}_1 X_{0.5} \;\;\;\;\;\; (16)
where X_{0.5}
and Y_{0.5}
denote the sample medians of the X
's
and Y
's, respectively. With these estimators of slope and intercept, the
estimated regression line passes through the point (X_{0.5}, Y_{0.5})
.
NOTE: The function kendallTrendTest
always returns estimates of
slope and intercept assuming a linear model (Equation (12)), while the pvalue
is based on Kendall's tau, which is testing for the broader alternative of any
kind of dependence between the X
's and Y
's.
Confidence Interval for the Slope
Theil (1950) and Sen (1968) proposed methods to compute a confidence interval for
the true slope, assuming the linear model of Equation (12) (see
Hollander and Wolfe, 1999, pp.421422). Gilbert (1987, p.218) illustrates a
simpler method than the one given by Sen (1968) that is based on a normal
approximation. Gilbert's (1987) method is an extension of the one given in
Hollander and Wolfe (1999, p.424) that allows for ties and/or multiple
observations per time period. This method is valid for a sample size as small as
n=10
unless there are several tied observations.
Let N'
denote the number of defined twopoint estimated slopes that are used
in Equation (15) above (if there are no tied X
values then N' = N
), and
let \hat{\beta}_{1_{(1)}}, \hat{\beta}_{1_{(2)}}, \ldots, \hat{\beta}_{1_{(N')}}
denote the N'
ordered slopes. For Gilbert's (1987) method, a
100(1\alpha)\%
twosided confidence interval for the true slope is given by:
[\hat{\beta}_{1_{(M1)}}, \hat{\beta}_{1_{(M2+1)}}] \;\;\;\;\;\; (17)
where
M1 = \frac{N'  C_{\alpha}}{2} \;\;\;\;\;\; (18)
M2 = \frac{N' + C_{\alpha}}{2} \;\;\;\;\;\; (19)
C_\alpha = z_{1  \alpha/2} \sqrt{Var(S)} \;\;\;\;\;\; (20)
Var(S)
is defined in Equations (7), (9), or (11), and
z_p
denotes the p
'th quantile of the standard normal distribution.
Onesided confidence intervals may computed in a similar fashion.
Usually the quantities M1
and M2
will not be integers.
Gilbert (1987, p.219) suggests interpolating between adjacent values in this case,
which is what the function kendallTrendTest
does.
A list of class "htest"
containing the results of the hypothesis
test. See the help file for htest.object
for details.
In addition, the following components are part of the list returned by
kendallTrendTest
:
S 
The value of the Kendall Sstatistic. 
var.S 
The variance of the Kendall Sstatistic. 
slopes 
A numeric vector of all possible twopoint slope estimates.
This component is used by the function 
Kendall's test for independence or trend is a nonparametric test. No
assumptions are made about the distribution of the X
and Y
variables. Hirsch et al. (1982) introduced the "seasonal Kendall test" to
test for trend within each season. They note that Kendall's test for trend
is easy to compute, even in the presence of missing values, and can also be
used with censored values.
van Belle and Hughes (1984) note that Kendall's test for trend is slightly less powerful than the test based on Spearman's rho, but it converges to normality faster. Also, Bradley (1968, p.288) shows that for the case of a linear model with normal (Gaussian) errors, the asymptotic relative efficiency of Kendall's test for trend versus the parametric test for a zero slope is 0.98.
The results of the function kendallTrendTest
are similar to the
results of the builtin R function cor.test
with the
argument method="kendall"
except that cor.test
1) computes exact pvalues when the number of pairs is less than 50 and
there are no ties, and 2) does not return a confidence interval for
the slope.
Steven P. Millard (EnvStats@ProbStatInfo.com)
Bradley, J.V. (1968). DistributionFree Statistical Tests. PrenticeHall, Englewood Cliffs, NJ.
Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York, pp.256272.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY, Chapter 16.
Helsel, D.R. and R.M. Hirsch. (1988). Discussion of Applicability of the ttest for Detecting Trends in Water Quality Variables. Water Resources Bulletin 24(1), 201204.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, NY.
Helsel, D.R., and R. M. Hirsch. (2002). Statistical Methods in Water Resources. Techniques of Water Resources Investigations, Book 4, chapter A3. U.S. Geological Survey. Available online at https://pubs.usgs.gov/tm/04/a03/tm4a3.pdf.
Hirsch, R.M., J.R. Slack, and R.A. Smith. (1982). Techniques of Trend Analysis for Monthly Water Quality Data. Water Resources Research 18(1), 107121.
Hirsch, R.M. and J.R. Slack. (1984). A Nonparametric Trend Test for Seasonal Data with Serial Dependence. Water Resources Research 20(6), 727732.
Hirsch, R.M., R.B. Alexander, and R.A. Smith. (1991). Selection of Methods for the Detection and Estimation of Trends in Water Quality. Water Resources Research 27(5), 803813.
Hollander, M., and D.A. Wolfe. (1999). Nonparametric Statistical Methods, Second Edition. John Wiley and Sons, New York.
Kendall, M.G. (1938). A New Measure of Rank Correlation. Biometrika 30, 8193.
Kendall, M.G. (1975). Rank Correlation Methods. Charles Griffin, London.
Mann, H.B. (1945). Nonparametric Tests Against Trend. Econometrica 13, 245259.
Millard, S.P., and Neerchal, N.K. (2001). Environmental Statistics with SPLUS. CRC Press, Boca Raton, Florida.
Sen, P.K. (1968). Estimates of the Regression Coefficient Based on Kendall's Tau. Journal of the American Statistical Association 63, 13791389.
Theil, H. (1950). A RankInvariant Method of Linear and Polynomial Regression Analysis, IIII. Proc. Kon. Ned. Akad. v. Wetensch. A. 53, 386392, 521525, 13971412.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R09007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet  March 2009 Unified Guidance. EPA 530/R09007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
van Belle, G., and J.P. Hughes. (1984). Nonparametric Tests for Trend in Water Quality. Water Resources Research 20(1), 127136.
cor.test
, kendallSeasonalTrendTest
, htest.object
.
# Reproduce Example 176 on page 1733 of USEPA (2009). This example
# tests for trend in sulfate concentrations (ppm) collected at various
# months between 1989 and 1996.
head(EPA.09.Ex.17.6.sulfate.df)
# Sample.No Year Month Sampling.Date Date Sulfate.ppm
#1 1 89 6 89.6 19890601 480
#2 2 89 8 89.8 19890801 450
#3 3 90 1 90.1 19900101 490
#4 4 90 3 90.3 19900301 520
#5 5 90 6 90.6 19900601 485
#6 6 90 8 90.8 19900801 510
# Plot the data
#
dev.new()
with(EPA.09.Ex.17.6.sulfate.df,
plot(Sampling.Date, Sulfate.ppm, pch = 15, ylim = c(400, 900),
xlab = "Sampling Date", ylab = "Sulfate Conc (ppm)",
main = "Figure 176. Time Series Plot of \nSulfate Concentrations (ppm)")
)
Sulfate.fit < lm(Sulfate.ppm ~ Sampling.Date,
data = EPA.09.Ex.17.6.sulfate.df)
abline(Sulfate.fit, lty = 2)
# Perform the Kendall test for trend
#
kendallTrendTest(Sulfate.ppm ~ Sampling.Date,
data = EPA.09.Ex.17.6.sulfate.df)
#Results of Hypothesis Test
#
#
#Null Hypothesis: tau = 0
#
#Alternative Hypothesis: True tau is not equal to 0
#
#Test Name: Kendall's Test for Trend
# (with continuity correction)
#
#Estimated Parameter(s): tau = 0.7667984
# slope = 26.6666667
# intercept = 1909.3333333
#
#Estimation Method: slope: Theil/Sen Estimator
# intercept: Conover's Estimator
#
#Data: y = Sulfate.ppm
# x = Sampling.Date
#
#Data Source: EPA.09.Ex.17.6.sulfate.df
#
#Sample Size: 23
#
#Test Statistic: z = 5.107322
#
#Pvalue: 3.267574e07
#
#Confidence Interval for: slope
#
#Confidence Interval Method: Gilbert's Modification
# of Theil/Sen Method
#
#Confidence Interval Type: twosided
#
#Confidence Level: 95%
#
#Confidence Interval: LCL = 20.00000
# UCL = 35.71182
# Clean up
#
rm(Sulfate.fit)
graphics.off()