linearTrendTestPower {EnvStats}  R Documentation 
Compute the power of a parametric test for linear trend, given the sample size or predictor variable values, scaled slope, and significance level.
linearTrendTestPower(n, x = lapply(n, seq), slope.over.sigma = 0, alpha = 0.05,
alternative = "two.sided", approx = FALSE)
n 
numeric vector of sample sizes. All values of 
x 
numeric vector of predictor variable values, or a list in which each component is
a numeric vector of predictor variable values. Usually, the predictor variable is
time (e.g., days, months, quarters, etc.). The default value is

slope.over.sigma 
numeric vector specifying the ratio of the true slope to the standard deviation of
the error terms ( 
alpha 
numeric vector of numbers between 0 and 1 indicating the Type I error level
associated with the hypothesis test. The default value is 
alternative 
character string indicating the kind of alternative hypothesis. The possible values
are 
approx 
logical scalar indicating whether to compute the power based on an approximation to
the noncentral tdistribution. The default value is 
If the argument x
is a vector, it is converted into a list with one
component. If the arguments n
, x
, slope.over.sigma
, and
alpha
are not all the same length, they are replicated to be the same
length as the length of the longest argument.
Basic Model
Consider the simple linear regression model
Y = \beta_0 + \beta_1 X + \epsilon \;\;\;\;\;\; (1)
where X
denotes the predictor variable (observed without error),
\beta_0
denotes the intercept, \beta_1
denotes the slope, and the
error term \epsilon
is assumed to be a random variable from a normal
distribution with mean 0 and standard deviation \sigma
. Let
(\underline{x}, \underline{y}) = (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n) \;\;\;\;\;\; (2)
denote n
independent observed (X,Y)
pairs from the model (1).
Often in environmental data analysis, we are interested in determining whether there
is a trend in some indicator variable over time. In this case, the predictor
variable X
is time (e.g., day, month, quarter, year, etc.), and the n
values of the response variable Y
represent measurements taken over time.
The slope then represents the change in the average of the response variable per
one unit of time.
When the argument x
is a numeric vector, it represents the
n
values of the predictor variable. When the argument x
is a
list, each component of x
is a numeric vector that represents a set values
of the predictor variable (and the number of elements may vary by component).
By default, the argument x
is a list for which the i'th component is simply
the integers from 1 to the value of the i'th element of the argument n
,
representing, for example, Day 1, Day2, ..., Day n[i]
.
In the discussion that follows, be sure not to confuse the intercept and slope
coefficients \beta_0
and \beta_1
with the Type II error of the
hypothesis test, which is denoted by \beta
.
Estimation of Coefficients and Confidence Interval for Slope
The standard leastsquares estimators of the slope and intercept are given by:
\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} \;\;\;\;\;\; (3)
\hat{\beta}_0 = \bar{y}  \hat{\beta}_1 \bar{x} \;\;\;\;\;\; (4)
where
S_{xy} = \sum_{i=1}^n (x_i  \bar{x})(y_i  \bar{y}) \;\;\;\;\;\; (5)
S_{xx} = \sum_{i=1}^n (x_i  \bar{x})^2 \;\;\;\;\;\; (6)
\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\;\;\; (7)
\bar{y} = \frac{1}{n} \sum_{i=1}^n y_i \;\;\;\;\;\; (8)
(Draper and Smith, 1998, p.25; Zar, 2010, p.332334; Berthoux and Brown, 2002, p.297; Helsel and Hirsch, p.226). The estimator of slope in Equation (3) has a normal distribution with mean equal to the true slope, and variance given by:
Var(\hat{\beta}_1) = \sigma_{\hat{\beta}_1}^2 = \frac{\sigma^2}{S_{xx}} \;\;\;\;\;\; (9)
(Draper and Smith, 1998, p.35; Zar, 2010, p.341; Berthoux and Brown, 2002, p.299;
Helsel and Hirsch, 1992, p.227). Thus, a (1\alpha)100\%
twosided confidence
interval for the slope is given by:
[ \hat{\beta}_1  t_{n2}(1\alpha/2) \hat{\sigma}_{\hat{\beta}_1}, \;\; \hat{\beta}_1 + t_{n2}(1\alpha/2) \hat{\sigma}_{\hat{\beta}_1} ] \;\;\;\;\;\; (10)
where
\hat{\sigma}_{\hat{\beta}_1} = \frac{\hat{\sigma}}{\sqrt{S_{xx}}} \;\;\;\;\;\; (11)
\hat{\sigma}^2 = s^2 = \frac{1}{n2} \sum_{i=1}^n (y_i  \hat{y}_i)^2 \;\;\;\;\;\; (12)
\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i \;\;\;\;\;\; (13)
and t_{\nu}(p)
denotes the p
'th quantile of
Student's tdistribution with \nu
degrees of freedom
(Draper and Smith, 1998, p.36; Zar, 2010, p.343; Berthoux and Brown, 2002, p.300;
Helsel and Hirsch, 1992, p.240).
Testing for a NonZero Slope
Consider the null hypothesis of a zero slope coefficient:
H_0: \beta_1 = 0 \;\;\;\;\;\; (14)
The three possible alternative hypotheses are the upper onesided alternative
(alternative="greater"
):
H_a: \beta_1 > 0 \;\;\;\;\;\; (15)
the lower onesided alternative (alternative="less"
)
H_a: \beta_1 < 0 \;\;\;\;\;\; (16)
and the twosided alternative (alternative="two.sided"
)
H_a: \beta_1 \ne 0 \;\;\;\;\;\; (17)
The test of the null hypothesis (14) versus any of the three alternatives (15)(17) is based on the Student tstatistic:
t = \frac{\hat{\beta}_1}{\hat{\sigma}_{\hat{\beta}_1}} = \frac{\hat{\beta}_1}{s/\sqrt{S_{xx}}} \;\;\;\;\;\; (18)
Under the null hypothesis (14), the tstatistic in (18) follows a
Student's tdistribution with n2
degrees of freedom
(Draper and Smith, 1998, p.36; Zar, 2010, p.341;
Helsel and Hirsch, 1992, pp.238239).
The formula for the power of the test of a zero slope depends on which alternative
is being tested.
The two subsections below describe exact and approximate formulas for the power of
the test. Note that none of the equations for the power of the ttest
requires knowledge of the values \beta_1
or \sigma
(the population standard deviation of the error terms), only the ratio
\beta_1/\sigma
. The argument slope.over.sigma
is this ratio, and it is
referred to as the “scaled slope”.
Exact Power Calculations (approx=FALSE
)
This subsection describes the exact formulas for the power of the ttest for a
zero slope.
Upper onesided alternative (alternative="greater"
)
The standard Student's ttest rejects the null hypothesis (1) in favor of the
upper alternative hypothesis (2) at level\alpha
if
t \ge t_{\nu}(1  \alpha) \;\;\;\;\;\; (19)
where
\nu = n  2 \;\;\;\;\;\; (20)
and, as noted previously, t_{\nu}(p)
denotes the p
'th quantile of
Student's tdistribution with \nu
degrees of freedom.
The power of this test, denoted by 1\beta
, where \beta
denotes the
probability of a Type II error, is given by:
1  \beta = Pr[t_{\nu, \Delta} \ge t_{\nu}(1  \alpha)] = 1  G[t_{\nu}(1  \alpha), \nu, \Delta] \;\;\;\;\;\; (21)
where
\Delta = \sqrt{S_{xx}} \frac{\beta_1}{\sigma} \;\;\;\;\;\; (22)
and t_{\nu, \Delta}
denotes a
noncentral Student's trandom variable with
\nu
degrees of freedom and noncentrality parameter \Delta
, and
G(x, \nu, \Delta)
denotes the cumulative distribution function of this
random variable evaluated at x
(Johnson et al., 1995, pp.508510).
Note that when the predictor variable X
represents equallyspaced measures
of time (e.g., days, months, quarters, etc.) and
x_i = i, \;\; i = 1, 2, \ldots, n \;\;\;\;\;\; (23)
then the noncentrality parameter in Equation (22) becomes:
\Delta = \sqrt{\frac{(n1)n(n+1)}{12}} \frac{\beta_1}{\sigma} \;\;\;\;\;\; (24)
Lower onesided alternative (alternative="less"
)
The standard Student's ttest rejects the null hypothesis (1) in favor of the
lower alternative hypothesis (3) at level\alpha
if
t \le t_{\nu}(\alpha) \;\;\;\;\;\; (25)
and the power of this test is given by:
1  \beta = Pr[t_{\nu, \Delta} \le t_{\nu}(\alpha)] = G[t_{\nu}(\alpha), \nu, \Delta] \;\;\;\;\;\; (26)
Twosided alternative (alternative="two.sided"
)
The standard Student's ttest rejects the null hypothesis (14) in favor of the
twosided alternative hypothesis (17) at level\alpha
if
t \ge t_{\nu}(1  \alpha/2) \;\;\;\;\;\; (27)
and the power of this test is given by:
1  \beta = Pr[t_{\nu, \Delta} \le t_{\nu}(\alpha/2)] + Pr[t_{\nu, \Delta} \ge t_{\nu}(1  \alpha/2)]
= G[t_{\nu}(\alpha/2), \nu, \Delta] + 1  G[t_{\nu}(1  \alpha/2), \nu, \Delta] \;\;\;\;\;\; (28)
The power of the ttest given in Equation (28) can also be expressed in terms of the
cumulative distribution function of the noncentral Fdistribution
as follows. Let F_{\nu_1, \nu_2, \Delta}
denote a
noncentral F random variable with \nu_1
and
\nu_2
degrees of freedom and noncentrality parameter \Delta
, and let
H(x, \nu_1, \nu_2, \Delta)
denote the cumulative distribution function of this
random variable evaluated at x
. Also, let F_{\nu_1, \nu_2}(p)
denote
the p
'th quantile of the central Fdistribution with \nu_1
and
\nu_2
degrees of freedom. It can be shown that
(t_{\nu, \Delta})^2 \cong F_{1, \nu, \Delta^2} \;\;\;\;\;\; (29)
where \cong
denotes “equal in distribution”. Thus, it follows that
[t_{\nu}(1  \alpha/2)]^2 = F_{1, \nu}(1  \alpha) \;\;\;\;\;\; (30)
so the formula for the power of the ttest given in Equation (28) can also be written as:
1  \beta = Pr\{(t_{\nu, \Delta})^2 \ge [t_{\nu}(1  \alpha/2)]^2\}
= Pr[F_{1, \nu, \Delta^2} \ge F_{1, \nu}(1  \alpha)] = 1  H[F_{1, \nu}(1\alpha), 1, \nu, \Delta^2] \;\;\;\;\;\; (31)
Approximate Power Calculations (approx=TRUE
)
Zar (2010, pp.115–118) presents an approximation to the power for the ttest
given in Equations (21), (26), and (28) above. His approximation to the power
can be derived by using the approximation
\sqrt{S_{xx}} \frac{\beta_1}{s} \approx \sqrt{SS_{xx}} \frac{\beta_1}{\sigma} = \Delta \;\;\;\;\;\; (32)
where \approx
denotes “approximately equal to”. Zar's approximation
can be summarized in terms of the cumulative distribution function of the
noncentral tdistribution as follows:
G(x, \nu, \Delta) \approx G(x  \Delta, \nu, 0) = G(x  \Delta, \nu) \;\;\;\;\;\; (33)
where G(x, \nu)
denotes the cumulative distribution function of the
central Student's tdistribution with \nu
degrees of freedom evaluated at
x
.
The following three subsections explicitly derive the approximation to the power of
the ttest for each of the three alternative hypotheses.
Upper onesided alternative (alternative="greater"
)
The power for the upper onesided alternative (15) given in Equation (21) can be
approximated as:
1  \beta = Pr[t \ge t_{\nu}(1  \alpha)]
= Pr[\frac{\hat{\beta}_1}{s/\sqrt{S_{xx}}} \ge t_{\nu}(1  \alpha)  \sqrt{S_{xx}}\frac{\beta_1}{s}]
\approx Pr[t_{\nu} \ge t_{\nu}(1  \alpha)  \Delta]
= 1  Pr[t_{\nu} \le t_{\nu}(1  \alpha)  \Delta]
= 1  G[t_{\nu}(1\alpha)  \Delta, \nu] \;\;\;\;\;\; (34)
where t_{\nu}
denotes a central Student's trandom variable with \nu
degrees of freedom.
Lower onesided alternative (alternative="less"
)
The power for the lower onesided alternative (16) given in Equation (26) can be
approximated as:
1  \beta = Pr[t \le t_{\nu}(\alpha)]
= Pr[\frac{\hat{\beta}_1}{s/\sqrt{S_{xx}}} \le t_{\nu}(\alpha)  \sqrt{S_{xx}}\frac{\beta_1}{s}]
\approx Pr[t_{\nu} \le t_{\nu}(\alpha)  \Delta]
= G[t_{\nu}(\alpha)  \Delta, \nu] \;\;\;\;\;\; (35)
Twosided alternative (alternative="two.sided"
)
The power for the twosided alternative (17) given in Equation (28) can be
approximated as:
1  \beta = Pr[t \le t_{\nu}(\alpha/2)] + Pr[t \ge t_{\nu}(1  \alpha/2)]
= Pr[\frac{\hat{\beta}_1}{s/\sqrt{S_{xx}}} \le t_{\nu}(\alpha/2)  \sqrt{SS_{xx}}\frac{\beta_1}{s}] + Pr[\frac{\hat{\beta}_1}{s/\sqrt{S_{xx}}} \ge t_{\nu}(1  \alpha)  \sqrt{SS_{xx}}\frac{\beta_1}{s}]
\approx Pr[t_{\nu} \le t_{\nu}(\alpha/2)  \Delta] + Pr[t_{\nu} \ge t_{\nu}(1  \alpha/2)  \Delta]
= G[t_{\nu}(\alpha/2)  \Delta, \nu] + 1  G[t_{\nu}(1\alpha/2)  \Delta, \nu] \;\;\;\;\;\; (36)
a numeric vector powers.
Often in environmental data analysis, we are interested in determining whether
there is a trend in some indicator variable over time. In this case, the predictor
variable X
is time (e.g., day, month, quarter, year, etc.), and the n
values of the response variable represent measurements taken over time. The slope
then represents the change in the average of the response variable per one unit of
time.
You can use the parametric model (1) to model your data, then use the R function
lm
to fit the regression coefficients and the summary.lm
function to perform a test for the significance of the slope coefficient. The
function linearTrendTestPower
computes the power of this ttest, given a
fixed value of the sample size, scaled slope, and significance level.
You can also use Kendall's nonparametric test for trend
if you don't want to assume the error terms are normally distributed. When the
errors are truly normally distributed, the asymptotic relative efficiency of
Kendall's test for trend versus the parametric ttest for a zero slope is 0.98,
and Kendall's test can be more powerful than the parametric ttest when the errors
are not normally distributed. Thus the function linearTrendTestPower
can
also be used to estimate the power of Kendall's test for trend.
In the course of designing a sampling program, an environmental scientist may wish
to determine the relationship between sample size, significance level, power, and
scaled slope if one of the objectives of the sampling program is to determine
whether a trend is occurring. The functions linearTrendTestPower
,
linearTrendTestN
, linearTrendTestScaledMds
, and
plotLinearTrendTestDesign
can be used to investigate these
relationships.
Steven P. Millard (EnvStats@ProbStatInfo.com)
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Second Edition. Lewis Publishers, Boca Raton, FL.
Draper, N., and H. Smith. (1998). Applied Regression Analysis. Third Edition. John Wiley and Sons, New York, Chapter 1.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, Chapter 9.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York, Chapters 28, 31
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with SPLUS. CRC Press, Boca Raton, FL.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. PrenticeHall, Upper Saddle River, NJ.
linearTrendTestN
, linearTrendTestScaledMds
,
plotLinearTrendTestDesign
, lm
,
summary.lm
, kendallTrendTest
,
Power and Sample Size, Normal, t.test
.
# Look at how the power of the ttest for zero slope increases with increasing
# sample size:
seq(5, 30, by = 5)
#[1] 5 10 15 20 25 30
power < linearTrendTestPower(n = seq(5, 30, by = 5), slope.over.sigma = 0.1)
round(power, 2)
#[1] 0.06 0.13 0.34 0.68 0.93 1.00
#
# Repeat the last example, but compute the approximate power instead of the
# exact:
power < linearTrendTestPower(n = seq(5, 30, by = 5), slope.over.sigma = 0.1,
approx = TRUE)
round(power, 2)
#[1] 0.05 0.11 0.32 0.68 0.93 0.99
#
# Look at how the power of the ttest for zero slope increases with increasing
# scaled slope:
seq(0.05, 0.2, by = 0.05)
#[1] 0.05 0.10 0.15 0.20
power < linearTrendTestPower(15, slope.over.sigma = seq(0.05, 0.2, by = 0.05))
round(power, 2)
#[1] 0.12 0.34 0.64 0.87
#
# Look at how the power of the ttest for zero slope increases with increasing
# values of Type I error:
power < linearTrendTestPower(20, slope.over.sigma = 0.1,
alpha = c(0.001, 0.01, 0.05, 0.1))
round(power, 2)
#[1] 0.14 0.41 0.68 0.80
#
# Show that for a simple regression model, you get a greater power of detecting
# a nonzero slope if you take all the observations at two endpoints, rather than
# spreading the observations evenly between two endpoints.
# (Note: This design usually cannot work with environmental monitoring data taken
# over time since usually observations taken close together in time are not
# independent.)
linearTrendTestPower(x = 1:10, slope.over.sigma = 0.1)
#[1] 0.1265976
linearTrendTestPower(x = c(rep(1, 5), rep(10, 5)), slope.over.sigma = 0.1)
#[1] 0.2413823
#==========
# Clean up
#
rm(power)