pscore.test {segmented}

R Documentation

Testing for existence of one breakpoint

Description

Given a (generalized) linear model, the (pseudo) Score statistic tests for the existence of one breakpoint.

Usage

pscore.test(obj, seg.Z, k = 10, alternative = c("two.sided", "less", "greater"), 
    values=NULL, dispersion=NULL, df.t=NULL, more.break=FALSE, n.break=1, 
    only.term=FALSE, break.type=c("break","jump"))

Arguments

`obj`	a fitted model typically returned by `glm` or `lm`. Even an object returned by `segmented` can be set. Offset and weights are allowed.
`seg.Z`	a formula with no response variable, such as `seg.Z=~x1`, indicating the (continuous) segmented variable being tested. Only a single variable may be tested, and an error is raised when `seg.Z` includes two or more terms. `seg.Z` can be omitted if (i) `obj` is a segmented fit with a single segmented covariate, or (ii) `obj` is a "lm" or "glm" fit with a single covariate; in either case that variable is used.
`k`	optional. Number of points (equi-spaced from the min to max) used to compute the pseudo Score statistic. See Details.
`alternative`	a character string specifying the alternative hypothesis (relevant to the slope difference parameter).
`values`	optional. The evaluation points where the Score test is computed. See Details for default values.
`dispersion`	optional. The dispersion parameter of the family, used to compute the test statistic. When `NULL` (the default), it is inferred from `obj`: it is taken as `1` for the Binomial and Poisson families, and otherwise estimated by the residual Chi-squared statistic of the model `obj` (calculated from cases with non-zero weights) divided by the residual degrees of freedom.
`df.t`	optional. The degrees of freedom used to compute the p-value. When `NULL`, the df extracted from `obj` are used.
`more.break`	optional, logical. If `obj` is a 'segmented' fit, `more.break=FALSE` tests for the actual breakpoint of the variable `seg.Z`, while `more.break=TRUE` tests for additional breakpoint(s) of the variable `seg.Z`. Ignored when `obj` is not a segmented fit.
`n.break`	optional. Number of breakpoints postulated under the alternative hypothesis.
`only.term`	logical. If `TRUE`, only the pseudo covariate(s) relevant to testing for the breakpoint are returned, and no test is computed.
`break.type`	The kind of breakpoint being tested. `"break"` is for piecewise-linear relationships; `"jump"` is for piecewise-constant (step-function) relationships.
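
The default dispersion estimate described for `dispersion` can be reproduced by hand. The sketch below uses base R only, with a simulated Gamma GLM (the data and the names `fit`, `w`, `keep`, `phi_hat` are illustrative assumptions). It mirrors the estimate reported by `summary.glm`; that `pscore.test` computes it internally in exactly this way is an assumption based on the description above.

```r
# Simulated Gamma regression; data are for illustration only.
set.seed(1)
x <- runif(50)
y <- rgamma(50, shape = 2, rate = 2 / exp(1 + x))
fit <- glm(y ~ x, family = Gamma(link = "log"))

# Residual (Pearson) Chi-squared statistic from cases with non-zero
# weights, divided by the residual degrees of freedom.
w <- fit$weights                 # working weights of the fitted GLM
keep <- w > 0
phi_hat <- sum((w * fit$residuals^2)[keep]) / fit$df.residual

# Agrees with the dispersion reported by summary.glm().
all.equal(phi_hat, summary(fit)$dispersion)
```

A value obtained this way (or from any other fit) could then be supplied through the `dispersion` argument instead of relying on the default.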

Details

pscore.test tests for a non-zero difference-in-slope parameter of a segmented relationship. Namely, the null hypothesis is H_0:\beta=0, where \beta is the difference-in-slopes, i.e. the coefficient of (x-\psi)_+ in the segmented term \beta(x-\psi)_+. The hypothesis of interest \beta=0 means no breakpoint. Simulation studies have shown that this Score test is more powerful than the Davies test (see References) when the alternative hypothesis is ‘one changepoint’. If there are two or more breakpoints (for instance, in a sinusoidal-like relationship), pscore.test can have lower power, and davies.test can perform better.
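
Written out for a single segmented covariate x, the hypothesis can be sketched as follows. The link function g and the coefficient names \alpha_0 and \alpha_1 are notation assumed here for illustration, and any further covariates in the model are omitted.

```latex
\[
  g\bigl(E[Y \mid x]\bigr) = \alpha_0 + \alpha_1 x + \beta\,(x-\psi)_+ ,
  \qquad (x-\psi)_+ = \max(x-\psi,\,0),
\]
\[
  H_0:\ \beta = 0 \quad \text{versus} \quad H_1:\ \beta \neq 0 .
\]
```

The slope is \alpha_1 to the left of \psi and \alpha_1+\beta to the right. When \beta=0 the breakpoint \psi drops out of the model, which is why it is a nuisance parameter present only under the alternative. The one-sided choices of `alternative` presumably correspond to \beta<0 ("less") and \beta>0 ("greater").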

The dispersion value, if unspecified, is taken from obj. If obj is the fit under the null hypothesis (no changepoint), the dispersion estimate will usually be larger than the one from a segmented fit, leading to a (potentially severe) loss of power.

The k evaluation points are k equally spaced values in the range of the segmented covariate. k should not be small. Specific values can be set via values, although I have found no important difference due to the number and location of the evaluation points; the default is therefore k=10 equally spaced points. However, when the possible breakpoint is believed to lie within a specified narrower range, the user can specify k values in that range, leading to higher power in detecting it, i.e. typically a lower p-value.
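
As a hedged illustration of the `k` and `values` arguments, the sketch below reuses the simulated data from the Examples section. The range 0.4 to 0.6 and the name `narrow_grid` are hypothetical choices made only for this example; the `requireNamespace` guard simply skips the calls when the package is not installed.

```r
# Simulated data, same set-up as in the Examples section.
set.seed(20)
z <- runif(100)
x <- rnorm(100, 2)
y <- 2 + 10 * pmax(z - 0.5, 0) + rnorm(100, 0, 3)
o <- lm(y ~ z + x)

# Hypothetical narrower range believed to contain the breakpoint.
narrow_grid <- seq(0.4, 0.6, length.out = 10)

if (requireNamespace("segmented", quietly = TRUE)) {
  # Default: k = 10 equally spaced evaluation points over the range of z.
  print(segmented::pscore.test(o, ~z))
  # More evaluation points over the same range.
  print(segmented::pscore.test(o, ~z, k = 20))
  # User-supplied evaluation points restricted to the narrower range.
  print(segmented::pscore.test(o, ~z, values = narrow_grid))
}
```

Comparing the three p-values on a given data set shows how sensitive the test is to the evaluation points in that case.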

If obj is a (segmented) lm object, the returned p-value comes from the t-distribution with appropriate degrees of freedom. Otherwise, namely if obj is a (segmented) glm object, the p-value is computed with respect to the Normal distribution.
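
The difference between the two reference distributions can be seen with base R alone. In this sketch the observed statistic `stat` and the degrees of freedom `df` are hypothetical numbers, and the two-sided p-value is assumed to be the usual doubled tail probability; how pscore.test computes it internally is not shown here.

```r
stat <- 2.1   # hypothetical observed value of the statistic
df   <- 96    # hypothetical residual degrees of freedom (or a df.t value)

# Two-sided p-values under each reference distribution.
p_t    <- 2 * pt(-abs(stat), df = df)   # t reference, as for an lm fit
p_norm <- 2 * pnorm(-abs(stat))         # Normal reference, as for a glm fit

# One-sided version for alternative = "greater" with the t reference.
p_greater <- pt(stat, df = df, lower.tail = FALSE)

c(t = p_t, normal = p_norm, greater = p_greater)
```

With finite degrees of freedom the t-based p-value is always somewhat larger than the Normal one, and the gap shrinks as the degrees of freedom grow.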

Value

A list with class 'htest' containing the following components:

`method`	title (character)
`data.name`	the regression model and the segmented variable being tested
`statistic`	the empirical value of the statistic
`parameter`	number of evaluation points
`p.value`	the p-value
`process`	the alternative hypothesis set

Author(s)

Vito M.R. Muggeo

References

Muggeo, V.M.R. (2016) Testing with a nuisance parameter present only under the alternative: a score-based approach with application to segmented modelling. Journal of Statistical Computation and Simulation, 86, 3059–3067.

Examples

## Not run: 
library(segmented)

set.seed(20)
z <- runif(100)
x <- rnorm(100, 2)
# piecewise-linear mean in z: breakpoint at 0.5, difference-in-slopes 10
y <- 2 + 10 * pmax(z - .5, 0) + rnorm(100, 0, 3)

o <- lm(y ~ z + x)

# testing for one changepoint
# use the simple null fit
pscore.test(o, ~z) # compare with davies.test(o, ~z)

# use the segmented fit
os <- segmented(o, ~z)
pscore.test(os, ~z) # smaller p-value, as it uses the dispersion under the alternative (from 'os')

# test for the 2nd breakpoint in the variable z
pscore.test(os, ~z, more.break = TRUE)
## End(Not run)

[Package segmented version 2.1-1 Index]