cs {gamlss} R Documentation

Specify a Smoothing Cubic Spline Fit in a GAMLSS Formula

Description

The functions cs() and scs() use the cubic smoothing spline function smooth.spline() to do the smoothing. They take a vector and return it with several attributes. The vector is used in the construction of the model matrix. The functions themselves do not do the smoothing; they assign attributes to the vector to aid gamlss() in the smoothing. The function that does the smoothing is gamlss.cs(). It calls the R function smooth.spline() and is in turn called by the backfitting function additive.fit(), which is based on the original GAM implementation described in Chambers and Hastie (1992). The function scs() differs from cs() in that it allows cross validation of the smoothing parameter, whereas cs() fixes the effective degrees of freedom, df.

Note that the recommended smoothing function is now pb(), which allows the smoothing parameter to be estimated using local maximum likelihood. The function pb() is based on the penalised B-splines (P-splines) of Eilers and Marx (1996).

The (experimental) function vc is now defunct. For fitting varying coefficient models (Hastie and Tibshirani, 1993), use the function pvc().

Usage

cs(x, df = 3, spar = NULL, c.spar = NULL, control = cs.control(...), ...)
scs(x, df = NULL, spar = NULL, control = cs.control(...), ...)
cs.control(cv = FALSE, all.knots = TRUE, nknots = NULL, keep.data = TRUE,
               df.offset = 0, penalty = 1.4, control.spar = list(), ...)

Arguments

x

the univariate predictor (or an expression that evaluates to a numeric vector). For the function vc (now defunct), x is the vector whose (linear) coefficient changes with r

df

the desired equivalent number of degrees of freedom (trace of the smoother matrix minus two for the constant and linear fit). The real smoothing parameter (spar below) is found such that df=tr(S)-2, where S is the implicit smoother matrix. Values for df should be greater than 0, with 0 implying a linear fit.

spar

smoothing parameter, typically (but not necessarily) in (0,1]. The coefficient lambda of the integral of the squared second derivative in the fit (penalised log likelihood) criterion is a monotone function of ‘spar’, see the details in smooth.spline.

c.spar

This option is for use when the degrees of freedom of the fitted gamlss object differ from the ones given as input in the option df. It sets the limits corresponding to the option control.spar of the R function smooth.spline(); the default values are c.spar=c(-1.5, 2). For very large data sets, e.g. 10000 observations, the upper limit may have to be increased, for example to c.spar=c(-1.5, 2.5). Use this option if you have received the warning 'The output df are different from the input, change the control.spar'. c.spar accepts either a vector or a list of length 2; for example, c.spar=c(-1.5, 2.5) and c.spar=list(-1.5, 2.5) have the same effect.

control

control for the function smooth.spline(), see below

cv

see the R function smooth.spline()

all.knots

see the R function smooth.spline()

nknots

see the R function smooth.spline()

keep.data

see the R function smooth.spline()

df.offset

see the R function smooth.spline()

penalty

see the R function smooth.spline(), here the default value is 1.4

control.spar

see above c.spar or the equivalent argument in the function smooth.spline

...

for extra arguments
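
The roles of df and spar can be seen directly in smooth.spline(), the underlying smoother. The sketch below uses simulated data (all variable names are illustrative) and assumes that a cs() term with df = 3 corresponds to a smooth.spline() fit whose smoother-matrix trace is 3 + 2 = 5. It also shows lambda increasing with spar while the fit becomes smoother.

```r
# Simulated data; nothing here is part of gamlss
set.seed(1)
x <- seq(0, 1, length.out = 200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)

# df route: smooth.spline() takes the full trace tr(S),
# so 2 is added to the df that would be passed to cs()
extra_df <- 3
fit_df <- smooth.spline(x, y, df = extra_df + 2)
fit_df$df - 2        # approximately 3

# spar route: a larger spar gives a larger lambda and a lower trace
spar_values <- c(0.2, 0.5, 0.8)
fits <- lapply(spar_values, function(s) smooth.spline(x, y, spar = s))
lambda  <- sapply(fits, function(f) f$lambda)
trace_S <- sapply(fits, function(f) f$df)
data.frame(spar = spar_values, lambda = lambda, df = trace_S - 2)
```

The last column reports the degrees of freedom in the sense used on this page, that is, excluding the constant and linear terms.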

Details

Note that cs() itself does no smoothing; it simply sets things up for the function gamlss(), which in turn uses the function additive.fit() for backfitting, which in turn uses gamlss.cs().

Note that cs() and scs() behave differently at their default values, that is, when df and spar are not specified. cs(x) by default uses 3 extra degrees of freedom for smoothing x. scs(x) by default estimates lambda (and hence the degrees of freedom) automatically using generalised cross validation (GCV). Note that when GCV is used, convergence of the gamlss model can be less stable than for a model in which the degrees of freedom are fixed. This applies in particular to small data sets.
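
The difference between the two defaults can be illustrated at the level of smooth.spline() alone. This is only a sketch on simulated data, not the gamlss fitting algorithm: the first fit fixes the smoothness in advance, as cs() does, and the second lets GCV choose it, as scs() does by default. The correspondence "trace 5 = 3 extra degrees of freedom" is an assumption based on the definition of df given above.

```r
# Simulated data; variable names are illustrative
set.seed(3)
x <- seq(0, 10, length.out = 150)
y <- sin(x) + rnorm(150, sd = 0.4)

# Fixed smoothness: trace 5, i.e. 3 degrees of freedom beyond the linear fit
fit_fixed <- smooth.spline(x, y, df = 5)

# Neither df nor spar supplied: lambda is chosen by GCV (cv = FALSE)
fit_gcv <- smooth.spline(x, y, cv = FALSE)

# Extra degrees of freedom in the sense used on this page
c(fixed = fit_fixed$df - 2, gcv = fit_gcv$df - 2)
```

The GCV value depends on the data, so it changes from sample to sample, which is one way to see why a gamlss fit using it may converge less steadily.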

Value

The vector x is returned, endowed with a number of attributes. The vector itself is used in the construction of the model matrix, while the attributes are needed by the backfitting algorithm additive.fit(). Since smoothing splines include linear fits, the linear part is computed efficiently together with the other parametric linear parts of the model.

Warning

For a user who wishes to compare the gamlss() results with the equivalent gam() results in S-plus: make sure when using S-plus that the convergence criteria epsilon and bf.epsilon in control.gam() are decreased sufficiently to ensure proper convergence in S-plus. Also note that the degrees of freedom are defined on top of the linear term in gamlss, but on top of the constant term in S-plus (so use one extra degree of freedom in S-plus to obtain results comparable to those in gamlss).

Change the upper limit of spar (using the argument c.spar) if you receive the warning 'The output df are different from the input, change the control.spar'.
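
As a sketch of the syntax only, the call below raises the upper limit from 2 to 2.5 using the aids data from the Examples section. That data set is small, so the wider limit is not actually needed here; the code is guarded so that it runs only when gamlss is installed, and the object names m_vec and m_list are illustrative.

```r
# Runs only if the gamlss package is available
if (requireNamespace("gamlss", quietly = TRUE)) {
  library(gamlss)
  data(aids)

  # limits given as a numeric vector of length 2
  m_vec <- gamlss(y ~ cs(x, df = 7, c.spar = c(-1.5, 2.5)) + qrt,
                  data = aids, family = PO, trace = FALSE)

  # the same limits given as a list of length 2
  m_list <- gamlss(y ~ cs(x, df = 7, c.spar = list(-1.5, 2.5)) + qrt,
                   data = aids, family = PO, trace = FALSE)

  # both forms are documented as having the same effect
  print(c(vector = deviance(m_vec), list = deviance(m_list)))
}
```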

For large data sets, do not use expressions such as cs(x^0.5) inside the gamlss() call. Evaluate the expression first, e.g. nx=x^0.5, and then use cs(nx).
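
A minimal sketch of this pattern, again using the small aids data purely for illustration (the column name nx and the object name m_nx are arbitrary, and the code runs only when gamlss is installed):

```r
# Runs only if the gamlss package is available
if (requireNamespace("gamlss", quietly = TRUE)) {
  library(gamlss)
  data(aids)

  # evaluate the transformation once, outside the model formula
  aids$nx <- aids$x^0.5

  # then smooth the precomputed variable
  m_nx <- gamlss(y ~ cs(nx, df = 3) + qrt, data = aids,
                 family = PO, trace = FALSE)
}
```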

Note

The degrees of freedom df are defined differently from those of the gam() function in S-plus. Here df are the additional degrees of freedom, excluding the constant and the linear part of x. For example, df=4 in gamlss() is equivalent to df=5 in gam() in S-plus.
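
The conversion is simple arithmetic and can be written out as follows. The two helper functions are hypothetical, introduced only for this illustration; they are not part of gamlss or S-plus.

```r
# Hypothetical helpers, not part of any package
gamlss_df_to_trace <- function(df) df + 2  # add back the constant and linear terms
trace_to_splus_df  <- function(tr) tr - 1  # S-plus counts df on top of the constant only

trace_to_splus_df(gamlss_df_to_trace(4))   # 5, matching the example in this note
```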

Author(s)

Mikis Stasinopoulos and Bob Rigby (see also the documentation of the function smooth.spline() for the original authors of the cubic spline function).

References

Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S, Wadsworth & Brooks/Cole.

Eilers, P. H. C. and Marx, B. D. (1996). Flexible smoothing with B-splines and penalties (with comments and rejoinder). Statist. Sci, 11, 89-121.

Hastie, T. J. and Tibshirani, R. J. (1993). Varying coefficient models (with discussion). J. R. Statist. Soc. B, 55, 757-796.

Rigby, R. A. and Stasinopoulos, D. M. (2005). Generalized additive models for location, scale and shape (with discussion). Appl. Statist., 54, part 3, pp 507-554.

Rigby, R. A., Stasinopoulos, D. M., Heller, G. Z., and De Bastiani, F. (2019) Distributions for modeling location, scale, and shape: Using GAMLSS in R, Chapman and Hall/CRC. An older version can be found in https://www.gamlss.com/.

Stasinopoulos, D. M. and Rigby, R. A. (2007). Generalized additive models for location scale and shape (GAMLSS) in R. Journal of Statistical Software, Vol. 23, Issue 7, Dec 2007, https://www.jstatsoft.org/v23/i07/.

Stasinopoulos, D. M., Rigby, R. A., Heller, G., Voudouris, V., and De Bastiani, F. (2017). Flexible Regression and Smoothing: Using GAMLSS in R, Chapman and Hall/CRC.

(see also https://www.gamlss.com/).

See Also

gamlss, gamlss.cs, pb, pvc

Examples

# cubic splines example
data(aids)
# fitting a smoothing cubic spline with 7 degrees of freedom
# plus a quarterly effect
aids1 <- gamlss(y ~ cs(x, df = 7) + qrt, data = aids, family = PO)
# scs() with the degrees of freedom fixed at 5
aids2 <- gamlss(y ~ scs(x, df = 5) + qrt, data = aids, family = PO)
# scs() with the smoothing parameter chosen by GCV
aids3 <- gamlss(y ~ scs(x) + qrt, data = aids, family = PO)
with(aids, plot(x,y))
lines(aids$x,fitted(aids1), col="red")
lines(aids$x,fitted(aids3), col="green")
rm(aids1, aids2, aids3)

[Package gamlss version 5.4-22 Index]