R: Specifying a smooth term in the gcrq formula.

ps {quantregGrowth}

R Documentation

Specifying a smooth term in the gcrq formula.

Description

Function used to define the smooth term (via P-splines) within the gcrq formula. The function actually does not evaluate a (spline) smooth, but simply it passes relevant information to proper fitter functions.

Usage

ps(..., lambda = -1, d = 3, by=NULL, ndx = NULL, deg = 3, knots=NULL,
    monotone = 0, concave = 0, var.pen = NULL, pen.matrix=NULL, dropc=TRUE, 
    center=TRUE, K=NULL, decom=FALSE, constr.fit=TRUE, shared.pen=FALSE, 
    st=FALSE, ad=0)

Arguments

`...`	The covariate supposed to have a nonlinear relationship with the quantile curve(s) being estimated. A B-spline is built, and a (difference) penalty is applied. In growth charts this variable is typically the age. If the covariate is a factor, category-specific coefficients are estimated subject to a lasso penalty. See the last example in ?gcrq. A matrix of (continuous) covariates can be also supplied to perfom variable selection (among its columns).
`lambda`	A supplied smoothing parameter for the smooth term. If it is negative scalar, the smoothing parameter is estimated iteratively as discussed in Muggeo et al. (2021). If a positive scalar, it represents the actual smoothing parameter. If it is a vector, cross validation is performed to select the ‘best’ value. See Details in `gcrq`.
`d`	The difference order of the penalty. Default to 3 Ignored if `pen.matrix` is supplied.
`by`	if different from `NULL`, a numeric or factor variable of the same dimension as the covariate in `...` If numeric the elements multiply the smooth (i.e. a varying coefficient model); if factor, a smooth is fitted for each factor level. Usually the variable `by` is also included as main effect in the formula, see examples in `gcrq`. When `by` includes a factor, the formula should include the model intecept, i.e. `y~g+ps(x,by=g)` and not `y~ 0+g+ps(x,by=g)`.
`ndx`	The number of intervals of the covariate range used to build the B-spline basis. Non-integer values are rounded by `round()`. If `NULL`, default, it is taken `min(n/4,9)` (versions <=1.1-0 it was `min(n/4,40)`, the empirical rule of Ruppert). It could be reduced further (but no less than 5 or 6, say) if the sample size is not large and the default value leads to some error in the fitting procedure, see section `Note` in `gcrq`. Likewise, if the underlying relationship is strongly nonlinear, `ndx` could be increased. The returned basis wil have '`ndx`+`deg`-`1`' (if `dropc=TRUE`) basis functions.
`deg`	The degree of the spline polynomial. Default to 3. The B-spline basis is composed by `ndx`+`deg` basis functions and if `dropc=TRUE` the first column is removed for identifiability (and the model intercept is estimated without any penalty).
`knots`	The knots locations. If `NULL`, equispaced knots are set. Note if predictions outside the observed covariate range have to be computed (via `predict.gcrq`), the knots should be set enought outside the observed range.
`monotone`	Numeric value to set up monotonicity restrictions on the first derivative of fitted smooth function '0' = no constraint (default); '1' = non-decreasing smooth function; '-1' = non-increasing smooth function.
`concave`	Numeric value to set up monotonicity restrictions on the second derivative of fitted smooth function '0' = no constraint (default); '1' = concave smooth function; '-1' = convex smooth function.
`var.pen`	A character indicating the varying penalty. See Details.
`pen.matrix`	if provided, a penalty matrix `A`, say, such that the penalty in the objective function, apart from the smoothing parameter, is `\|\|Ab\|\|_1` where `b` is the spline coefficient vector being penalized.
`dropc`	logical. Should the first column of the B-spline basis be dropped for the basis identifiability? Default to `TRUE`. Note, if `dropc=FALSE` is set, it is necessary to omit the model intercept AND not to center the basis, i.e. `center=FALSE`. Alternatively, both a full basis and the model intercept may be included by adding a small ridge penalty via `lambda.ridge>0`.
`center`	logical. If `TRUE` the smooth effects are 'centered' over the covariate values, i.e. `\sum_i \hat{f}(x_i)=0`.
`K`	A scalar tuning the selection of wiggliness of the smoothed curve when `\lambda` has to be estimated (i.e. `lambda<0` is set). The larger `K`, the smoother the curve. Simulations suggest `K=2` for the smoothing, and `K=log(n/p^(2/3))` for variable selection and random intercepts (`p` is the number of variables or number of subjects). See details.
`decom`	logical. If `TRUE`, the B-spline `B` (with a `d` order difference penalty) is decomposed into truncated power functions namely unpenalized polynomial terms up to degree d-1, and additional terms `Z= B D'(DD')^{-1}`. Only the coefficients of `Z` are penalized via an identity matrix, i.e. a lasso penalty. Currently `decom=TRUE` does not work with shape (monotonicity and concavity) restrictions and noncrossing constraints.
`constr.fit`	logical. If `monotone` or `concave` are different from 0, `constr.fit=TRUE` means that these constraints are set on the fitted quantiles rather than on the spline coefficients.
`shared.pen`	logical. If `TRUE` and the smooth is a VC term with a factor specified in `by`, the smooths in each level of the factor share the same smoothing parameter.
`st`	logical. If `TRUE` the variable(s) are standardized via the `scale()` function. Typically used for variable selection via lasso, i.e. when a matrix of covariates is passed in `ps()`.
`ad`	a positive number to carry out a form of adaptive lasso. More specifically, at each step of the iterative algorithm, the penalty is `\lambda\sum_jw_j\|\beta_j\|` where `w_j=\|\tilde{\beta}_j\|^\mathtt{-ad}` and `\tilde{\beta}_j` are estimates coming from the previous iteration with a different value of `\lambda`. `ad=0` means the standard lasso and `ad=1` adaptive lasso (with weights updated during the iterative process.

Details

If a numeric variable has been supplied, ps() builds a B-spline basis with number of columns equal to ndx+deg (or length(knots)-deg-1). However, unless dropc=FALSE is specified, the first column is removed for identifiability, and the spline coefficients are penalized via differences of order d; d=0 leads to a penalty on the coefficients themselves. If pen.matrix is supplied, d is ignored. Since versions 1.5-0 and 1.6-0, a factor or matrix can be supplied.

lambda is the tuning parameter, fixed or to be estimated. When lambda=0 an unpenalized (and typically wiggly) fit is obtained, and as lambda increases the curve gets smoother till a d-1 degree polynomial. At 'intermediate' lambda values, the fitted curve is a piecewise polynomial of degree d-1.

It is also possible to put a varying penalty via the argument var.pen. Namely for a constant smoothing (var.pen=NULL) the penalty is \lambda\sum_k |\Delta^d_k| where \Delta^d_k is the k-th difference (of order d) of the spline coefficients. For instance if d=1, |\Delta^1_k|=|b_k-b_{k-1}| where the b_k are the spline coefficients. When a varying penalty is set, the penalty becomes \lambda\sum_k |\Delta_k^d| w_k where the weights w_k depend on var.pen; for instance var.pen="((1:k)^2)" results in w_k=k^2. See models m6 and m6a in the examples of gcrq.

If decom=TRUE, the smooth can be plotted with or without the fixed part, see overall.eff in the function plot.gcrq.

Value

The function simply returns the covariate with added attributes relevant to smooth term.

Note

For shape-constrained fits, use constr.fit=FALSE only if you are using a single full and uncentred basis, namely something like
gcrq(y~0+ps(x, center=FALSE, dropc=FALSE, monotone=1, constr.fit=FALSE),..).

Author(s)

Vito M. R. Muggeo

References

Muggeo VMR, Torretta F, Eilers PHC, Sciandra M, Attanasio M (2021). Multiple smoothing parameters selection in additive regression quantiles, Statistical Modelling, 21, 428-448.

For a general discussion on using B-spline and penalties in regression model see

Eilers PHC, Marx BD. (1996) Flexible smoothing with B-splines and penalties. Statistical Sciences, 11:89-121.

Examples

##see ?gcrq

##gcrq(y ~ ps(x),..) #it works (default: center = TRUE, dropc = TRUE)
##gcrq(y ~ 0 + ps(x, center = TRUE, dropc = FALSE)) #it does NOT work
##gcrq(y ~ 0 + ps(x, center = FALSE, dropc = FALSE)) #it works

[Package quantregGrowth version 1.7-1 Index]