sm {npreg}R Documentation

Fit a Smooth Model

Description

Fits a semi- or nonparametric regression model with the smoothing parameter(s) selected via one of eight methods: GCV, OCV, GACV, ACV, REML, ML, AIC, or BIC.

Usage

sm(formula, data, weights, types = NULL, tprk = TRUE, knots = NULL,
   skip.iter = TRUE, df, spar = NULL, lambda = NULL, control = list(),
   method = c("GCV", "OCV", "GACV", "ACV", "REML", "ML", "AIC", "BIC"),
   xrange = NULL, thetas = NULL, mf = NULL)

Arguments

formula

Object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. Uses the same syntax as lm and glm.

data

Optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which sm is called.

weights

Optional vector of weights to be used in the fitting process. If provided, weighted least squares is used. Defaults to all 1.

types

Named list giving the type of smooth to use for each predictor. If NULL, the type is inferred from the data. See "Types of Smooths" section for details.

tprk

Logical specifying how to parameterize smooth models with multiple predictors. If TRUE (default), a tensor product reproducing kernel function is used to represent the function. If FALSE, a tensor product of marginal kernel functions is used to represent the function. See the "Multiple Smooths" section for details.

knots

Spline knots for the estimation of the nonparametric effects. For models with multiple predictors, the knot specification will depend on the tprk input. See the "Choosing Knots" section for details

skip.iter

Set to FALSE for deep tuning of the hyperparameters. Only applicable when multiple smooth terms are included. See the "Parameter Tuning" section for details.

df

Equivalent degrees of freedom (trace of the smoother matrix). Must be in [m,n] where m is the number of columns of the null space basis function matrix X, and n is the number of observations. Will be approximate if skip.iter = FALSE.

spar

Smoothing parameter. Typically (but not always) in the range (0,1]. If specified lambda = 256^(3*(spar-1)).

lambda

Computational smoothing parameter. This value is weighted by n to form the penalty coefficient (see Details). Ignored if spar is provided.

control

Optional list with named components that control the optimization specs for the smoothing parameter selection routine.

Note that spar is only searched for in the interval [lower, upper].

lower:

lower bound for spar; defaults to -1.5

upper:

upper bound for spar; defaults to 1.5

tol:

the absolute precision (tolerance) used by optimize and nlm; defaults to 1e-8.

iterlim:

the iteration limit used by nlm; defaults to 5000.

print.level:

the print level used by nlm; defaults to 0 (no printing).

method

Method for selecting the smoothing parameter. Ignored if lambda is provided and skip.iter = TRUE.

xrange

Optional named list containing the range of each predictor. If NULL, the ranges are calculated from the input data.

thetas

Optional vector of hyperparameters to use for smoothing. If NULL, these are tuned using the requested method.

mf

Optional model frame constructed from formula and data (and potentially weights).

Note: the last two arguments are not intended to be called by the typical user of this function. These arguments are included primarily for internal usage by the boot.sm function.

Details

Letting f_i = f(x_i) with x_i = (x_{i1}, \ldots, x_{ip}), the function is represented as

f = X \beta + Z \alpha

where the basis functions in X span the null space (i.e., parametric effects), and Z contains the kernel function(s) of the contrast space (i.e., nonparametric effects) evaluated at all combinations of observed data points and knots. The vectors \beta and \alpha contain unknown basis function coefficients.

Letting M = (X, Z) and \gamma = (\beta', \alpha')', the penalized least squares problem has the form

(y - M \gamma)' W (y - M \gamma) + n \lambda \alpha' Q \alpha

where W is a diagonal matrix containg the weights, and Q is the penalty matrix. The optimal coefficients are the solution to

(M' W M + n \lambda P) \gamma = M' W y

where P is the penalty matrix Q augmented with zeros corresponding to the \beta in \gamma.

Value

An object of class "sm" with components:

fitted.values

the fitted values, i.e., predictions.

se.fit

the standard errors of the fitted values.

sse

the sum-of-squared errors.

cv.crit

the cross-validation criterion.

nsdf

the degrees of freedom (Df) for the null space.

df

the estimated degrees of freedom (Df) for the fit model.

df.residual

the residual degrees of freedom = nobs - df

r.squared

the observed coefficient of multiple determination.

sigma

the estimate of the error standard deviation.

logLik

the log-likelihood (if method is REML or ML).

aic

Akaike's Information Criterion (if method is AIC).

bic

Bayesian Information Criterion (if method is BIC).

spar

the value of spar computed or given, i.e., s = 1 + \log_{256}(\lambda)/3

lambda

the value of \lambda corresponding to spar, i.e., \lambda = 256^{3(s-1)}.

penalty

the smoothness penalty \alpha' Q \alpha.

coefficients

the basis function coefficients used for the fit model.

cov.sqrt

the square-root of the covariance matrix of coefficients. Note: tcrossprod(cov.sqrt) reconstructs the covariance matrix.

iter

the number of iterations used by nlm (if applicable).

specs

a list with information used for prediction purposes:

knots

the spline knots used for each predictor.

thetas

the "extra" tuning parameters used to weight the penalties.

xrng

the ranges of the predictor variables.

xlev

the factor levels of the predictor variables (if applicable).

tprk

logical controlling the formation of tensor product smooths.

skip.iter

logical controlling the parameter tuning (same as input).

control

the control options use for tuning.

data

the data used to fit the model.

types

the type of smooth used for each predictor.

terms

the terms included in the fit model.

method

the method used for smoothing parameter selection. Will be NULL if lambda was provided.

formula

the formula specifying the fit model.

weights

the weights used for fitting (if applicable)

call

the matched call.

Methods

The smoothing parameter can be selected using one of eight methods:
Generalized Cross-Validation (GCV)
Ordinary Cross-Validation (OCV)
Generalized Approximate Cross-Validation (GACV)
Approximate Cross-Validation (ACV)
Restricted Maximum Likelihood (REML)
Maximum Likelihood (ML)
Akaike's Information Criterion (AIC)
Bayesian Information Criterion (BIC)

Types of Smooths

The following codes specify the spline types:

par Parametric effect (factor, integer, or numeric).
ran Random effect/intercept (unordered factor).
nom Nominal smoothing spline (unordered factor).
ord Ordinal smoothing spline (ordered factor).
lin Linear smoothing spline (integer or numeric).
cub Cubic smoothing spline (integer or numeric).
qui Quintic smoothing spline (integer or numeric).
per Periodic smoothing spline (integer or numeric).
sph Spherical spline (matrix with d = 2 columns: lat, long).
tps Thin plate spline (matrix with d \ge 1 columns).

For finer control of some specialized spline types:

per.lin Linear periodic spline (m = 1).
per.cub Cubic periodic spline (m = 2).
per.qui Quintic periodic spline (m = 3).
sph.2 Linear spherical spline (m = 2).
sph.3 Cubic spherical spline (m = 3).
sph.4 Quintic spherical spline (m = 4).
tps.lin Linear thin plate spline (m = 1).
tps.cub Cubic thin plate spline (m = 2).
tps.qui Quintic thin plate spline (m = 3).

For details on the spline kernel functions, see basis.nom (nominal), basis.ord (ordinal), basis.poly (polynomial), basis.sph (spherical), and basis.tps (thin plate).

Note: "ran" is default for unordered factors when the number of levels is 20 or more, whereas "nom" is the default for unordered factors otherwise.

Choosing Knots

If tprk = TRUE, the four options for the knots input include:

1. a scalar giving the total number of knots to sample
2. a vector of integers indexing which rows of data are the knots
3. a list with named elements giving the marginal knot values for each predictor (to be combined via expand.grid)
4. a list with named elements giving the knot values for each predictor (requires the same number of knots for each predictor)

If tprk = FALSE, the three options for the knots input include:

1. a scalar giving the common number of knots for each continuous predictor
2. a list with named elements giving the number of marginal knots for each predictor
3. a list with named elements giving the marginal knot values for each predictor

Multiple Smooths

Suppose formula = y ~ x1 + x2 so that the model contains additive effects of two predictor variables.

The k-th predictor's marginal effect can be denoted as

f_k = X_k \beta_k + Z_k \alpha_k

where X_k is the n by m_k null space basis function matrix, and Z_k is the n by r_k contrast space basis function matrix.

If tprk = TRUE, the null space basis function matrix has the form X = [1, X_1, X_2] and the contrast space basis function matrix has the form

Z = \theta_1 Z_1 + \theta_2 Z_2

where the \theta_k are the "extra" smoothing parameters. Note that Z is of dimension n by r = r_1 = r_2.

If tprk = FALSE, the null space basis function matrix has the form X = [1, X_1, X_2], and the contrast space basis function matrix has the form

Z = [\theta_1 Z_1, \theta_2 Z_2]

where the \theta_k are the "extra" smoothing parameters. Note that Z is of dimension n by r = r_1 + r_2.

Parameter Tuning

When multiple smooth terms are included in the model, there are smoothing (hyper)parameters that weight the contribution of each combination of smooth terms. These hyperparameters are distinct from the overall smoothing parameter lambda that weights the contribution of the penalty.

skip.iter = TRUE (default) estimates the smoothing hyperparameters using Algorithm 3.2 of Gu and Wahba (1991), which typically provides adequate results when the model form is correctly specified. The lambda parameter is tuned via the specified smoothing parameter selection method.

skip.iter = FALSE uses Algorithm 3.2 as an initialization, and then the nlm function is used to tune the hyperparameters via the specified smoothing parameter selection method. Setting skip.iter = FALSE can (substantially) increase the model fitting time, but should produce better results—particularly if the model formula is misspecified.

Author(s)

Nathaniel E. Helwig <helwig@umn.edu>

References

Berry, L. N., & Helwig, N. E. (2021). Cross-validation, information theory, or maximum likelihood? A comparison of tuning methods for penalized splines. Stats, 4(3), 701-724. doi:10.3390/stats4030042

Craven, P. and Wahba, G. (1979). Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation. Numerische Mathematik, 31, 377-403. doi:10.1007/BF01404567

Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer. doi:10.1007/978-1-4614-5369-7

Gu, C. and Wahba, G. (1991). Minimizing GCV/GML scores with multiple smoothing parameters via the Newton method. SIAM Journal on Scientific and Statistical Computing, 12(2), 383-398. doi:10.1137/0912021

Helwig, N. E. (2020). Multiple and Generalized Nonparametric Regression. In P. Atkinson, S. Delamont, A. Cernat, J. W. Sakshaug, & R. A. Williams (Eds.), SAGE Research Methods Foundations. doi:10.4135/9781526421036885885

Helwig, N. E. (2021). Spectrally sparse nonparametric regression via elastic net regularized smoothers. Journal of Computational and Graphical Statistics, 30(1), 182-191. doi:10.1080/10618600.2020.1806855

See Also

Related Modeling Functions:

ss for fitting a smoothing spline with a single predictor (Gaussian response).

gsm for fitting generalized smooth models with multiple predictors of mixed types (non-Gaussian response).

S3 Methods and Related Functions for "sm" Objects:

boot.sm for bootstrapping sm objects.

coef.sm for extracting coefficients from sm objects.

cooks.distance.sm for calculating Cook's distances from sm objects.

cov.ratio for computing covariance ratio from sm objects.

deviance.sm for extracting deviance from sm objects.

dfbeta.sm for calculating DFBETA from sm objects.

dfbetas.sm for calculating DFBETAS from sm objects.

diagnostic.plots for plotting regression diagnostics from sm objects.

fitted.sm for extracting fitted values from sm objects.

hatvalues.sm for extracting leverages from sm objects.

model.matrix.sm for constructing model matrix from sm objects.

plot.sm for plotting effects from sm objects.

predict.sm for predicting from sm objects.

residuals.sm for extracting residuals from sm objects.

rstandard.sm for computing standardized residuals from sm objects.

rstudent.sm for computing studentized residuals from sm objects.

smooth.influence for calculating basic influence information from sm objects.

smooth.influence.measures for convenient display of influential observations from sm objects.

summary.sm for summarizing sm objects.

vcov.sm for extracting coefficient covariance matrix from sm objects.

weights.sm for extracting prior weights from sm objects.

Examples

##########   EXAMPLE 1   ##########
### 1 continuous predictor

# generate data
set.seed(1)
n <- 100
x <- seq(0, 1, length.out = n)
fx <- 2 + 3 * x + sin(2 * pi * x)
y <- fx + rnorm(n, sd = 0.5)

# fit sm with 10 knots (tprk = TRUE)
sm.ssa <- sm(y ~ x, knots = 10)

# fit sm with 10 knots (tprk = FALSE)
sm.gam <- sm(y ~ x, knots = 10, tprk = FALSE)

# print both results (note: they are identical)
sm.ssa
sm.gam

# plot both results (note: they are identical)
plot(sm.ssa)
plot(sm.gam)

# summarize both results (note: they are identical)
summary(sm.ssa)
summary(sm.gam)

# compare true MSE values (note: they are identical) 
mean( ( fx - sm.ssa$fit )^2 )
mean( ( fx - sm.gam$fit )^2 )



##########   EXAMPLE 2   ##########
### 1 continuous and 1 nominal predictor
### additive model

# generate data
set.seed(1)
n <- 100
x <- seq(0, 1, length.out = n)
z <- factor(sample(letters[1:3], size = n, replace = TRUE))
fun <- function(x, z){
  mu <- c(-2, 0, 2)
  zi <- as.integer(z)
  fx <- mu[zi] + 3 * x + sin(2 * pi * x)
}
fx <- fun(x, z)
y <- fx + rnorm(n, sd = 0.5)

# define marginal knots
probs <- seq(0, 0.9, by = 0.1)
knots <- list(x = quantile(x, probs = probs),
              z = letters[1:3])

# fit sm with specified knots (tprk = TRUE)
sm.ssa <- sm(y ~ x + z, knots = knots)

# fit sm with specified knots (tprk = FALSE)
sm.gam <- sm(y ~ x + z, knots = knots, tprk = FALSE)

# print both results (note: they are identical)
sm.ssa
sm.gam

# plot both results (note: they are identical)
plot(sm.ssa)
plot(sm.gam)

# summarize both results (note: they are almost identical)
summary(sm.ssa)
summary(sm.gam)

# compare true MSE values (note: they are identical) 
mean( ( fx - sm.ssa$fit )^2 )
mean( ( fx - sm.gam$fit )^2 )



##########   EXAMPLE 3   ##########
### 1 continuous and 1 nominal predictor
### interaction model

# generate data
set.seed(1)
n <- 100
x <- seq(0, 1, length.out = n)
z <- factor(sample(letters[1:3], size = n, replace = TRUE))
fun <- function(x, z){
  mu <- c(-2, 0, 2)
  zi <- as.integer(z)
  fx <- mu[zi] + 3 * x + sin(2 * pi * x + mu[zi]*pi/4)
}
fx <- fun(x, z)
y <- fx + rnorm(n, sd = 0.5)

# define marginal knots
probs <- seq(0, 0.9, by = 0.1)
knots <- list(x = quantile(x, probs = probs),
              z = letters[1:3])

# fit sm with specified knots (tprk = TRUE)
sm.ssa <- sm(y ~ x * z, knots = knots)

# fit sm with specified knots (tprk = FALSE)
sm.gam <- sm(y ~ x * z, knots = knots, tprk = FALSE)

# print both results (note: they are slightly different)
sm.ssa
sm.gam

# plot both results (note: they are slightly different)
plot(sm.ssa)
plot(sm.gam)

# summarize both results (note: they are slightly different)
summary(sm.ssa)
summary(sm.gam)

# compare true MSE values (note: they are slightly different) 
mean( ( fx - sm.ssa$fit )^2 )
mean( ( fx - sm.gam$fit )^2 )



##########   EXAMPLE 4   ##########
### 4 continuous predictors
### additive model

# generate data
set.seed(1)
n <- 100
fun <- function(x){
  sin(pi*x[,1]) + sin(2*pi*x[,2]) + sin(3*pi*x[,3]) + sin(4*pi*x[,4])
}
data <- as.data.frame(replicate(4, runif(n)))
colnames(data) <- c("x1v", "x2v", "x3v", "x4v")
fx <- fun(data)
y <- fx + rnorm(n)

# define marginal knots
knots <- list(x1v = quantile(data$x1v, probs = seq(0, 1, length.out = 10)),
              x2v = quantile(data$x2v, probs = seq(0, 1, length.out = 10)),
              x3v = quantile(data$x3v, probs = seq(0, 1, length.out = 10)),
              x4v = quantile(data$x4v, probs = seq(0, 1, length.out = 10)))
              
# define ssa knot indices
knots.indx <- c(bin.sample(data$x1v, nbin = 10, index.return = TRUE)$ix,
                bin.sample(data$x2v, nbin = 10, index.return = TRUE)$ix,
                bin.sample(data$x3v, nbin = 10, index.return = TRUE)$ix,
                bin.sample(data$x4v, nbin = 10, index.return = TRUE)$ix)

# fit sm with specified knots (tprk = TRUE)
sm.ssa <- sm(y ~ x1v + x2v + x3v + x4v, data = data, knots = knots.indx)

# fit sm with specified knots (tprk = FALSE)
sm.gam <- sm(y ~ x1v + x2v + x3v + x4v, data = data, knots = knots, tprk = FALSE)

# print both results (note: they are slightly different)
sm.ssa
sm.gam

# plot both results (note: they are slightly different)
plot(sm.ssa)
plot(sm.gam)

# summarize both results (note: they are slightly different)
summary(sm.ssa)
summary(sm.gam)

# compare true MSE values (note: they are slightly different) 
mean( ( fx - sm.ssa$fit )^2 )
mean( ( fx - sm.gam$fit )^2 )


[Package npreg version 1.1.0 Index]