| bigtps {bigsplines} | R Documentation |
Fits Cubic Thin-Plate Splines
Description
Given a real-valued response vector \mathbf{y}=\{y_{i}\}_{n\times1}, a thin-plate spline model has the form
y_{i}=\eta(\mathbf{x}_{i})+e_{i}
where y_{i} is the i-th observation's respone, \mathbf{x}_{i}=(x_{i1},\ldots,x_{id}) is the i-th observation's nonparametric predictor vector, \eta is an unknown smooth function relating the response and predictor, and e_{i}\sim\mathrm{N}(0,\sigma^{2}) is iid Gaussian error. Function only fits interaction models.
Usage
bigtps(x,y,nknots=NULL,nvec=NULL,rparm=NA,
alpha=1,lambdas=NULL,se.fit=FALSE,
rseed=1234,knotcheck=TRUE)
Arguments
x |
Predictor vector or matrix with three or less columns. |
y |
Response vector. Must be same length as |
nknots |
Two possible options: (a) scalar giving total number of random knots to sample, or (b) vector indexing which rows of |
nvec |
Number of eigenvectors (and eigenvalues) to use in approximation. Must be less than or equal to the number of knots and greater than or equal to |
rparm |
Rounding parameter(s) for |
alpha |
Manual tuning parameter for GCV score. Using |
lambdas |
Vector of global smoothing parameters to try. Default estimates smoothing parameter that minimizes GCV score. |
se.fit |
Logical indicating if the standard errors of fitted values should be estimated. |
rseed |
Random seed for knot sampling. Input is ignored if |
knotcheck |
If |
Details
To estimate \eta I minimize the penalized least-squares functional
\frac{1}{n}\sum_{i=1}^{n}(y_{i}-\eta(\mathbf{x}_{i}))^{2}+\lambda J(\eta)
where J(\eta) is the thin-plate penalty (see Helwig and Ma) and \lambda\geq0 is a smoothing parameter that controls the trade-off between fitting and smoothing the data. Default use of the function estimates \lambda by minimizing the GCV score (see bigspline).
Using the rounding parameter input rparm can greatly speed-up and stabilize the fitting for large samples. When rparm is used, the spline is fit to a set of unique data points after rounding; the unique points are determined using the efficient algorithm described in Helwig (2013). Rounding parameter should be on the raw data scale.
Value
fitted.values |
Vector of fitted values corresponding to the original data points in |
se.fit |
Vector of standard errors of |
x |
Predictor vector (same as input). |
y |
Response vector (same as input). |
xunique |
Unique elements of |
yunique |
Mean of |
funique |
Vector giving frequency of each element of |
sigma |
Estimated error standard deviation, i.e., |
ndf |
Data frame with two elements: |
info |
Model fit information: vector containing the GCV, multiple R-squared, AIC, and BIC of fit model (assuming Gaussian error). |
myknots |
Spline knots used for fit. |
nvec |
Number of eigenvectors used for solution. |
rparm |
Rounding parameter for |
lambda |
Optimal smoothing parameter. |
coef |
Spline basis function coefficients. |
coef.csqrt |
Matrix square-root of covariace matrix of |
Warnings
Input nvec must be greater than ncol(x)+1.
When using rounding parameters, output fitted.values corresponds to unique rounded predictor scores in output xunique. Use predict.bigtps function to get fitted values for full y vector.
Computational Details
According to thin-plate spline theory, the function \eta can be approximated as
\eta(x) = \sum_{k=1}^{M}d_{k}\phi_{k}(\mathbf{x}) + \sum_{h=1}^{q}c_{h}\xi(\mathbf{x},\mathbf{x}_{h}^{*})
where the \{\phi_{k}\}_{k=1}^{M} are linear functions, \xi is the thin-plate spline semi-kernel, \{\mathbf{x}_{h}^{*}\}_{h=1}^{q} are the knots, and the c_{h} coefficients are constrained to be orthongonal to the \{\phi_{k}\}_{k=1}^{M} functions.
This implies that the penalized least-squares functional can be rewritten as
\|\mathbf{y} - \mathbf{K}\mathbf{d} - \mathbf{J}\mathbf{c}\|^{2} + n\lambda\mathbf{c}'\mathbf{Q}\mathbf{c}
where \mathbf{K}=\{\phi(\mathbf{x}_{i})\}_{n \times M} is the null space basis function matrix, \mathbf{J}=\{\xi(\mathbf{x}_{i},\mathbf{x}_{h}^{*})\}_{n \times q} is the contrast space basis funciton matrix, \mathbf{Q}=\{\xi(\mathbf{x}_{g}^{*},\mathbf{x}_{h}^{*})\}_{q \times q} is the penalty matrix, and \mathbf{d}=(d_{0},\ldots,d_{M})' and \mathbf{c}=(c_{1},\ldots,c_{q})' are the unknown basis function coefficients, where \mathbf{c} are constrained to be orthongonal to the \{\phi_{k}\}_{k=1}^{M} functions.
See Helwig and Ma for specifics about how the constrained estimation is handled.
Note
The spline is estimated using penalized least-squares, which does not require the Gaussian error assumption. However, the spline inference information (e.g., standard errors and fit information) requires the Gaussian error assumption.
Author(s)
Nathaniel E. Helwig <helwig@umn.edu>
References
Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer.
Helwig, N. E. (2017). Regression with ordered predictors via ordinal smoothing splines. Frontiers in Applied Mathematics and Statistics, 3(15), 1-13.
Helwig, N. E. and Ma, P. (2015). Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics, 24, 715-732.
Helwig, N. E. and Ma, P. (2016). Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters. Statistics and Its Interface, 9, 433-444.
Examples
########## EXAMPLE 1 ##########
# define relatively smooth function
set.seed(773)
myfun <- function(x){ sin(2*pi*x) }
x <- runif(500)
y <- myfun(x) + rnorm(500)
# fit thin-plate spline (default 1 dim: 30 knots)
tpsmod <- bigtps(x,y)
tpsmod
########## EXAMPLE 2 ##########
# define more jagged function
set.seed(773)
myfun <- function(x){ 2*x+cos(2*pi*x) }
x <- runif(500)*4
y <- myfun(x) + rnorm(500)
# try different numbers of knots
r1mod <- bigtps(x,y,nknots=20,rparm=0.01)
crossprod( myfun(r1mod$xunique) - r1mod$fitted )/length(r1mod$fitted)
r2mod <- bigtps(x,y,nknots=35,rparm=0.01)
crossprod( myfun(r2mod$xunique) - r2mod$fitted )/length(r2mod$fitted)
r3mod <- bigtps(x,y,nknots=50,rparm=0.01)
crossprod( myfun(r3mod$xunique) - r3mod$fitted )/length(r3mod$fitted)
########## EXAMPLE 3 ##########
# function with two continuous predictors
set.seed(773)
myfun <- function(x1v,x2v){
sin(2*pi*x1v) + log(x2v+.1) + cos(pi*(x1v-x2v))
}
x <- cbind(runif(500),runif(500))
y <- myfun(x[,1],x[,2]) + rnorm(500)
# fit thin-plate spline with 50 knots (default 2 dim: 100 knots)
tpsmod <- bigtps(x,y,nknots=50)
tpsmod
crossprod( myfun(x[,1],x[,2]) - tpsmod$fitted.values )/500
########## EXAMPLE 4 ##########
# function with three continuous predictors
set.seed(773)
myfun <- function(x1v,x2v,x3v){
sin(2*pi*x1v) + log(x2v+.1) + cos(pi*x3v)
}
x <- cbind(runif(500),runif(500),runif(500))
y <- myfun(x[,1],x[,2],x[,3]) + rnorm(500)
# fit thin-plate spline with 50 knots (default 3 dim: 200 knots)
tpsmod <- bigtps(x,y,nknots=50)
tpsmod
crossprod( myfun(x[,1],x[,2],x[,3]) - tpsmod$fitted.values )/500