bigtps {bigsplines}    R Documentation
Fits Cubic Thin-Plate Splines
Description
Given a real-valued response vector \mathbf{y}=\{y_{i}\}_{n\times 1}, a thin-plate spline model has the form

y_{i}=\eta(\mathbf{x}_{i})+e_{i}

where y_{i} is the i-th observation's response, \mathbf{x}_{i}=(x_{i1},\ldots,x_{id}) is the i-th observation's nonparametric predictor vector, \eta is an unknown smooth function relating the response and predictors, and e_{i}\sim\mathrm{N}(0,\sigma^{2}) is iid Gaussian error. The function only fits interaction models (i.e., \eta is not decomposed into additive components).
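For orientation, here is a minimal sketch (not part of the package examples) of simulating data from this model with d = 2 predictors and fitting it; see the Examples section for fuller illustrations.

library(bigsplines)

# simulate y_i = eta(x_i) + e_i with a two-dimensional predictor vector x_i
set.seed(1)
n <- 500
x <- cbind(runif(n), runif(n))
eta <- function(x1, x2){ sin(2*pi*x1) + cos(2*pi*x2) }
y <- eta(x[,1], x[,2]) + rnorm(n, sd = 0.5)

# fit the thin-plate spline
fit <- bigtps(x, y)
fit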
Usage
bigtps(x,y,nknots=NULL,nvec=NULL,rparm=NA,
alpha=1,lambdas=NULL,se.fit=FALSE,
rseed=1234,knotcheck=TRUE)
Arguments
x: Predictor vector or matrix with three or fewer columns.

y: Response vector. Must be the same length as x.

nknots: Two possible options: (a) a scalar giving the total number of random knots to sample, or (b) a vector indexing which rows of x to use as knots (see the sketch after this list).

nvec: Number of eigenvectors (and eigenvalues) to use in the approximation. Must be less than or equal to the number of knots and greater than or equal to ncol(x)+2 (see Warnings).

rparm: Rounding parameter(s) for x (see Details). The default rparm=NA fits an unrounded solution.

alpha: Manual tuning parameter for the GCV score (default alpha=1).

lambdas: Vector of global smoothing parameters to try. The default estimates the smoothing parameter that minimizes the GCV score.

se.fit: Logical indicating whether the standard errors of the fitted values should be estimated.

rseed: Random seed for knot sampling. Ignored if nknots is a vector of knot indices.

knotcheck: If TRUE, only unique knots are used (for stability).
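The two nknots specifications might look as follows; this is a hedged sketch with assumed data, not taken from the package examples.

library(bigsplines)

set.seed(1)
x <- runif(500)
y <- sin(2*pi*x) + rnorm(500)

# (a) scalar: sample 20 random knots (controlled by rseed)
moda <- bigtps(x, y, nknots = 20)

# (b) vector: use these rows of x as the knots
modb <- bigtps(x, y, nknots = seq(1, 500, by = 25))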
Details
To estimate \eta I minimize the penalized least-squares functional

\frac{1}{n}\sum_{i=1}^{n}(y_{i}-\eta(\mathbf{x}_{i}))^{2}+\lambda J(\eta)

where J(\eta) is the thin-plate penalty (see Helwig and Ma) and \lambda\geq 0 is a smoothing parameter that controls the trade-off between fitting and smoothing the data. Default use of the function estimates \lambda by minimizing the GCV score (see bigspline).
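A minimal sketch (hypothetical data, not from the package examples) of supplying a grid of candidate \lambda values via lambdas and inspecting the GCV-selected fit:

library(bigsplines)

set.seed(1)
x <- runif(500)
y <- sin(2*pi*x) + rnorm(500)

# try a grid of global smoothing parameters; GCV selects among them
gmod <- bigtps(x, y, lambdas = 10^seq(-8, 0, by = 1))
gmod$lambda   # selected (optimal) smoothing parameter
gmod$info     # GCV, multiple R-squared, AIC, and BIC of the fit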
Using the rounding parameter input rparm can greatly speed up and stabilize the fitting for large samples. When rparm is used, the spline is fit to a set of unique data points after rounding; the unique points are determined using the efficient algorithm described in Helwig (2013). The rounding parameter should be on the raw data scale.
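As a rough sketch (assumed values, not from the package examples), rounding a unit-interval predictor to two decimals:

library(bigsplines)

set.seed(1)
x <- runif(5000)
y <- sin(2*pi*x) + rnorm(5000)

# fit to the unique points that remain after rounding x to 0.01
rmod <- bigtps(x, y, nknots = 30, rparm = 0.01)
length(rmod$xunique)          # at most 101 unique rounded points
length(rmod$fitted.values)    # one fitted value per unique rounded point
head(rmod$funique)            # frequency of each unique rounded point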
Value
fitted.values: Vector of fitted values. If rparm=NA, these correspond to the original data points in x; if rparm is used, they correspond to the unique rounded points in xunique (see Warnings).

se.fit: Vector of standard errors of fitted.values (returned if input se.fit=TRUE).

x: Predictor vector (same as input).

y: Response vector (same as input).

xunique: Unique elements of x (after rounding, if rparm is used).

yunique: Mean of y for each element of xunique.

funique: Vector giving the frequency of each element of xunique.

sigma: Estimated error standard deviation, i.e., \hat{\sigma}.

ndf: Data frame with two elements: the sample size and the effective degrees of freedom of the fit model.

info: Model fit information: vector containing the GCV, multiple R-squared, AIC, and BIC of the fit model (assuming Gaussian error).

myknots: Spline knots used for the fit.

nvec: Number of eigenvectors used for the solution.

rparm: Rounding parameter for x (same as input).

lambda: Optimal smoothing parameter.

coef: Spline basis function coefficients.

coef.csqrt: Matrix square-root of the covariance matrix of coef.
Warnings
Input nvec must be greater than ncol(x)+1.

When using rounding parameters, the output fitted.values corresponds to the unique rounded predictor scores in the output xunique. Use the predict.bigtps function to get fitted values for the full y vector.
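A rough sketch of recovering observation-level fitted values after rounding; the newdata argument name is assumed from the usual predict-method convention, not stated on this page:

library(bigsplines)

set.seed(1)
x <- runif(2000)
y <- sin(2*pi*x) + rnorm(2000)

rmod <- bigtps(x, y, nknots = 30, rparm = 0.01)
length(rmod$fitted.values)           # one value per unique rounded point
yhat <- predict(rmod, newdata = x)   # fitted values for all 2000 observations
length(yhat)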
Computational Details
According to thin-plate spline theory, the function \eta can be approximated as

\eta(\mathbf{x}) = \sum_{k=1}^{M}d_{k}\phi_{k}(\mathbf{x}) + \sum_{h=1}^{q}c_{h}\xi(\mathbf{x},\mathbf{x}_{h}^{*})

where the \{\phi_{k}\}_{k=1}^{M} are linear functions, \xi is the thin-plate spline semi-kernel, \{\mathbf{x}_{h}^{*}\}_{h=1}^{q} are the knots, and the c_{h} coefficients are constrained to be orthogonal to the \{\phi_{k}\}_{k=1}^{M} functions.
This implies that the penalized least-squares functional can be rewritten as

\|\mathbf{y} - \mathbf{K}\mathbf{d} - \mathbf{J}\mathbf{c}\|^{2} + n\lambda\mathbf{c}'\mathbf{Q}\mathbf{c}

where \mathbf{K}=\{\phi_{k}(\mathbf{x}_{i})\}_{n \times M} is the null space basis function matrix, \mathbf{J}=\{\xi(\mathbf{x}_{i},\mathbf{x}_{h}^{*})\}_{n \times q} is the contrast space basis function matrix, \mathbf{Q}=\{\xi(\mathbf{x}_{g}^{*},\mathbf{x}_{h}^{*})\}_{q \times q} is the penalty matrix, and \mathbf{d}=(d_{1},\ldots,d_{M})' and \mathbf{c}=(c_{1},\ldots,c_{q})' are the unknown basis function coefficients, where \mathbf{c} is constrained to be orthogonal to the \{\phi_{k}\}_{k=1}^{M} functions.
See Helwig and Ma for specifics about how the constrained estimation is handled.
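As a sketch of where this leads (standard penalized least-squares algebra, not reproduced from this page), ignoring the orthogonality constraint for a moment, setting the derivatives of the above functional with respect to \mathbf{d} and \mathbf{c} to zero gives the linear system

\mathbf{K}'\mathbf{K}\hat{\mathbf{d}} + \mathbf{K}'\mathbf{J}\hat{\mathbf{c}} = \mathbf{K}'\mathbf{y}

\mathbf{J}'\mathbf{K}\hat{\mathbf{d}} + (\mathbf{J}'\mathbf{J} + n\lambda\mathbf{Q})\hat{\mathbf{c}} = \mathbf{J}'\mathbf{y}

with the constraint on \mathbf{c} then imposed as described in Helwig and Ma.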
Note
The spline is estimated using penalized least-squares, which does not require the Gaussian error assumption. However, the spline inference information (e.g., standard errors and fit information) requires the Gaussian error assumption.
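A minimal sketch of requesting the standard errors (which rely on the Gaussian error assumption):

library(bigsplines)

set.seed(1)
x <- runif(500)
y <- sin(2*pi*x) + rnorm(500)

# request pointwise standard errors of the fitted values
semod <- bigtps(x, y, se.fit = TRUE)
head(semod$se.fit)                              # standard errors of fitted.values
head(semod$fitted.values + 2*semod$se.fit)      # rough upper pointwise bound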
Author(s)
Nathaniel E. Helwig <helwig@umn.edu>
References
Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer.
Helwig, N. E. (2017). Regression with ordered predictors via ordinal smoothing splines. Frontiers in Applied Mathematics and Statistics, 3(15), 1-13.
Helwig, N. E. and Ma, P. (2015). Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics, 24, 715-732.
Helwig, N. E. and Ma, P. (2016). Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters. Statistics and Its Interface, 9, 433-444.
Examples
########## EXAMPLE 1 ##########
# define relatively smooth function
set.seed(773)
myfun <- function(x){ sin(2*pi*x) }
x <- runif(500)
y <- myfun(x) + rnorm(500)
# fit thin-plate spline (default 1 dim: 30 knots)
tpsmod <- bigtps(x,y)
tpsmod
########## EXAMPLE 2 ##########
# define more jagged function
set.seed(773)
myfun <- function(x){ 2*x+cos(2*pi*x) }
x <- runif(500)*4
y <- myfun(x) + rnorm(500)
# try different numbers of knots
r1mod <- bigtps(x,y,nknots=20,rparm=0.01)
crossprod( myfun(r1mod$xunique) - r1mod$fitted )/length(r1mod$fitted)
r2mod <- bigtps(x,y,nknots=35,rparm=0.01)
crossprod( myfun(r2mod$xunique) - r2mod$fitted )/length(r2mod$fitted)
r3mod <- bigtps(x,y,nknots=50,rparm=0.01)
crossprod( myfun(r3mod$xunique) - r3mod$fitted )/length(r3mod$fitted)
########## EXAMPLE 3 ##########
# function with two continuous predictors
set.seed(773)
myfun <- function(x1v,x2v){
sin(2*pi*x1v) + log(x2v+.1) + cos(pi*(x1v-x2v))
}
x <- cbind(runif(500),runif(500))
y <- myfun(x[,1],x[,2]) + rnorm(500)
# fit thin-plate spline with 50 knots (default 2 dim: 100 knots)
tpsmod <- bigtps(x,y,nknots=50)
tpsmod
crossprod( myfun(x[,1],x[,2]) - tpsmod$fitted.values )/500
########## EXAMPLE 4 ##########
# function with three continuous predictors
set.seed(773)
myfun <- function(x1v,x2v,x3v){
sin(2*pi*x1v) + log(x2v+.1) + cos(pi*x3v)
}
x <- cbind(runif(500),runif(500),runif(500))
y <- myfun(x[,1],x[,2],x[,3]) + rnorm(500)
# fit thin-plate spline with 50 knots (default 3 dim: 200 knots)
tpsmod <- bigtps(x,y,nknots=50)
tpsmod
crossprod( myfun(x[,1],x[,2],x[,3]) - tpsmod$fitted.values )/500