smooth.construct.pco.smooth.spec {refund}        R Documentation
Principal coordinate ridge regression
Description
Smooth constructor function for principal coordinate ridge regression fitted by gam. When the principal coordinates are defined by a relevant distance among functional predictors, this is a form of nonparametric scalar-on-function regression. Reiss et al. (2016) describe the approach and apply it to dynamic time warping distances among functional predictors.
Usage
## S3 method for class 'pco.smooth.spec'
smooth.construct(object, data, knots)
Arguments
object
    a smooth specification object, usually generated by a term of the form s(dummy, bs="pco", k, xt); see Details.

data
    a list containing just the data.

knots
    IGNORED!
Value
An object of class pco.smooth. The resulting object has an xt element which contains details of the multidimensional scaling, which may be interesting.
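For example, one can inspect those details after fitting (a minimal sketch, assuming m is a gam fit whose first smooth term is the s(..., bs="pco") term; illustrative only, not from the package documentation):

# the constructed pco.smooth object is stored on the fitted model
sm <- m$smooth[[1]]
# multidimensional scaling details kept in the xt element
str(sm$xt, max.level=1)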
Details
The constructor is not normally called directly, but is rather used internally by gam.

In a gam term of the form s(dummy, bs="pco", k, xt):
- dummy is an arbitrary vector (or name of a column in data) whose length is the number of observations. This is not actually used, but is required as part of the input to s. Note that if multiple pco terms are used in the model, there must be multiple unique term names (e.g., "dummy1", "dummy2", etc.).

- k is the number of principal coordinates (e.g., k=9 will give a 9-dimensional projection of the data).

- xt is a list supplying the distance information, in one of two ways (see the sketch after the following list): (i) a matrix Dmat of distances can be supplied directly via xt=list(D=Dmat, ...); (ii) alternatively, one can use xt=list(realdata=..., dist_fn=..., ...) to specify a data matrix realdata and a distance function dist_fn, whereupon a distance matrix dist_fn(realdata) is created.
The list xt also has the following optional elements:

- add: passed to cmdscale when performing multidimensional scaling; for details, see the help for that function. (Default FALSE.)

- fastcmd: if TRUE, multidimensional scaling is performed by cmdscale_lanczos, which uses Lanczos iteration to eigendecompose the distance matrix; if FALSE, MDS is carried out by cmdscale. Default is FALSE, to use cmdscale.
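As a concrete illustration of the two xt forms, here is a minimal, self-contained sketch; mydata, y, dummy, Dmat, m1, and m2 are hypothetical toy objects, not part of the package. Since dist(mydata) equals the supplied Dmat, the two fits should agree.

library(refund)
library(mgcv)
set.seed(1)
mydata <- matrix(rnorm(20*50), 20, 50)  # 20 rows of toy "functional" data
y      <- rnorm(20)                     # toy scalar response
dummy  <- rep(1, 20)                    # placeholder vector required by s()

# (i) supply a precomputed distance matrix via xt$D
Dmat <- dist(mydata)
m1 <- gam(y ~ s(dummy, bs="pco", k=5, xt=list(D=Dmat)), method="REML")

# (ii) supply raw data plus a distance function via xt$realdata / xt$dist_fn;
#      the constructor then forms dist_fn(realdata) internally
m2 <- gam(y ~ s(dummy, bs="pco", k=5,
                xt=list(realdata=mydata, dist_fn=dist)), method="REML")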
Author(s)
David L Miller, based on code from Lan Huo and Phil Reiss
References
Reiss, P. T., Miller, D. L., Wu, P.-S., and Hua, W.-Y. Penalized nonparametric scalar-on-function regression via principal coordinates. Under revision. Available at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5714326/.
Examples
## Not run:
# a simulated example
library(refund)
library(mgcv)
require(dtw)
## First generate the data
Xnl <- matrix(0, 30, 101)
set.seed(813)
tt <- sort(sample(1:90, 30))
for (i in 1:30) {
  Xnl[i, tt[i]:(tt[i]+4)] <- -1
  Xnl[i, (tt[i]+5):(tt[i]+9)] <- 1
}
X.toy <- Xnl + matrix(rnorm(30*101, sd=0.05), 30)
y.toy <- tt + rnorm(30, sd=0.05)
y.rainbow <- rainbow(30, end=0.9)[(y.toy - min(y.toy)) /
                                    diff(range(y.toy)) * 29 + 1]
## Display the toy data
par(mfrow=c(2, 2))
matplot((0:100)/100, t(Xnl[c(4, 25), ]), type="l", xlab="t", ylab="",
ylim=range(X.toy), main="Noiseless functions")
matplot((0:100)/100, t(X.toy[c(4, 25), ]), type="l", xlab="t", ylab="",
ylim=range(X.toy), main="Observed functions")
matplot((0:100)/100, t(X.toy), type="l", lty=1, col=y.rainbow, xlab="t",
ylab="", main="Rainbow plot")
## Obtain DTW distances
D.dtw <- dist(X.toy, method="dtw", window.type="sakoechiba", window.size=5)
## Compare PC vs. PCo ridge regression
# matrix to store results
GCVmat <- matrix(NA, 15, 2)
# dummy response variable
dummy <- rep(1,30)
# loop over possible projection dimensions
for (k. in 1:15) {
  # fit PC (m1) and PCo (m2) ridge regression
  m1 <- gam(y.toy ~ s(dummy, bs="pco", k=k.,
                      xt=list(realdata=X.toy, dist_fn=dist)), method="REML")
  m2 <- gam(y.toy ~ s(dummy, bs="pco", k=k., xt=list(D=D.dtw)), method="REML")
  # calculate and store GCV scores: n * RSS / (residual df)^2
  GCVmat[k., ] <- length(y.toy) * c(sum(m1$residuals^2)/m1$df.residual^2,
                                    sum(m2$residuals^2)/m2$df.residual^2)
}
## plot the GCV scores per dimension for each model
matplot(GCVmat, lty=1:2, col=1, pch=16:17, type="o", ylab="GCV",
xlab="Number of principal components / coordinates",
main="GCV score")
legend("right", c("PC ridge regression", "DTW-based PCoRR"), lty=1:2, pch=16:17)
## example of making a prediction
# fit a model to the toy data
m <- gam(y.toy ~ s(dummy, bs="pco", k=2, xt=list(D=D.dtw)), method="REML")
# first build the distance matrix
# in this case we just subsample the original matrix
# see ?pco_predict_preprocess for more information on formatting this data
dist_list <- list(dummy = as.matrix(D.dtw)[, c(1:5,10:15)])
# preprocess the prediction data
pred_data <- pco_predict_preprocess(m, newdata=NULL, dist_list)
# make the prediction
p <- predict(m, pred_data)
# check that these are the same as the corresponding fitted values
print(cbind(fitted(m)[c(1:5, 10:15)], p))
## End(Not run)