Fits a principal curve to m-dimensional data


A principal curve is a non-parametric generalisation of the principal component and is a curve that passes through the middle of a cloud of data points for a certain definition of ‘middle’.


prcurve(X, method = c("ca", "pca", "random", "user"), start = NULL,
        smoother = smoothSpline, complexity, vary = FALSE,
        maxComp, finalCV = FALSE, axis = 1, rank = FALSE,
        stretch = 2, maxit = 10, trace = FALSE, thresh = 0.001,
        plotit = FALSE, ...)

initCurve(X, method = c("ca", "pca", "random", "user"), rank = FALSE,
          axis = 1, start)



a matrix-like object containing the variables to which the principal curve is to be fitted.


character; method to use when initialising the principal curve. "ca" fits a correspondence analysis to X and uses the axis-th axis scores as the initial curve. "pca" does the same but fits a principal components analysis to X. "random" produces a random ordering as the initial curve.


numeric vector specifying the initial curve when method = "user". Must be of length nrow(X).
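
The starting configurations described above can be sketched in base R. This is an illustration only: prcomp is used here as a stand-in for the ordination, and the toy matrix and the object names (pca_start, random_start, user_start) are assumptions made for the example, not part of the package.

```r
## Sketch only: base-R stand-ins for the "pca", "random" and "user" starts.
set.seed(42)
X <- matrix(rnorm(50 * 4), nrow = 50, ncol = 4)  # toy data: 50 samples, 4 variables

axis <- 1
pca_start <- prcomp(X)$x[, axis]   # sample scores on the chosen PCA axis
random_start <- sample(nrow(X))    # a random ordering of the samples

## A user-supplied start needs one value per row of X
user_start <- pca_start
stopifnot(length(user_start) == nrow(X))
```

Any numeric vector of length nrow(X) built this way could in principle be supplied as start together with method = "user".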


function; the choice of smoother used to fit the principal curve. Currently, the only options are smoothSpline, which is a wrapper to smooth.spline, and smoothGAM, which is a wrapper to gam.


numeric; the complexity of the fitted smooth functions.

The function passed as argument smoother should arrange for this argument to be passed on to the relevant aspect of the underlying smoother. In the case of smoothSpline, complexity is the df argument of smooth.spline.
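
To show what the df argument of smooth.spline controls, the following minimal sketch fits the same toy series with a low and a high value. It calls smooth.spline directly rather than through smoothSpline, and the data and object names are made up for the illustration.

```r
## Sketch only: effect of 'df' on a smoothing spline fit.
set.seed(1)
lambda <- sort(runif(100))                        # toy positions along a curve
y <- sin(2 * pi * lambda) + rnorm(100, sd = 0.2)  # toy response for one variable

fit_smooth <- smooth.spline(lambda, y, df = 4)    # fewer df: smoother function
fit_wiggly <- smooth.spline(lambda, y, df = 15)   # more df: more flexible function

## Effective degrees of freedom actually achieved by each fit
c(smooth = fit_smooth$df, wiggly = fit_wiggly$df)
```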


logical; should the complexity of the smoother fitted to each variable in X be allowed to vary (i.e. to allow a more or less smooth function for a particular variable)? If FALSE, the median complexity over all m variables is chosen as the fixed complexity for all m smooths.
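
The idea of a single, median complexity can be sketched as follows. This assumes that each variable's own complexity is chosen by the default generalised cross-validation in smooth.spline; how the package selects the per-variable values internally is not stated here, so treat that step as an assumption. All object names are illustrative.

```r
## Sketch only: one shared complexity taken as the median over variables.
set.seed(2)
lambda <- sort(runif(80))                          # toy positions along a curve
X <- cbind(a = sin(2 * pi * lambda),
           b = lambda^2,
           c = cos(4 * pi * lambda)) +
  matrix(rnorm(80 * 3, sd = 0.1), ncol = 3)        # three noisy toy variables

## Assumption: let smooth.spline choose the df for each variable by GCV
df_each <- apply(X, 2, function(x) smooth.spline(lambda, x)$df)

## Fixed-complexity case: refit every variable with the median df
df_fixed <- median(df_each)
fits <- lapply(seq_len(ncol(X)),
               function(j) smooth.spline(lambda, X[, j], df = df_fixed))
```

With varying complexity, each variable would instead keep its own entry of df_each.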


numeric; the upper limit on the allowed complexity.


logical; should a final fit of the smooth function be performed using cross validation?


numeric; the ordination axis to use as the initial curve.


logical; should rank position on the gradient be used? Not yet implemented.


numeric; a factor by which the curve can be extrapolated when points are projected. Default is 2 (times the last segment length).


numeric; the maximum number of iterations.


logical; should progress on the iterations be printed to the console?


numeric; convergence threshold on shortest distances to the curve. The algorithm is considered to have converged when the latest iteration produces a total residual distance to the curve that is within thresh of the value obtained during the previous iteration.
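
A convergence test of this kind might look like the sketch below. It assumes the change is measured relative to the previous total distance; the text above does not say whether the comparison is relative or absolute, so that choice is an assumption. has_converged is a hypothetical helper, not a function in the package.

```r
## Sketch only: hypothetical convergence check on the total residual distance.
## Assumption: the change is measured relative to the previous iteration.
has_converged <- function(dist_old, dist_new, thresh = 0.001) {
  abs(dist_old - dist_new) / dist_old <= thresh
}

has_converged(100, 99.95)  # small change between iterations
has_converged(100, 95)     # large change between iterations
```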


logical; should the fitting process be plotted? If TRUE, then the fitted principal curve and observations in X are plotted in principal component space.


additional arguments; these are passed on only to the function smoother.


An object of class "prcurve" with the following components:


a matrix with the same dimensions as X, giving the projections of the observations onto the curve.


an index, such that s[tag, ] is smooth.


for each point, its arc-length from the beginning of the curve.


the sum-of-squared distances from the points to their projections.


logical; did the algorithm converge?


numeric; the number of iterations performed.


numeric; total sum-of-squared distances.


numeric vector; the complexity of the smoother fitted to each variable in X.


the matched call.


an object of class "rda", the result of a call to rda. This is a principal components analysis of the input data X.


a copy of the data used to fit the principal curve.
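
To make the projection, ordering index, arc-length and distance components concrete, the sketch below computes analogous quantities for the simplest possible case, a straight line along the first principal component. It does not use prcurve; the object names s, tag, lambda and dist are labels chosen for this example, and prcomp stands in for the ordination.

```r
## Sketch only: the linear special case, using the first principal component.
set.seed(3)
X <- cbind(x1 = rnorm(30), x2 = rnorm(30))
X[, 2] <- X[, 1] + 0.3 * X[, 2]        # correlated toy data

pc <- prcomp(X)
score <- pc$x[, 1]                     # signed position along the first PC
v <- pc$rotation[, 1]                  # unit direction of the line

## Projection of each observation onto the line (same dimensions as X)
s <- sweep(outer(score, v), 2, pc$center, "+")

lambda <- score - min(score)           # arc-length from the start of the line
tag <- order(lambda)                   # index that sorts points along the line
dist <- sum((X - s)^2)                 # total squared distance to the projections
```

Here s[tag, ] traces the line from one end to the other, which is the straight-line counterpart of the smooth ordering described above.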


The fitting function uses function project_to_curve in package princurve to find the projection of the data on to the fitted curve.
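
What projecting a point onto a curve involves can be sketched by hand for a piecewise-linear curve. The code below is a simplified illustration and does not call or reproduce project_to_curve: project_point is a hypothetical helper, and it omits the extrapolation beyond the curve ends that the stretch argument allows.

```r
## Sketch only: project one point onto a piecewise-linear curve.
## 'curve' is a matrix of ordered vertices, one row per vertex.
project_point <- function(p, curve) {
  best <- list(dist2 = Inf, s = NULL, lambda = NA_real_)
  travelled <- 0                              # arc-length up to the current segment
  for (i in seq_len(nrow(curve) - 1)) {
    a <- curve[i, ]
    d <- curve[i + 1, ] - a                   # segment direction
    len2 <- sum(d^2)
    tt <- if (len2 > 0) sum((p - a) * d) / len2 else 0
    tt <- min(max(tt, 0), 1)                  # clamp to the segment (no extrapolation)
    q <- a + tt * d                           # closest point on this segment
    dist2 <- sum((p - q)^2)
    if (dist2 < best$dist2) {
      best <- list(dist2 = dist2, s = q,
                   lambda = travelled + tt * sqrt(len2))
    }
    travelled <- travelled + sqrt(len2)
  }
  best
}

curve <- rbind(c(0, 0), c(1, 0), c(1, 1))     # an L-shaped toy curve
proj <- project_point(c(0.5, 0.2), curve)
proj$s        # closest point on the curve
proj$lambda   # arc-length of that point from the start of the curve
proj$dist2    # squared distance from the point to the curve
```

Applying such a helper to every row of a data matrix and summing the squared distances gives a total of the kind monitored for convergence.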


Gavin L. Simpson

See Also

smoothGAM and smoothSpline for the wrappers fitting smooth functions to each variable.


## Load Abernethy Forest data set
data(abernethy)

## Remove the Depth and Age variables
abernethy2 <- abernethy[, -(37:38)]

## Fit the principal curve using the median complexity over
## all species
aber.pc <- prcurve(abernethy2, method = "ca", trace = TRUE,
                   vary = FALSE, penalty = 1.4)

## Extract fitted values
fit <- fitted(aber.pc) ## locations on curve
abun <- fitted(aber.pc, type = "smooths") ## fitted response

## Fit the principal curve using varying complexity of smoothers
## for each species
aber.pc2 <- prcurve(abernethy2, method = "ca", trace = TRUE,
                    vary = TRUE, penalty = 1.4)

## Predict new locations
take <- abernethy2[1:10, ]
pred <- predict(aber.pc2, take)

## Not run: 
## Fit principal curve using a GAM - currently slow ~10secs
aber.pc3 <- prcurve(abernethy2 / 100, method = "ca", trace = TRUE,
                    vary = TRUE, smoother = smoothGAM, bs = "cr", family = mgcv::betar())

## End(Not run)

