ppr {stats} | R Documentation |
Projection Pursuit Regression
Description
Fit a projection pursuit regression model.
Usage
ppr(x, ...)
## S3 method for class 'formula'
ppr(formula, data, weights, subset, na.action,
contrasts = NULL, ..., model = FALSE)
## Default S3 method:
ppr(x, y, weights = rep(1, n),
ww = rep(1, q), nterms, max.terms = nterms, optlevel = 2,
sm.method = c("supsmu", "spline", "gcvspline"),
bass = 0, span = 0, df = 5, gcvpen = 1, trace = FALSE, ...)
Arguments
formula |
a formula specifying one or more numeric response variables and the explanatory variables. |
x |
numeric matrix of explanatory variables. Rows represent observations, and columns represent variables. Missing values are not accepted. |
y |
numeric matrix of response variables. Rows represent observations, and columns represent variables. Missing values are not accepted. |
nterms |
number of terms to include in the final model. |
data |
a data frame (or similar: see model.frame) from which variables specified in formula are preferentially to be taken. |
weights |
a vector of weights w_i for each case. |
ww |
a vector of weights for each response, so the fit criterion is the sum over case i and responses j of w_i ww_j (y_ij - fit_ij)^2 divided by the sum of w_i. |
subset |
an index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.) |
na.action |
a function to specify the action to be taken if NAs are found. |
contrasts |
the contrasts to be used when any factor explanatory variables are coded. |
max.terms |
maximum number of terms to choose from when building the model. |
optlevel |
integer from 0 to 3 which determines the thoroughness of an optimization routine in the SMART program. See the ‘Details’ section. |
sm.method |
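the method used for smoothing the ridge functions. The default is to use Friedman's super smoother supsmu. The alternatives are to use the smoothing spline code underlying smooth.spline, either with a specified (equivalent) degrees of freedom for each ridge function, or to allow the smoothness to be chosen by GCV. Can be abbreviated. |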
bass |
super smoother bass tone control used with automatic span selection (see supsmu); the range of values is 0 to 10, with larger values resulting in increased smoothing. |
span |
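super smoother span control (see supsmu). The default, 0, results in automatic span selection by local cross validation. span can also take a value in (0, 1]. |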
df |
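if sm.method is "spline", specifies the smoothness of each ridge term via the requested equivalent degrees of freedom. |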
gcvpen |
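if sm.method is "gcvspline", this is the penalty used in the GCV selection for each degree of freedom used. |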
trace |
logical indicating if each spline fit should produce diagnostic output (about lambda and df). |
... |
arguments to be passed to or from other methods. |
model |
logical. If true, the model frame is returned. |
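The default method takes the explanatory and response matrices directly. As a minimal sketch, assuming simulated data (the names x1 to x3, y1, y2 and the response weights c(1, 2) are illustrative and not part of the original page), a two-response fit might look like this:

```r
set.seed(1)
n <- 200
# Simulated explanatory variables (hypothetical data, for illustration only)
x <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, c("x1", "x2", "x3")))
z <- drop(x %*% c(1, 1, 0)) / sqrt(2)   # one underlying projection

# Two responses that depend on the same projection
y <- cbind(y1 = sin(z) + rnorm(n, sd = 0.1),
           y2 = z^2   + rnorm(n, sd = 0.1))

# ww gives the second response twice the weight of the first in the criterion
fit2 <- ppr(x, y, ww = c(1, 2), nterms = 2, max.terms = 3)

fit2$q            # number of response variables
dim(fit2$alpha)   # explanatory variables by ridge terms
dim(fit2$beta)    # responses by ridge terms
```

Because both x and y must be free of missing values, any filtering has to happen before the call when this interface is used.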
Details
The basic method is given by Friedman (1984), and the implementation is based on his code. This code has been shown to be extremely sensitive to the Fortran compiler used.
The algorithm first adds up to max.terms ridge terms one at a time; it will use fewer if it is unable to find a term to add that makes sufficient difference. It then removes the least important term at each step until nterms terms are left.
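The effect of the forward and backward passes can be inspected through the gofn component. The following is a minimal sketch on the rock data; the rescaled data frame rock1 is a helper introduced here for illustration, and numerical values may differ between platforms.

```r
# Rescale area and perimeter (helper data frame, hypothetical name)
rock1 <- transform(rock, area1 = area / 10000, peri1 = peri / 10000)

# Build up to 5 ridge terms, then prune back to 2
fit <- ppr(log(perm) ~ area1 + peri1 + shape,
           data = rock1, nterms = 2, max.terms = 5)

fit$mu     # the nterms argument
fit$ml     # the max.terms argument
fit$gofn   # residual (weighted) sum of squares against the number of terms
fit$gof    # residual (weighted) sum of squares for the selected model
```

Comparing the entries of gofn is one informal way to judge whether a larger nterms is worth the extra terms.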
The levels of optimization (argument optlevel) differ in how thoroughly the models are refitted during this process. At level 0 the existing ridge terms are not refitted. At level 1 the projection directions are not refitted, but the ridge functions and the regression coefficients are. Levels 2 and 3 refit all the terms and are equivalent for one response; level 3 is more careful to re-balance the contributions from each regressor at each step and so is a little less likely to converge to a saddle point of the sum of squares criterion.
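To see how the optimization level affects a fit, one can refit the same model at each level and compare the residual sums of squares. This is only a sketch on the rock data (rock1 is a helper name introduced here), and the values obtained depend on the platform.

```r
# Rescale area and perimeter (helper data frame, hypothetical name)
rock1 <- transform(rock, area1 = area / 10000, peri1 = peri / 10000)

# Fit the same model at each optimization level, 0 to 3
fits <- lapply(0:3, function(lev)
  ppr(log(perm) ~ area1 + peri1 + shape,
      data = rock1, nterms = 2, max.terms = 5, optlevel = lev))

# Residual sum of squares of the selected model at each level
gof_by_level <- sapply(fits, function(f) f$gof)
names(gof_by_level) <- paste("optlevel", 0:3)
gof_by_level
```

With a single response, as here, levels 2 and 3 are described above as equivalent, so any gain would be expected between the lower levels and level 2.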
Value
A list with the following components, many of which are for use by the method functions.
call |
the matched call |
p |
the number of explanatory variables (after any coding) |
q |
the number of response variables |
mu |
the argument nterms |
ml |
the argument max.terms |
gof |
the overall residual (weighted) sum of squares for the selected model |
gofn |
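the overall residual (weighted) sum of squares against the number of terms, up to max.terms |
df |
the argument df |
edf |
if sm.method is "spline", the equivalent number of degrees of freedom for each ridge term used |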
xnames |
the names of the explanatory variables |
ynames |
the names of the response variables |
alpha |
a matrix of the projection directions, with a column for each ridge term |
beta |
a matrix of the coefficients applied for each response to the ridge terms: the rows are the responses and the columns the ridge terms |
yb |
the weighted means of each response |
ys |
the overall scale factor used: internally the responses are divided by ys to achieve an overall weighted sum of squares of one |
fitted.values |
the fitted values, as a matrix if q > 1 |
residuals |
the residuals, as a matrix if q > 1 |
smod |
internal work array, which includes the ridge functions evaluated at the training set points. |
model |
(only if model = TRUE) the model frame |
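As a minimal sketch of how these components fit together, the projections of the training data onto the ridge directions can be recomputed from alpha. The helper names rock1, X and proj are introduced here for illustration only.

```r
# Rescale area and perimeter (helper data frame, hypothetical name)
rock1 <- transform(rock, area1 = area / 10000, peri1 = peri / 10000)

fit <- ppr(log(perm) ~ area1 + peri1 + shape,
           data = rock1, nterms = 2, max.terms = 5)

fit$alpha   # projection directions, one column per ridge term
fit$beta    # coefficients, one row per response

# Project the explanatory variables onto each direction
X <- as.matrix(rock1[, fit$xnames])
proj <- X %*% fit$alpha
dim(proj)   # observations by ridge terms

# Fitted values and residuals are vectors here because q is 1
head(cbind(fitted = fitted(fit), residual = residuals(fit)))
```

The columns of proj are the abscissae against which the ridge functions are drawn by plot.ppr.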
Source
Friedman (1984): converted to double precision and added interface to smoothing splines by B. D. Ripley, originally for the MASS package.
References
Friedman, J. H. and Stuetzle, W. (1981). Projection pursuit regression. Journal of the American Statistical Association, 76, 817–823. doi:10.2307/2287576.
Friedman, J. H. (1984). SMART User's Guide. Laboratory for Computational Statistics, Stanford University Technical Report No. 1.
Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S. Springer.
See Also
plot.ppr, supsmu, smooth.spline
Examples
require(graphics)
# Note: your numerical values may differ
attach(rock)
area1 <- area/10000; peri1 <- peri/10000
rock.ppr <- ppr(log(perm) ~ area1 + peri1 + shape,
                data = rock, nterms = 2, max.terms = 5)
rock.ppr
# Call:
# ppr.formula(formula = log(perm) ~ area1 + peri1 + shape, data = rock,
#     nterms = 2, max.terms = 5)
#
# Goodness of fit:
#  2 terms  3 terms  4 terms  5 terms
# 8.737806 5.289517 4.745799 4.490378
summary(rock.ppr)
# ..... (same as above)
# .....
#
# Projection direction vectors ('alpha'):
#            term 1      term 2
# area1  0.34357179  0.37071027
# peri1 -0.93781471 -0.61923542
# shape  0.04961846  0.69218595
#
# Coefficients of ridge terms:
#    term 1    term 2
# 1.6079271 0.5460971
par(mfrow = c(3,2)) # maybe: , pty = "s")
plot(rock.ppr, main = "ppr(log(perm)~ ., nterms=2, max.terms=5)")
plot(update(rock.ppr, bass = 5), main = "update(..., bass = 5)")
plot(update(rock.ppr, sm.method = "gcv", gcvpen = 2),
     main = "update(..., sm.method=\"gcv\", gcvpen=2)")
cbind(perm = rock$perm, prediction = round(exp(predict(rock.ppr)), 1))
detach()