regFit {fRegression}    R Documentation
Regression Modelling
Description
Estimates the parameters of a regression model.
Usage
regFit(formula, data, family = gaussian,
use = c("lm", "rlm", "glm", "gam", "ppr", "nnet", "polymars"),
title = NULL, description = NULL, ...)
Arguments
data
    any rectangular object which can be coerced by the function
    as.data.frame into a data frame with named columns, e.g. an
    object of class "timeSeries".
description
    a brief description of the project of type character.
family
    a description of the error distribution and the link function
    to be used in the model, as in glm.
formula
    a symbolic description of the model to be fitted.
use
    a character string denoting the regression method used to fit
    the model; one of "lm", "rlm", "glm", "gam", "ppr", "nnet",
    or "polymars".
title
    a character string which allows for a project title.
...
    additional optional arguments to be passed to the underlying
    functions; for details consult the help page of the selected
    regression function.
Details
The function regFit provides a selection of regression models that
work with Rmetrics' "timeSeries" objects and return a common S4
object. These models include linear modelling, robust linear
modelling, generalized linear modelling, generalized additive
modelling, projection pursuit regression, neural networks, and
polychotomous MARS models.
LM – Linear Modelling:
Univariate linear regression analysis is a statistical methodology
that assumes a linear relationship between some predictor variables
and a response variable. The goal is to estimate the coefficients
and to predict new data from the estimated linear relationship.
R's base function
lm(formula, data, subset, weights, na.action, method = "qr",
model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
contrasts = NULL, offset, ...)
is used to fit linear models. It can be used to carry out regression,
single stratum analysis of variance and analysis of covariance, although
aov
may provide a more convenient interface for these.
Rmetrics' function
regFit(formula, data, use = "lm", ...)
calls R's base function lm, with the difference that the data
argument may be any rectangular object which can be coerced by the
function as.data.frame into a data frame with named columns, e.g.
an object of class "timeSeries".
The function regFit returns an S4 object of class "fREG" whose slot
@fit holds the object returned by the function lm. In addition, the
S4 methods fitted and residuals retrieve the fitted values and the
residuals as objects of the same class as defined by the argument
data, as in the sketch below.
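A minimal sketch (assuming the fRegression package is attached, and
using the regSim data generator from the Examples below):
x <- regSim(model = "LM3", n = 100)
fit <- regFit(Y ~ X1 + X2 + X3, data = x, use = "lm")
head(fitted(fit))      # fitted values, in the class of the input data
head(residuals(fit))   # residuals, likewise
slot(fit, "fit")       # the underlying object as returned by lm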
The function plot.lm
provides four plots: a plot of residuals
against fitted values, a Scale-Location plot of sqrt(| residuals |)
against fitted values, a normal QQ plot, and a plot of Cook's
distances versus row labels.
[stats::lm]
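The same four diagnostics can be produced from the slot @fit; a
sketch, continuing the example above:
par(mfrow = c(2, 2))                  # arrange the four panels
plot(slot(fit, "fit"), which = 1:4)   # residuals vs fitted, normal QQ,
                                      # scale-location, Cook's distance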
RLM – Robust Linear Modelling:
To fit a linear model by robust regression using an M estimator, R offers the function
rlm(formula, data, weights, ..., subset, na.action,
method = c("M", "MM", "model.frame"),
wt.method = c("inv.var", "case"),
model = TRUE, x.ret = TRUE, y.ret = FALSE, contrasts = NULL)
from package MASS. Again we can use Rmetrics' wrapper
regFit(formula, data, use = "rlm", ...)
which allows us to use, for example, S4 timeSeries objects as input
and to get the output as an S4 object with the known slots.
[MASS::rlm]
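A hedged sketch of the robustness gain: corrupt one response value
and compare the least-squares and M-estimator coefficients (the
outlier injection is purely illustrative):
x <- regSim(model = "LM3", n = 100)
x[1, "Y"] <- x[1, "Y"] + 25   # inject a single gross outlier
coef(slot(regFit(Y ~ X1 + X2 + X3, data = x, use = "lm"),  "fit"))   # pulled by the outlier
coef(slot(regFit(Y ~ X1 + X2 + X3, data = x, use = "rlm"), "fit"))   # much less affected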
GLM – Generalized Linear Models:
Generalized linear modelling extends the linear model in two
directions: (i) a monotonic differentiable link function describes
how the expected values are related to the linear predictor, and
(ii) the response variables may have a probability distribution
from an exponential family.
R's base package stats provides the function
glm(formula, family = gaussian, data, weights, subset,
na.action, start = NULL, etastart, mustart, offset,
control = glm.control(...), model = TRUE, method = "glm.fit",
x = FALSE, y = TRUE, contrasts = NULL, ...)
Again we can use Rmetrics' wrapper
regFit(formula, data, use = "glm", ...)
[stats::glm]
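For instance, a binomial family turns the wrapper into a logistic
regression. A sketch with simulated data (the data frame df and its
columns are made up for illustration):
set.seed(42)
df <- data.frame(x = rnorm(200))
df$y <- rbinom(200, size = 1, prob = plogis(1 + 2 * df$x))   # binary response
fit <- regFit(y ~ x, data = df, use = "glm", family = binomial)
coef(slot(fit, "fit"))                                       # log-odds coefficients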
GAM – Generalized Additive Models:
An additive model generalizes a linear model by smoothing individually
each predictor term. A generalized additive model extends the additive
model in the same spirit as the generalized linear model extends the
linear model, namely by allowing a link function and non-normal
distributions from the exponential family.
[mgcv::gam]
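A sketch of a smooth additive fit, assuming the wrapper passes the
formula unchanged to mgcv's gam (the simulated data and the s()
smoothing terms are illustrative):
set.seed(1)
df <- data.frame(x1 = runif(200), x2 = runif(200))
df$y <- sin(2 * pi * df$x1) + df$x2^2 + rnorm(200, sd = 0.2)
fit <- regFit(y ~ s(x1) + s(x2), data = df, use = "gam")   # s() marks smoothed terms
summary(slot(fit, "fit"))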
PPR – Projection Pursuit Regression:
The basic method is given by Friedman (1984), and is essentially the
same code used by S-PLUS's ppreg. It is observed that this code is
extremely sensitive to the compiler used. The algorithm first adds
up to max.terms (by default ppr.nterms) ridge terms one at a time;
it will use fewer if it is unable to find a term to add that makes
sufficient difference. The levels of optimization, argument optlevel
(by default 2), differ in how thoroughly the models are refitted
during this process. At level 0 the existing ridge terms are not
refitted. At level 1 the projection directions are not refitted, but
the ridge functions and the regression coefficients are. Levels 2
and 3 refit all the terms; level 3 is more careful to re-balance the
contributions from each regressor at each step and so is a little
less likely to converge to a saddle point of the sum of squares
criterion. The plot method plots the ridge functions of the
projection pursuit regression fit.
[stats::ppr]
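A sketch of the term-selection arguments described above, using base
R's ppr directly (the simulated data are illustrative):
set.seed(7)
df <- data.frame(x1 = runif(200), x2 = runif(200))
df$y <- df$x1 * df$x2 + rnorm(200, sd = 0.1)
fit <- ppr(y ~ x1 + x2, data = df, nterms = 2, max.terms = 5, optlevel = 2)
par(mfrow = c(1, 2))
plot(fit)   # one panel per fitted ridge function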
POLYMARS – Polychotomous MARS:
The algorithm employed by polymars
is different from the
MARS(tm) algorithm of Friedman (1991), though it has many similarities.
Also the name polymars
has been used for this algorithm well
before MARS was trademarked.
[polspline::polymars]
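Through the wrapper the call looks like the other methods; a minimal
sketch (assumes the polymars code is installed alongside
fRegression):
x <- regSim(model = "LM3", n = 100)
fit <- regFit(Y ~ X1 + X2 + X3, data = x, use = "polymars")
slot(fit, "fit")   # the underlying polymars object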
NNET – Feedforward Neural Network Regression:
If the response in formula
is a factor, an appropriate
classification network is constructed; this has one output and
entropy fit if the number of levels is two, and a number of
outputs equal to the number of classes and a softmax output
stage for more levels. If the response is not a factor, it is
passed on unchanged to nnet.default
. A quasi-Newton
optimizer is used, written in C
.
[nnet::nnet]
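For a numeric response the wrapper thus performs nonlinear
least-squares network regression; a sketch using nnet directly (the
network size and the simulated data are illustrative choices):
library(nnet)
set.seed(3)
df <- data.frame(x = runif(200))
df$y <- sin(2 * pi * df$x) + rnorm(200, sd = 0.1)
fit <- nnet(y ~ x, data = df, size = 5, linout = TRUE, trace = FALSE)  # linear output unit for regression
head(fitted(fit))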
Value
returns an S4 object of class "fREG".
Author(s)
The R core team for the lm functions from R's base package,
B.R. Ripley for the glm functions from R's base package,
S.N. Wood for the gam functions from R's mgcv package,
N.N. for the ppr functions from R's modreg package,
M. O' Connors for the polymars functions from R's ? package,
The R core team for the nnet functions from R's nnet package,
Diethelm Wuertz for the Rmetrics R-port.
References
Belsley D.A., Kuh E., Welsch R.E. (1980); Regression Diagnostics; Wiley, New York.
Dobson, A.J. (1990); An Introduction to Generalized Linear Models; Chapman and Hall, London.
Draper N.R., Smith H. (1981); Applied Regression Analysis; Wiley, New York.
Friedman, J.H. (1991); Multivariate Adaptive Regression Splines (with discussion), The Annals of Statistics 19, 1–141.
Friedman J.H., and Stuetzle W. (1981); Projection Pursuit Regression; Journal of the American Statistical Association 76, 817-823.
Friedman J.H. (1984); SMART User's Guide; Laboratory for Computational Statistics, Stanford University Technical Report No. 1.
Green, Silverman (1994); Nonparametric Regression and Generalized Linear Models; Chapman and Hall.
Gu, Wahba (1991); Minimizing GCV/GML Scores with Multiple Smoothing Parameters via the Newton Method; SIAM J. Sci. Statist. Comput. 12, 383-398.
Hastie T., Tibshirani R. (1990); Generalized Additive Models; Chapman and Hall, London.
Kooperberg Ch., Bose S., and Stone C.J. (1997); Polychotomous Regression, Journal of the American Statistical Association 92, 117–127.
McCullagh P., Nelder, J.A. (1989); Generalized Linear Models; Chapman and Hall, London.
Myers R.H. (1986); Classical and Modern Regression with Applications; Duxbury, Boston.
Rousseeuw P.J., Leroy, A. (1987); Robust Regression and Outlier Detection; Wiley, New York.
Seber G.A.F. (1977); Linear Regression Analysis; Wiley, New York.
Stone C.J., Hansen M., Kooperberg Ch., and Truong Y.K. (1997); The use of polynomial splines and their tensor products in extended linear modeling (with discussion).
Venables, W.N., Ripley, B.D. (1999); Modern Applied Statistics with S-PLUS; Springer, New York.
Wahba (1990); Spline Models of Observational Data; SIAM.
Weisberg S. (1985); Applied Linear Regression; Wiley, New York.
Wood (2000); Modelling and Smoothing Parameter Estimation with Multiple Quadratic Penalties; JRSSB 62, 413-428.
Wood (2001); mgcv: GAMs and Generalized Ridge Regression for R. R News 1, 20-25.
Wood (2001); Thin Plate Regression Splines.
There exists a vast literature on regression; the references listed above are just a small sample of what is available. The book by Myers is an introductory textbook that covers many of the recent advances in regression methodology. Seber's book is at a higher mathematical level and covers much of the classical theory of least squares.
Examples
## regSim -
x <- regSim(model = "LM3", n = 100)
# LM
regFit(Y ~ X1 + X2 + X3, data = x, use = "lm")
# RLM
regFit(Y ~ X1 + X2 + X3, data = x, use = "rlm")
# GAM
regFit(Y ~ X1 + X2 + X3, data = x, use = "gam")
# PPR
regFit(Y ~ X1 + X2 + X3, data = x, use = "ppr")
# NNET
regFit(Y ~ X1 + X2 + X3, data = x, use = "nnet")
# POLYMARS
regFit(Y ~ X1 + X2 + X3, data = x, use = "polymars")
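# All of these fits share the class "fREG", so the same extractors
# apply to each method. A hedged sketch of a cross-method comparison;
# the coercion via as.matrix is an assumption about the class that
# residuals returns for this data:
for (m in c("lm", "rlm", "gam", "ppr", "nnet", "polymars")) {
    fit <- regFit(Y ~ X1 + X2 + X3, data = x, use = m)
    r <- as.numeric(as.matrix(residuals(fit)))   # coerce from the input's class
    cat(m, "in-sample RMSE:", round(sqrt(mean(r^2)), 4), "\n")
}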