dbplsr {dbstats} | R Documentation |
Distance-based partial least squares regression
Description
dbplsr is a variety of partial least squares regression
where explanatory information is coded as distances between individuals.
These distances can either be computed from observed explanatory variables
or directly input as a squared distances matrix.
Since distances can be computed from a mixture of continuous and
qualitative explanatory variables or, in fact, from more general
quantities, dbplsr is a proper extension of plsr.
Notation convention: in distance-based methods we must distinguish observed explanatory variables, which we denote by Z or z, from Euclidean coordinates, which we denote by X or x. For an explanation of both terms, see the references below.
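To make the Z versus X distinction concrete, here is a minimal sketch that does not use dbstats. It assumes the recommended cluster package is available, uses made-up toy values, and picks Gower's dissimilarity and classical scaling purely for illustration; the computation that dbplsr performs internally may differ.

```r
library(cluster)  # provides daisy()

# Observed explanatory variables Z: a mix of continuous and qualitative columns
# (toy values, invented for this illustration)
Z <- data.frame(
  age    = c(23, 35, 41, 52, 29, 60),
  income = c(21, 40, 38, 55, 30, 48),
  sector = factor(c("a", "b", "a", "c", "b", "c"))
)

# Pairwise dissimilarities between individuals, computed from mixed-type data
d <- daisy(Z, metric = "gower")

# Matrix of squared distances between individuals
D2 <- as.matrix(d)^2

# A Euclidean configuration X (here two coordinates) obtained by classical scaling
X <- cmdscale(d, k = 2)
```

Z holds the data as observed, while X is a set of coordinates whose interpoint distances approximate the dissimilarities in d.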
Usage
## S3 method for class 'formula'
dbplsr(formula,data,...,metric="euclidean",
method="ncomp",weights,ncomp)
## S3 method for class 'dist'
dbplsr(distance,y,...,weights,ncomp=ncomp,method="ncomp")
## S3 method for class 'D2'
dbplsr(D2,y,...,weights,ncomp=ncomp,method="ncomp")
## S3 method for class 'Gram'
dbplsr(G,y,...,weights,ncomp=ncomp,method="ncomp")
Arguments
formula |
an object of class |
data |
an optional data frame containing the variables in the model (both response and explanatory variables, either the observed ones, Z, or a Euclidean configuration X). |
y |
(required if no formula is given as the principal argument). The response (dependent variable); it must be a numeric vector, matrix or data.frame. |
distance |
a |
D2 |
a |
G |
a |
metric |
metric function to be used when computing distances from observed
explanatory variables.
One of |
method |
sets the method used to decide how many components are needed to fit
the best model for new predictions.
There are five different methods, |
weights |
an optional numeric vector of weights to be used in the fitting process. By default all individuals have the same weight. |
ncomp |
the number of components to include in the model. |
... |
arguments passed to or from other methods to the low level. |
Details
Partial least squares (PLS) is a method for constructing
predictive models when the factors (Z) are many and highly collinear.
A PLS model will try to find the multidimensional direction
in the Z space that explains the maximum multidimensional variance direction
in the Y space. dbplsr is particularly suited when the matrix of
predictors has more variables than observations.
By contrast, standard regression (dblm) will fail in these cases.
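The following base R sketch illustrates the point about more variables than observations. It is ordinary single-response PLS on simulated coordinates, written out by hand; it is not the distance-based algorithm implemented in dbplsr, and all variable names and the simulated data are assumptions made for the illustration.

```r
set.seed(1)
n <- 20   # observations
p <- 50   # predictors: more variables than observations
Z <- matrix(rnorm(n * p), n, p)
y <- Z[, 1] - Z[, 2] + rnorm(n, sd = 0.1)

# Ordinary least squares cannot estimate all coefficients when p > n:
# the aliased ones are returned as NA
ols <- lm(y ~ Z)
anyNA(coef(ols))

# Hand-written PLS with a single response
X     <- scale(Z, center = TRUE, scale = FALSE)  # centered predictors
res   <- y - mean(y)                             # centered response
ncomp <- 3
rss   <- numeric(ncomp)

for (k in seq_len(ncomp)) {
  w     <- drop(crossprod(X, res))                 # direction of maximal covariance with the residual
  w     <- w / sqrt(sum(w^2))                      # normalize the weight vector
  score <- drop(X %*% w)                           # component scores
  b     <- sum(score * res) / sum(score^2)         # regress the residual on the scores
  res   <- res - b * score                         # update the response residual
  load  <- drop(crossprod(X, score)) / sum(score^2)
  X     <- X - outer(score, load)                  # deflate the predictors
  rss[k] <- sum(res^2)
}

rss  # residual sum of squares after each component
```

Each component is a single direction in the predictor space, so the fit stays well defined however many columns Z has; the residual sum of squares cannot increase as components are added.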
The various possible ways for inputting the model explanatory
information through distances, or their squares, etc., are the
same as in dblm.
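As a hedged illustration of two of these input routes, the sketch below fits the same data through the formula method and through the dist method, following the signatures listed under Usage. It assumes that the dbstats and pls packages are installed; the object names fit_formula and fit_dist are hypothetical, and no claim is made here about how closely the two fits agree.

```r
library(dbstats)
library(pls)      # only needed for the yarn data set
data(yarn)

# Route 1: formula method, distances computed from the observed variables
fit_formula <- dbplsr(density ~ NIR, data = yarn, ncomp = 6, method = "GCV")

# Route 2: dist method, distances computed beforehand and passed with the response
D <- dist(yarn$NIR)   # Euclidean distances between the rows of NIR
fit_dist <- dbplsr(D, yarn$density, ncomp = 6, method = "GCV")

class(fit_formula)
class(fit_dist)
```

The second route is the one to use when only distances between individuals are available, for example when they come from a dissimilarity that has no formula representation.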
The number of components to fit is specified with the argument ncomp.
Value
A list of class dbplsr containing the following components:
residuals |
a list containing the residuals (response minus fitted values) for each iteration. |
fitted.values |
a list containing the fitted values for each iteration. |
fk |
a list containing the scores for each iteration. |
bk |
regression coefficients. |
Pk |
orthogonal projector on the one-dimensional linear space by |
ncomp |
number of components included in the model. |
ncomp.opt |
optimum number of components according to the selected method. |
weights |
the specified weights. |
method |
the method used. |
y |
the response used to fit the model. |
H |
the hat matrix projector. |
G0 |
initial weighted centered inner products matrix of the squared distance matrix. |
Gk |
weighted centered inner products matrix in last iteration. |
gvar |
total weighted geometric variability. |
gvec |
the diagonal entries in |
gvar.iter |
geometric variability for each iteration. |
ocv |
the ordinary cross-validation estimate of the prediction error. |
gcv |
the generalized cross-validation estimate of the prediction error. |
aic |
the Akaike information criterion (AIC) value of the model. |
bic |
the Bayesian information criterion (BIC) value of the model. |
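The relation between a squared distance matrix, its weighted centered inner products matrix and the geometric variability can be sketched in base R as follows. This uses the standard double-centering identity from the distance-based literature with equal weights (the default); the variable names mirror the components listed above, but whether the package stores exactly these quantities is an assumption.

```r
set.seed(42)
n  <- 8
Z  <- matrix(rnorm(n * 3), n, 3)       # simulated data, for illustration only
D2 <- as.matrix(dist(Z))^2             # squared Euclidean distances

w <- rep(1 / n, n)                     # equal weights summing to one

# Weighted centering matrix J = I - 1 w'
J <- diag(n) - rep(1, n) %*% t(w)

# Weighted centered inner products matrix
G <- -0.5 * J %*% D2 %*% t(J)

gvec <- diag(G)                        # diagonal entries of G
gvar <- sum(w * gvec)                  # weighted geometric variability

# Two consistency checks:
# (1) the geometric variability equals half the weighted mean squared distance
all.equal(gvar, 0.5 * drop(t(w) %*% D2 %*% w))
# (2) squared distances are recovered from G
all.equal(unname(outer(gvec, gvec, "+") - 2 * G), unname(D2))
```

Because G is centered with respect to the weights, G %*% w is zero up to rounding, which is what makes the first identity hold.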
Note
When the Euclidean distance is used, the dbplsr model reduces to
traditional partial least squares (plsr).
Author(s)
Boj, Eva <evaboj@ub.edu>, Caballe, Adria <adria.caballe@upc.edu>, Delicado, Pedro <pedro.delicado@upc.edu> and Fortiana, Josep <fortiana@ub.edu>
References
Boj E, Delicado P, Fortiana J (2010). Distance-based local linear regression for functional predictors. Computational Statistics and Data Analysis 54, 429-437.
Boj E, Grane A, Fortiana J, Claramunt MM (2007). Implementing PLS for distance-based regression: computational issues. Computational Statistics 22, 237-248.
Boj E, Grane A, Fortiana J, Claramunt MM (2007). Selection of predictors in distance-based regression. Communications in Statistics B - Simulation and Computation 36, 87-98.
Cuadras CM, Arenas C, Fortiana J (1996). Some computational aspects of a distance-based model for prediction. Communications in Statistics B - Simulation and Computation 25, 593-609.
Cuadras C, Arenas C (1990). A distance-based regression model for prediction with mixed data. Communications in Statistics A - Theory and Methods 19, 2261-2279.
Cuadras CM (1989). Distance analysis in discrimination and classification using both continuous and categorical variables. In: Y. Dodge (ed.), Statistical Data Analysis and Inference. Amsterdam, The Netherlands: North-Holland Publishing Co., pp. 459-473.
See Also
summary.dbplsr for summary.
plot.dbplsr for plots.
predict.dbplsr for predictions.
Examples
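# The yarn data set ships with the pls package
library(pls)
data(yarn)
## Formula method with the default Euclidean metric;
## the number of components is selected by GCV
yarn.dbplsr <- dbplsr(density ~ NIR, data = yarn, ncomp = 6, method = "GCV")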
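A fitted object can then be inspected through the components listed under Value. The sketch below repeats the fit so that it is self-contained; it assumes dbstats and pls are installed, and it deliberately shows no expected output, since the values depend on the installed package version.

```r
library(dbstats)
library(pls)
data(yarn)

yarn.dbplsr <- dbplsr(density ~ NIR, data = yarn, ncomp = 6, method = "GCV")

yarn.dbplsr$ncomp       # number of components included in the model
yarn.dbplsr$ncomp.opt   # optimum number of components under the selected method
yarn.dbplsr$gcv         # generalized cross-validation estimate of the prediction error

summary(yarn.dbplsr)    # dispatches to summary.dbplsr
```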