dbplsr {dbstats} | R Documentation |

## Distance-based partial least squares regression

### Description

`dbplsr` is a variety of partial least squares regression
where explanatory information is coded as distances between individuals.
These distances can either be computed from observed explanatory variables
or directly input as a squared distances matrix.

Since distances can be computed from a mixture of continuous and
qualitative explanatory variables or, in fact, from more general
quantities, `dbplsr` is a proper extension of `plsr`.

Notation convention: in distance-based methods we must distinguish
*observed explanatory variables*, which we denote by Z or z, from
*Euclidean coordinates*, which we denote by X or x. For an explanation
of both terms, see the references below.
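
To make the distinction concrete, the following base R sketch starts from observed variables Z, computes distances between individuals, and recovers a Euclidean configuration X by classical multidimensional scaling (`cmdscale`). The simulated data and the object names are illustrative assumptions, and this is not necessarily how `dbplsr` obtains its coordinates internally.

```r
# Illustrative data (assumed): 10 individuals, 4 observed numeric variables Z
set.seed(1)
Z <- matrix(rnorm(40), nrow = 10, ncol = 4)

# Distances between individuals computed from the observed variables
D <- dist(Z, method = "euclidean")

# Euclidean coordinates X reproducing those distances (classical MDS)
X <- cmdscale(D, k = 4)

# X is not Z (it is centred and rotated), yet it yields the same distances
max(abs(dist(X) - D))
```

With a non-Euclidean metric or qualitative variables, only the distances are available at the start, and X plays the role that the raw predictors play in ordinary regression.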

### Usage

```
## S3 method for class 'formula'
dbplsr(formula,data,...,metric="euclidean",
method="ncomp",weights,ncomp)
## S3 method for class 'dist'
dbplsr(distance,y,...,weights,ncomp=ncomp,method="ncomp")
## S3 method for class 'D2'
dbplsr(D2,y,...,weights,ncomp=ncomp,method="ncomp")
## S3 method for class 'Gram'
dbplsr(G,y,...,weights,ncomp=ncomp,method="ncomp")
```

### Arguments

`formula` |
an object of class |

`data` |
an optional data frame containing the variables in the model (both response and explanatory variables, either the observed ones, Z, or a Euclidean configuration X). |

`y` |
(required if no formula is given as the principal argument). Response (dependent variable) must be numeric, matrix or data.frame. |

`distance` |
a |

`D2` |
a |

`G` |
a |

`metric` |
metric function to be used when computing distances from observed
explanatory variables.
One of |

`method` |
sets the method used to decide how many components are needed to fit
the best model for new predictions.
There are five different methods, |

`weights` |
an optional numeric vector of weights to be used in the fitting process. By default all individuals have the same weight. |

`ncomp` |
the number of components to include in the model. |

`...` |
arguments passed to or from other methods to the low level. |

### Details

Partial least squares (PLS) is a method for constructing
predictive models when the factors (Z) are many and highly collinear.
A PLS model will try to find the multidimensional direction
in the Z space that explains the maximum multidimensional variance direction
in the Y space. `dbplsr` is particularly suited when the matrix of
predictors has more variables than observations.
By contrast, standard regression (`dblm`) will fail in these cases.
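
The idea can be illustrated with a minimal single-response PLS sketch in base R. This is a textbook NIPALS-style iteration on raw predictors, not the `dbplsr` implementation (which works on distances); the function name `pls1_sketch` and the simulated data are assumptions made for illustration. Ordinary least squares via `lm` stands in here for standard regression.

```r
# Minimal PLS1 sketch (single response) with NIPALS-style deflation; base R only.
# Not the dbplsr implementation: it works on raw predictors, not on distances.
pls1_sketch <- function(Z, y, ncomp) {
  Zc <- scale(Z, center = TRUE, scale = FALSE)  # centre predictors
  yc <- y - mean(y)                             # centre response
  n <- nrow(Zc)
  scores <- matrix(0, n, ncomp)                 # one score vector per component
  fitted <- matrix(0, n, ncomp)                 # fitted values after k components
  fit_k <- rep(mean(y), n)
  for (k in seq_len(ncomp)) {
    w <- drop(crossprod(Zc, yc))                # direction of maximal covariance with y
    w <- w / sqrt(sum(w^2))
    t_k <- drop(Zc %*% w)                       # scores
    tt <- sum(t_k^2)
    p_k <- drop(crossprod(Zc, t_k)) / tt        # predictor loadings
    q_k <- sum(yc * t_k) / tt                   # regression coefficient on the score
    Zc <- Zc - outer(t_k, p_k)                  # deflate predictors
    yc <- yc - q_k * t_k                        # deflate response
    fit_k <- fit_k + q_k * t_k
    scores[, k] <- t_k
    fitted[, k] <- fit_k
  }
  list(scores = scores, fitted = fitted)
}

# Assumed toy data: 10 observations, 30 predictors (more variables than observations)
set.seed(1)
Z <- matrix(rnorm(300), nrow = 10, ncol = 30)
y <- Z[, 1] - 2 * Z[, 2] + rnorm(10, sd = 0.1)

# Ordinary least squares cannot estimate all coefficients here
anyNA(coef(lm(y ~ Z)))

# PLS still produces a sequence of fits, one per number of components
fit <- pls1_sketch(Z, y, ncomp = 3)
rss <- colSums((y - fit$fitted)^2)
rss
```

The per-component scores and fitted values in this sketch are loosely analogous to the per-iteration lists that `dbplsr` returns.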

The various possible ways for inputting the model explanatory
information through distances, or their squares, etc., are the
same as in `dblm`.
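
As a rough illustration of how these inputs relate to one another, the base R sketch below goes from distances to squared distances to a weighted, doubly centred inner products matrix, and computes the geometric variability. It uses the standard formulas from distance-based methods with weights assumed to sum to one; the object names are hypothetical and the package's own conversion functions and conventions may differ.

```r
set.seed(1)
Z <- matrix(rnorm(40), nrow = 10, ncol = 4)   # assumed toy data
n <- nrow(Z)
w <- rep(1 / n, n)                            # uniform weights summing to one

D  <- as.matrix(dist(Z))                      # distances
D2 <- D^2                                     # squared distances

# Weighted double centring: G = -1/2 * J %*% D2 %*% t(J), with J = I - 1 w'
J <- diag(n) - matrix(1, n, 1) %*% t(w)
G <- -0.5 * J %*% D2 %*% t(J)

# For Euclidean distances, G equals the inner products of the centred Z
Zc <- scale(Z, center = TRUE, scale = FALSE)
max(abs(G - tcrossprod(Zc)))

# Geometric variability: half the weighted mean of squared distances,
# which equals the weighted sum of the diagonal of G
gvar <- 0.5 * drop(t(w) %*% D2 %*% w)
c(gvar, sum(w * diag(G)))
```

Because G coincides with the centred inner products when the metric is Euclidean, a method that only uses G sees the same information as one that uses the centred predictors directly.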

The number of components to fit is specified with the argument `ncomp`.

### Value

A list of class `dbplsr` containing the following components:

`residuals` |
a list containing the residuals (response minus fitted values) for each iteration. |

`fitted.values` |
a list containing the fitted values for each iteration. |

`fk` |
a list containing the scores for each iteration. |

`bk` |
regression coefficients. |

`Pk` |
orthogonal projector on the one-dimensional linear space by |

`ncomp` |
number of components included in the model. |

`ncomp.opt` |
optimum number of components according to the selected method. |

`weights` |
the specified weights. |

`method` |
the method used. |

`y` |
the response used to fit the model. |

`H` |
the hat matrix projector. |

`G0` |
initial weighted centered inner products matrix of the squared distance matrix. |

`Gk` |
weighted centered inner products matrix in last iteration. |

`gvar` |
total weighted geometric variability. |

`gvec` |
the diagonal entries in |

`gvar.iter` |
geometric variability for each iteration. |

`ocv` |
the ordinary cross-validation estimate of the prediction error. |

`gcv` |
the generalized cross-validation estimate of the prediction error. |

`aic` |
the value of the Akaike Information Criterion (AIC) for the model. |

`bic` |
the value of the Bayesian Information Criterion (BIC) for the model. |
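
For orientation, the textbook definitions of ordinary and generalized cross-validation for a linear projector can be sketched in base R as follows. The example uses an ordinary least-squares fit on simulated data (an assumption for illustration); the `ocv` and `gcv` values returned by `dbplsr` may be computed or normalised differently, for instance to account for weights.

```r
# Assumed toy data and an ordinary least-squares fit (a linear projector)
set.seed(1)
n <- 20
d <- data.frame(x = rnorm(n))
d$y <- 1 + 2 * d$x + rnorm(n)
fit <- lm(y ~ x, data = d)

h   <- hatvalues(fit)       # diagonal of the hat matrix H
res <- residuals(fit)       # response minus fitted values

# Ordinary (leave-one-out) cross-validation via the hat-matrix shortcut
ocv <- mean((res / (1 - h))^2)

# Generalized cross-validation: replace each h_ii by the average tr(H) / n
gcv <- mean(res^2) / (1 - sum(h) / n)^2

# Brute-force leave-one-out check of the shortcut
loo <- sapply(seq_len(n), function(i) {
  fit_i <- lm(y ~ x, data = d[-i, ])
  d$y[i] - predict(fit_i, newdata = d[i, ])
})
c(ocv = ocv, loo = mean(loo^2), gcv = gcv)
```

The shortcut and the brute-force loop agree for least squares, which is why the hat matrix `H` is enough to evaluate both criteria without refitting.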

### Note

When the Euclidean distance is used, the `dbplsr` model reduces to
traditional partial least squares (`plsr`).

### Author(s)

Boj, Eva <evaboj@ub.edu>, Caballe, Adria <adria.caballe@upc.edu>, Delicado, Pedro <pedro.delicado@upc.edu> and Fortiana, Josep <fortiana@ub.edu>

### References

Boj E, Delicado P, Fortiana J (2010). *Distance-based local linear regression for functional predictors*.
Computational Statistics and Data Analysis 54, 429-437.

Boj E, Grane A, Fortiana J, Claramunt MM (2007). *Implementing PLS for distance-based regression:
computational issues*.
Computational Statistics 22, 237-248.

Boj E, Grane A, Fortiana J, Claramunt MM (2007). *Selection of predictors in distance-based regression*.
Communications in Statistics B - Simulation and Computation 36, 87-98.

Cuadras CM, Arenas C, Fortiana J (1996). *Some computational aspects of a distance-based model
for prediction*. Communications in Statistics B - Simulation and Computation 25, 593-609.

Cuadras C, Arenas C (1990). *A distance-based regression model for prediction with mixed data*.
Communications in Statistics A - Theory and Methods 19, 2261-2279.

Cuadras CM (1989). *Distance analysis in discrimination and classification using both
continuous and categorical variables*. In: Y. Dodge (ed.), *Statistical Data Analysis and Inference*.
Amsterdam, The Netherlands: North-Holland Publishing Co., pp. 459-473.

### See Also

`summary.dbplsr` for summary.

`plot.dbplsr` for plots.

`predict.dbplsr` for predictions.

### Examples

```
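library(dbstats)  # provides dbplsr
library(pls)      # provides the yarn data set
data(yarn)
## Formula interface: fit 6 components, with GCV selecting the optimal number
yarn.dbplsr <- dbplsr(density ~ NIR, data = yarn, ncomp = 6, method = "GCV")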
```

[Package *dbstats* version 2.0.2]