ldblm {dbstats} | R Documentation |

## Local distance-based linear model

### Description

`ldblm` is a localized version of a distance-based linear model.
As in the global model `dblm`, explanatory information is coded as
distances between individuals.

Neighborhood definition for localizing is done by the (semi)metric
`dist1`, whereas a second (semi)metric `dist2` (which may coincide
with `dist1`) is used for distance-based prediction.
Both `dist1` and `dist2` can either be computed from observed
explanatory variables or input directly as a squared distances
matrix or as a `Gram` matrix. The response is a continuous variable,
as in the ordinary linear model. The model allows for a mixture of
continuous and qualitative explanatory variables or, in fact, more
general quantities such as functional data.
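The three input forms mentioned above (a `dist` object, a squared distances matrix and a Gram matrix) can be related in base R. The sketch below is illustrative only: the unweighted double-centering formula is the classical multidimensional scaling convention and is an assumption here, not taken from the dbstats source, and the names `J`, `G` and `D2_back` are hypothetical.

```r
set.seed(1)
Z <- matrix(rnorm(20), nrow = 10)      # 10 individuals, 2 observed variables

d  <- dist(Z)                          # object of class "dist"
D2 <- as.matrix(d)^2                   # squared distances matrix

n <- nrow(D2)
J <- diag(n) - matrix(1 / n, n, n)     # centering matrix (assumed: equal weights)
G <- -0.5 * J %*% D2 %*% J             # doubly centered inner products

# squared distances recovered from the inner products:
# d_ij^2 = g_ii + g_jj - 2 * g_ij
D2_back <- outer(diag(G), diag(G), "+") - 2 * G
```

Whichever form is supplied, the same geometric information is carried, which is why the methods for classes `dist`, `D2` and `Gram` can share the remaining arguments.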

Notation convention: in distance-based methods we must distinguish
*observed explanatory variables*, which we denote by Z or z, from
*Euclidean coordinates*, which we denote by X or x. For an explanation
of both terms, see the references below.

### Usage

```
## S3 method for class 'formula'
ldblm(formula,data,...,kind.of.kernel=1,
metric1="euclidean",metric2=metric1,method.h="GCV",weights,
user.h=NULL,h.range=NULL,noh=10,k.knn=3,rel.gvar=0.95,eff.rank=NULL)
## S3 method for class 'dist'
ldblm(dist1,dist2=dist1,y,kind.of.kernel=1,
method.h="GCV",weights,user.h=quantile(dist1,.25),
h.range=quantile(as.matrix(dist1),c(.05,.5)),noh=10,
k.knn=3,rel.gvar=0.95,eff.rank=NULL,...)
## S3 method for class 'D2'
ldblm(D2.1,D2.2=D2.1,y,kind.of.kernel=1,method.h="GCV",
weights,user.h=quantile(D2.1,.25)^.5,
h.range=quantile(as.matrix(D2.1),c(.05,.5))^.5,noh=10,k.knn=3,
rel.gvar=0.95,eff.rank=NULL,...)
## S3 method for class 'Gram'
ldblm(G1,G2=G1,y,kind.of.kernel=1,method.h="GCV",
weights,user.h=NULL,h.range=NULL,noh=10,k.knn=3,rel.gvar=0.95,
eff.rank=NULL,...)
```

### Arguments

`formula` |
an object of class |

`data` |
an optional data frame containing the variables in the model (both response and explanatory variables, either the observed ones, Z, or a Euclidean configuration X). |

`y` |
(required if no formula is given as the principal argument) Response (dependent variable); must be a numeric vector, matrix or data.frame. |

`dist1` |
a |

`dist2` |
a |

`D2.1` |
a |

`D2.2` |
a |

`G1` |
a |

`G2` |
a |

`kind.of.kernel` |
integer number between 1 and 6 which determines the user's choice of smoothing kernel. (1) Epanechnikov (Default), (2) Biweight, (3) Triweight, (4) Normal, (5) Triangular, (6) Uniform. |

`metric1` |
metric function to be used when computing |

`metric2` |
metric function to be used when computing |

`method.h` |
sets the method to be used in deciding the |

`weights` |
an optional numeric vector of weights to be used in the fitting process. By default all individuals have the same weight. |

`user.h` |
global bandwidth |

`h.range` |
a vector of length 2 giving the range for automatic bandwidth
choice. (Default: quantiles 0.05 and 0.5 of d(i,j) in |

`noh` |
number of bandwidth |

`k.knn` |
minimum number of observations with positive weight
in each local neighborhood. This avoids runtime errors
caused by a bandwidth so small that some neighborhoods
contain only one observation. By default |

`rel.gvar` |
relative geometric variability (a real number between 0 and 1).
In each |

`eff.rank` |
integer between 1 and the number of observations minus one.
Number of Euclidean coordinates used for model fitting in
each |

`...` |
arguments passed to or from other methods, down to the low-level functions. |

### Details

There are two semi-metrics involved in local linear distance-based estimation:
`dist1` and `dist2`. The two semi-metrics may coincide, but they need not.
For instance, when `dist1=||xi-xj||` and
`dist2=||(xi,xi^2,xi^3)-(xj,xj^2,xj^3)||`, the estimator
for new observations coincides with fitting a local cubic polynomial
regression.
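The two semi-metrics of this example are easy to build in base R. The sketch below constructs them and then fits the classical local cubic polynomial at one target point, which is the estimator the paragraph above refers to. It does not call `ldblm`; the bandwidth `h`, the target point `x0`, the simulated data and the Epanechnikov weighting are assumptions chosen for illustration.

```r
set.seed(2)
n <- 50
x <- runif(n, -1, 1)
y <- 1 + 2 * x - x^3 + rnorm(n, sd = 0.1)

dist1 <- dist(x)                       # ||xi - xj||, used for localizing
dist2 <- dist(cbind(x, x^2, x^3))      # ||(xi,xi^2,xi^3) - (xj,xj^2,xj^3)||

# classical local cubic fit at x0 with Epanechnikov weights
x0 <- 0
h  <- 0.5
u  <- abs(x - x0) / h
w  <- 0.75 * pmax(0, 1 - u^2)          # zero weight outside the window

fit    <- lm(y ~ x + I(x^2) + I(x^3), data = data.frame(x = x, y = y), weights = w)
fit_x0 <- unname(predict(fit, newdata = data.frame(x = x0)))
```

Here `dist1` plays the role of the localizing distance (it determines `w`), while the cubic terms in the regression correspond to the coordinates that generate `dist2`.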

The set of bandwidth `h` values checked in automatic
bandwidth choice is defined by `h.range` and `noh`,
together with `k.knn`. For each `h` in it a local linear
model is fitted and the optimal `h` is decided according to the
statistic specified in `method.h`.
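As a rough picture of how `h.range`, `noh` and `k.knn` interact, the sketch below builds a candidate grid and counts how many observations fall inside each neighborhood. How `ldblm` actually spaces the grid and enforces `k.knn` is not stated on this page, so the equally spaced grid and the simple neighbor count are assumptions, and `h.grid`, `n.in` and `ok` are hypothetical names.

```r
set.seed(3)
Z <- matrix(rnorm(60), nrow = 30)
D <- as.matrix(dist(Z))                # distances d(i,j) from dist1

noh   <- 10
k.knn <- 3

# default range shown in the Usage section: quantiles 0.05 and 0.5 of d(i,j)
h.range <- quantile(D, c(0.05, 0.5))

# assumed: candidate bandwidths equally spaced over h.range
h.grid <- seq(h.range[1], h.range[2], length.out = noh)

# smallest neighborhood size (self included) for each candidate bandwidth
n.in <- sapply(h.grid, function(h) min(rowSums(D < h)))

# candidates whose smallest neighborhood has at least k.knn observations
ok <- n.in >= k.knn
```

Candidates flagged `FALSE` in `ok` are the ones for which some individual would have fewer than `k.knn` neighbors with positive weight under a compactly supported kernel.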

`kind.of.kernel` designates which kernel function is to be used
in determining individual weights from `dist1` values.
See `density` for more information.
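The six kernels listed for `kind.of.kernel` have standard textbook forms, sketched below as functions of `u = distance / bandwidth`. The integer codes follow the argument description; the normalizing constants and the helper name `kernel_weight` are assumptions, and dbstats may scale the kernels differently.

```r
# Textbook kernels evaluated at u = distance / bandwidth.
# Scaling constants are the usual ones that make each kernel integrate to 1.
kernel_weight <- function(u, kind.of.kernel = 1) {
  inside <- as.numeric(abs(u) <= 1)          # compact support indicator
  switch(as.character(kind.of.kernel),
    "1" = 0.75 * (1 - u^2) * inside,         # Epanechnikov
    "2" = 15 / 16 * (1 - u^2)^2 * inside,    # Biweight
    "3" = 35 / 32 * (1 - u^2)^3 * inside,    # Triweight
    "4" = dnorm(u),                          # Normal
    "5" = (1 - abs(u)) * inside,             # Triangular
    "6" = 0.5 * inside,                      # Uniform
    stop("kind.of.kernel must be an integer between 1 and 6"))
}

d <- c(0, 0.2, 0.5, 0.9, 1.5)   # distances from a target individual
h <- 1                          # bandwidth
w <- kernel_weight(d / h, kind.of.kernel = 1)
```

With a compactly supported kernel, any individual farther than `h` from the target receives weight zero, which is why a minimum neighborhood size such as `k.knn` matters.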

### Value

A list of class `ldblm` containing the following components:

`residuals` |
the residuals (response minus fitted values). |

`fitted.values` |
the fitted mean values. |

`h.opt` |
the optimal bandwidth h used in the fitting process
( |

`S` |
the Smoother hat projector. |

`weights` |
the specified weights. |

`y` |
the response variable used. |

`call` |
the matched call. |

`dist1` |
the distance matrix (object of class |

`dist2` |
the distance matrix (object of class |
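Components such as `S`, `fitted.values` and `residuals` are the usual ingredients of a generalized cross-validation score. The sketch below shows the textbook relations using a stand-in kernel smoother built in base R, not an actual `ldblm` fit; the GCV formula is the standard definition and is assumed rather than confirmed from the dbstats source, and the simulated data and bandwidth are arbitrary.

```r
set.seed(4)
n <- 40
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.2)

h <- 0.15
K <- dnorm(outer(x, x, "-") / h)       # Gaussian kernel weights
S <- K / rowSums(K)                    # stand-in smoother matrix, rows sum to 1

fitted.values <- drop(S %*% y)         # a smoother maps y to the fitted values
residuals     <- y - fitted.values     # response minus fitted values

nu  <- sum(diag(S))                    # effective number of parameters, tr(S)
gcv <- n * sum(residuals^2) / (n - nu)^2
```

Repeating this computation over a grid of bandwidths and keeping the `h` with the smallest `gcv` is the general idea behind a GCV-based bandwidth choice.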

### Note

Model fitting is repeated `n` times (`n` = number of observations)
for each bandwidth, that is, `noh*n` times in total.
When `noh` is large or the sample has many observations, the running
time of this function can be very high.

### Author(s)

Boj, Eva <evaboj@ub.edu>, Caballe, Adria <adria.caballe@upc.edu>, Delicado, Pedro <pedro.delicado@upc.edu> and Fortiana, Josep <fortiana@ub.edu>

### References

Boj E, Caballe A, Delicado P, Esteve A, Fortiana J (2016). *Global and local distance-based generalized linear models*.
TEST 25, 170-195.

Boj E, Delicado P, Fortiana J (2010). *Distance-based local linear regression for functional predictors*.
Computational Statistics and Data Analysis 54, 429-437.

Boj E, Grane A, Fortiana J, Claramunt MM (2007). *Selection of predictors in distance-based regression*.
Communications in Statistics B - Simulation and Computation 36, 87-98.

Cuadras CM, Arenas C, Fortiana J (1996). *Some computational aspects of a distance-based model
for prediction*. Communications in Statistics B - Simulation and Computation 25, 593-609.

Cuadras C, Arenas C (1990). *A distance-based regression model for prediction with mixed data*.
Communications in Statistics A - Theory and Methods 19, 2261-2279.

Cuadras CM (1989). *Distance analysis in discrimination and classification using both
continuous and categorical variables*. In: Y. Dodge (ed.), *Statistical Data Analysis and Inference*.
Amsterdam, The Netherlands: North-Holland Publishing Co., pp. 459-473.

### See Also

`dblm` for distance-based linear models.

`ldbglm` for local distance-based generalized linear models.

`summary.ldblm` for summary.

`plot.ldblm` for plots.

`predict.ldblm` for predictions.

### Examples

```
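# example of use of the ldblm function
n <- 100
p <- 1
k <- 5
# observed explanatory variable and random cubic coefficients
Z <- matrix(rnorm(n*p),nrow=n)
b1 <- matrix(runif(p)*k,nrow=p)
b2 <- matrix(runif(p)*k,nrow=p)
b3 <- matrix(runif(p)*k,nrow=p)
# noisy cubic response
s <- 1
e <- rnorm(n)*s
y <- Z%*%b1 + Z^2%*%b2 + Z^3%*%b3 + e
# squared distances matrix, tagged with class "D2"
D2 <- as.matrix(dist(Z)^2)
class(D2) <- "D2"
# formula interface: bandwidth chosen by GCV among noh=3 candidates
ldblm1 <- ldblm(y~Z,kind.of.kernel=1,method.h="GCV",noh=3,k.knn=3)
# D2 interface: bandwidth fixed at the default user.h
ldblm2 <- ldblm(D2.1=D2,D2.2=D2,y,kind.of.kernel=1,method.h="user.h",k.knn=3)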
```

[Package *dbstats* version 2.0.2]