ldbglm {dbstats} | R Documentation |

## Local distance-based generalized linear model

### Description

`ldbglm` is a localized version of a distance-based generalized linear
model. As in the global model `dbglm`, explanatory information is
coded as distances between individuals.

Neighborhood definition for localizing is done by the (semi)metric
`dist1`, whereas a second (semi)metric `dist2` (which may coincide
with `dist1`) is used for distance-based prediction.
Both `dist1` and `dist2` can either be computed from observed
explanatory variables or directly input as a squared distances
matrix or as a `Gram` matrix. Response and link function are as in the
`dbglm` function for ordinary generalized linear models.
The model allows for a mixture of continuous and qualitative explanatory
variables or, in fact, for more general quantities such as functional data.
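
As an illustration of the two matrix inputs mentioned above, the following base R sketch builds a squared distances matrix from observed variables and derives a doubly centered inner-product matrix from it. The centering formula is the standard one in distance-based methods; whether `dbstats` constructs its `Gram` objects in exactly this way is an assumption, and the names `Z`, `J`, `G` and `D2_back` are chosen here for illustration only.

```r
set.seed(1)
Z <- matrix(rnorm(20), nrow = 10)     # 10 individuals, 2 observed variables
D2 <- as.matrix(dist(Z))^2            # squared Euclidean distances

n <- nrow(D2)
J <- diag(n) - matrix(1 / n, n, n)    # centering matrix
G <- -0.5 * J %*% D2 %*% J            # doubly centered inner products

# The squared distances can be recovered from the inner products:
# D2[i, j] = G[i, i] + G[j, j] - 2 * G[i, j]
g <- diag(G)
D2_back <- outer(g, g, "+") - 2 * G
```

Both matrices carry the same explanatory information, which is why either form can serve as input.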

Notation convention: in distance-based methods we must distinguish
*observed explanatory variables*, which we denote by Z or z, from
*Euclidean coordinates*, which we denote by X or x. For an explanation
of both terms, see the references below.

### Usage

```
## S3 method for class 'formula'
ldbglm(formula,data,...,family=gaussian(),kind.of.kernel=1,
metric1="euclidean",metric2=metric1,method.h="GCV",weights,
user.h=NULL,h.range=NULL,noh=10,k.knn=3,
rel.gvar=0.95,eff.rank=NULL,maxiter=100,eps1=1e-10,
eps2=1e-10)
## S3 method for class 'dist'
ldbglm(dist1,dist2=dist1,y,family=gaussian(),kind.of.kernel=1,
method.h="GCV",weights,user.h=quantile(dist1,.25),
h.range=quantile(as.matrix(dist1),c(.05,.5)),noh=10,k.knn=3,
rel.gvar=0.95,eff.rank=NULL,maxiter=100,eps1=1e-10,eps2=1e-10,...)
## S3 method for class 'D2'
ldbglm(D2.1,D2.2=D2.1,y,family=gaussian(),kind.of.kernel=1,
method.h="GCV",weights,user.h=quantile(D2.1,.25)^.5,
h.range=quantile(as.matrix(D2.1),c(.05,.5))^.5,noh=10,
k.knn=3,rel.gvar=0.95,eff.rank=NULL,maxiter=100,eps1=1e-10,
eps2=1e-10,...)
## S3 method for class 'Gram'
ldbglm(G1,G2=G1,y,kind.of.kernel=1,user.h=NULL,
family=gaussian(),method.h="GCV",weights,h.range=NULL,noh=10,
k.knn=3,rel.gvar=0.95,eff.rank=NULL,maxiter=100,eps1=1e-10,
eps2=1e-10,...)
```

### Arguments

`formula` |
an object of class |

`data` |
an optional data frame containing the variables in the model (both response and explanatory variables, either the observed ones, Z, or a Euclidean configuration X). |

`y` |
(required if no formula is given as the principal argument). Response (dependent variable) must be numeric, matrix or data.frame. |

`dist1` |
a |

`dist2` |
a |

`D2.1` |
a |

`D2.2` |
a |

`G1` |
a |

`G2` |
a |

`family` |
a description of the error distribution and link function to be used
in the model. This can be a character string naming a family
function, a family function or the result of a call to a family function.
(See |

`kind.of.kernel` |
integer number between 1 and 6 which determines the user's choice of smoothing kernel. (1) Epanechnikov (Default), (2) Biweight, (3) Triweight, (4) Normal, (5) Triangular, (6) Uniform. |

`metric1` |
metric function to be used when computing |

`metric2` |
metric function to be used when computing |

`method.h` |
sets the method to be used in deciding the |

`weights` |
an optional numeric vector of weights to be used in the fitting process. By default all individuals have the same weight. |

`user.h` |
global bandwidth |

`h.range` |
a vector of length 2 giving the range for automatic bandwidth
choice. (Default: quantiles 0.05 and 0.5 of d(i,j) in `as.matrix(dist1)`). |

`noh` |
number of bandwidth |

`k.knn` |
minimum number of observations with positive weight
in neighborhood localizing. This avoids runtime errors
caused by a bandwidth so small that it produces neighborhoods
with only one observation. By default `k.knn=3`. |

`rel.gvar` |
relative geometric variability (a real number between 0 and 1).
In each |

`eff.rank` |
integer between 1 and the number of observations minus one.
Number of Euclidean coordinates used for model fitting in
each |

`maxiter` |
maximum number of iterations in the iterated |

`eps1` |
stopping criterion 1, |

`eps2` |
stopping criterion 2, |

`...` |
arguments passed to or from other methods to the low level. |

### Details

The various possible ways for inputting the model explanatory
information through distances, or their squares, etc., are the
same as in `dblm`.

The set of bandwidth `h` values checked in automatic
bandwidth choice is defined by `h.range` and `noh`,
together with `k.knn`. For each `h` in it a local generalized
linear model is fitted and the optimal `h` is decided according to the
statistic specified in `method.h`.
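
A minimal sketch of how such a candidate set could be formed, using base R only. The default `h.range` is the one shown in Usage for the `dist` method; the equispaced grid and the way `k.knn` widens a too-small bandwidth are assumptions made for illustration, not a description of the package internals. The names `h_grid`, `d_knn` and `local_h` are hypothetical.

```r
set.seed(1)
z <- rnorm(50)
dist1 <- dist(z)
D <- as.matrix(dist1)

noh <- 10
k.knn <- 3

# Default range from Usage: quantiles 0.05 and 0.5 of the distances
h.range <- unname(quantile(D, c(0.05, 0.5)))

# Assumption: candidate bandwidths are equispaced within h.range
h_grid <- seq(h.range[1], h.range[2], length.out = noh)

# Distance from each observation to its k.knn-th nearest neighbor
# (the observation itself counts, at distance 0)
d_knn <- apply(D, 1, function(row) sort(row)[k.knn])

# Assumption: a global h is widened locally where fewer than k.knn
# observations would otherwise fall inside the neighborhood
local_h <- function(h) pmax(h, d_knn)

# Neighborhood sizes for the smallest candidate bandwidth
h_small <- local_h(h_grid[1])
n_in <- sapply(seq_len(nrow(D)), function(i) sum(D[i, ] <= h_small[i]))
```

With this construction every neighborhood holds at least `k.knn` observations, even for the smallest bandwidth in the grid.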

`kind.of.kernel` designates which kernel function is to be used
in determining individual weights from `dist1` values.
See `density` for more information.
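
The mapping from distances to weights can be sketched as follows. The six kernels are the ones listed for `kind.of.kernel`, written in their usual textbook form; the exact scaling constants used inside `dbstats` are an assumption, and `kernel_weights` is a hypothetical helper, not a package function.

```r
# Hypothetical helper: kernel weights for distances d and bandwidth h
kernel_weights <- function(d, h, kind.of.kernel = 1) {
  u <- d / h
  inside <- as.numeric(abs(u) <= 1)   # compact support indicator
  switch(as.character(kind.of.kernel),
    "1" = 0.75 * (1 - u^2) * inside,          # Epanechnikov
    "2" = (15 / 16) * (1 - u^2)^2 * inside,   # Biweight
    "3" = (35 / 32) * (1 - u^2)^3 * inside,   # Triweight
    "4" = dnorm(u),                           # Normal
    "5" = (1 - abs(u)) * inside,              # Triangular
    "6" = 0.5 * inside,                       # Uniform
    stop("kind.of.kernel must be an integer between 1 and 6"))
}

d <- c(0, 0.2, 0.5, 1.5)
w <- kernel_weights(d, h = 1, kind.of.kernel = 1)
```

All kernels except the Normal one give zero weight to individuals farther than `h`, which is why a lower bound such as `k.knn` on the neighborhood size matters.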

For gamma distributions, the domain of the canonical link function
is not the same as the permitted range of the mean. In particular,
the linear predictor might be negative, yielding an impossible
negative mean. Should that event occur, `dbglm` stops with
an error message. The proposed alternative is to use a non-canonical link
function.
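
The issue can be seen directly from the family objects in base R. This short sketch uses only `stats::Gamma`; the value `-0.5` for the linear predictor is an arbitrary illustration.

```r
eta <- -0.5                              # a negative linear predictor

# Canonical (inverse) link: mu = 1 / eta, so a negative eta gives a negative mean
mu_inverse <- Gamma()$linkinv(eta)

# Non-canonical log link: mu = exp(eta), always positive
mu_log <- Gamma(link = "log")$linkinv(eta)
```

Passing, for instance, `family = Gamma(link = "log")` keeps the fitted means in the permitted range.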

### Value

A list of class `ldbglm` containing the following components:

`residuals` |
the residuals (response minus fitted values). |

`fitted.values` |
the fitted mean values. |

`h.opt` |
the optimal bandwidth |

`family` |
the |

`y` |
the response variable used. |

`S` |
the Smoother hat projector. |

`weights` |
the specified weights. |

`call` |
the matched call. |

`dist1` |
the distance matrix (object of class |

`dist2` |
the distance matrix (object of class |

Objects of class `"ldbglm"` are actually of class
`c("ldbglm", "ldblm")`, inheriting the `plot.ldblm` and
`summary.ldblm` methods from class `"ldblm"`.
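
The inheritance mechanism is ordinary S3 dispatch, which the following toy sketch illustrates. The generic `describe` and its method are hypothetical and exist only for this example; the object is an empty list that merely carries the two class names.

```r
# Hypothetical generic with a method defined only for the parent class
describe <- function(x, ...) UseMethod("describe")
describe.ldblm <- function(x, ...) "method for class ldblm"

# An object carrying both classes, child class first
obj <- structure(list(), class = c("ldbglm", "ldblm"))

# No describe.ldbglm exists, so dispatch falls through to describe.ldblm
res <- describe(obj)
```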

### Note

Model fitting is repeated `n` times (`n=` number of observations)
for each bandwidth (`noh*n` times in total).
If `noh` is too large or the sample has many observations, the running
time of this function can be very long.

### Author(s)

Boj, Eva <evaboj@ub.edu>, Caballe, Adria <adria.caballe@upc.edu>, Delicado, Pedro <pedro.delicado@upc.edu> and Fortiana, Josep <fortiana@ub.edu>

### References

Boj E, Caballe A, Delicado P, Esteve A, Fortiana J (2016). *Global and local distance-based generalized linear models*.
TEST 25, 170-195.

Boj E, Delicado P, Fortiana J (2010). *Distance-based local linear regression for functional predictors*.
Computational Statistics and Data Analysis 54, 429-437.

Boj E, Grane A, Fortiana J, Claramunt MM (2007). *Selection of predictors in distance-based regression*.
Communications in Statistics B - Simulation and Computation 36, 87-98.

Cuadras CM, Arenas C, Fortiana J (1996). *Some computational aspects of a distance-based model
for prediction*. Communications in Statistics B - Simulation and Computation 25, 593-609.

Cuadras C, Arenas C (1990). *A distance-based regression model for prediction with mixed data*.
Communications in Statistics A - Theory and Methods 19, 2261-2279.

Cuadras CM (1989). *Distance analysis in discrimination and classification using both
continuous and categorical variables*. In: Y. Dodge (ed.), *Statistical Data Analysis and Inference*.
Amsterdam, The Netherlands: North-Holland Publishing Co., pp. 459-473.

### See Also

`dbglm` for distance-based generalized linear models.

`ldblm` for local distance-based linear models.

`summary.ldbglm` for summary.

`plot.ldbglm` for plots.

`predict.ldbglm` for predictions.

### Examples

```
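# Example of ldbglm usage
library(dbstats)
set.seed(1)  # make the simulated data reproducible
z <- rnorm(100)
y <- rbinom(100, 1, plogis(z))
# Squared distances between individuals, tagged with class "D2"
D2 <- as.matrix(dist(z))^2
class(D2) <- "D2"
# Distance-based generalized linear model
dbglm2 <- dbglm(D2, y, family = binomial(link = "logit"), method = "rel.gvar")
# Local distance-based generalized linear model
ldbglm2 <- ldbglm(D2, y = y, family = binomial(link = "logit"), noh = 3)
# Compare the residual sums of squares of both fits
sum((y - ldbglm2$fitted.values)^2)
sum((y - dbglm2$fitted.values)^2)
# Plot the data with the local (col = 3) and global (col = 2) fitted values
plot(z, y)
points(z, ldbglm2$fitted.values, col = 3)
points(z, dbglm2$fitted.values, col = 2)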
```

*dbstats* version 2.0.2