ldblm {dbstats}    R Documentation
Local distance-based linear model
Description
ldblm is a localized version of a distance-based linear model. As in the global model dblm, explanatory information is coded as distances between individuals. Neighborhoods for localizing are defined by the (semi)metric dist1, whereas a second (semi)metric dist2 (which may coincide with dist1) is used for distance-based prediction. Both dist1 and dist2 can either be computed from observed explanatory variables or be input directly as a matrix of squared distances or as a Gram matrix. The response is a continuous variable, as in the ordinary linear model. The model allows for a mixture of continuous and qualitative explanatory variables or, in fact, for more general quantities such as functional data.
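As a sketch of the three input forms (distances, squared distances, Gram matrix), the base R code below builds all three from a simulated matrix Z and checks that the squared distances can be recovered from the Gram matrix. The centring formula G = -1/2 J D2 J is the standard construction; whether ldblm centres a Gram input in exactly this way is an assumption.

```r
# Sketch: the three input forms, built in base R from simulated data Z.
set.seed(1)
n <- 6
Z <- matrix(rnorm(n * 2), nrow = n)   # observed explanatory variables

# (1) a "dist" object of Euclidean distances d(i, j)
d <- dist(Z)

# (2) the matrix of squared distances D2 = (d(i, j)^2)
D2 <- as.matrix(d)^2

# (3) the doubly centred Gram matrix G = -1/2 * J D2 J, with J = I - 11'/n
J <- diag(n) - matrix(1 / n, n, n)
G <- -0.5 * J %*% D2 %*% J

# Squared distances are recovered from G: d2(i, j) = g_ii + g_jj - 2 g_ij
D2.back <- outer(diag(G), diag(G), "+") - 2 * G
```

The examples on this page pass a squared-distance matrix by setting class(D2) <- "D2"; by analogy, tagging G with class "Gram" should select the Gram method, but that is an assumption not shown in the examples.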
Notation convention: in distance-based methods we must distinguish observed explanatory variables, which we denote by Z or z, from Euclidean coordinates, which we denote by X or x. For an explanation of the meaning of both terms, see the references below.
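The distinction between Z and X can be illustrated in base R: classical multidimensional scaling (cmdscale) recovers a Euclidean configuration X from the distances alone. This is a minimal sketch with simulated data; X is a centred and rotated version of Z, yet it reproduces the same distances.

```r
# Sketch: observed variables Z versus Euclidean coordinates X.
set.seed(2)
n <- 8
Z <- matrix(rnorm(n * 2), nrow = n)  # observed explanatory variables
d <- dist(Z)                         # from here on, only distances are used

# Euclidean configuration X: principal coordinates of d (classical MDS)
X <- cmdscale(d, k = 2)

# X differs from Z but reproduces the original distances
max.err <- max(abs(as.vector(dist(X)) - as.vector(d)))
```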
Usage
## S3 method for class 'formula'
ldblm(formula,data,...,kind.of.kernel=1,
      metric1="euclidean",metric2=metric1,method.h="GCV",weights,
      user.h=NULL,h.range=NULL,noh=10,k.knn=3,rel.gvar=0.95,eff.rank=NULL)
## S3 method for class 'dist'
ldblm(dist1,dist2=dist1,y,kind.of.kernel=1,
      method.h="GCV",weights,user.h=quantile(dist1,.25),
      h.range=quantile(as.matrix(dist1),c(.05,.5)),noh=10,
      k.knn=3,rel.gvar=0.95,eff.rank=NULL,...)
## S3 method for class 'D2'
ldblm(D2.1,D2.2=D2.1,y,kind.of.kernel=1,method.h="GCV",
      weights,user.h=quantile(D2.1,.25)^.5,
      h.range=quantile(as.matrix(D2.1),c(.05,.5))^.5,noh=10,k.knn=3,
      rel.gvar=0.95,eff.rank=NULL,...)
## S3 method for class 'Gram'
ldblm(G1,G2=G1,y,kind.of.kernel=1,method.h="GCV",
      weights,user.h=NULL,h.range=NULL,noh=10,k.knn=3,rel.gvar=0.95,
      eff.rank=NULL,...)
Arguments
formula: an object of class …
data: an optional data frame containing the variables in the model (both the response and the explanatory variables, either the observed ones, Z, or a Euclidean configuration X).
y: (required if no formula is given as the principal argument) the response (dependent variable); must be numeric, a matrix or a data.frame.
dist1: a …
dist2: a …
D2.1: a …
D2.2: a …
G1: a …
G2: a …
kind.of.kernel: integer between 1 and 6 that selects the smoothing kernel: (1) Epanechnikov (default), (2) Biweight, (3) Triweight, (4) Normal, (5) Triangular, (6) Uniform.
metric1: metric function to be used when computing …
metric2: metric function to be used when computing …
method.h: sets the method used to choose the optimal bandwidth h. There are five different methods, …
weights: an optional numeric vector of weights to be used in the fitting process. By default all individuals have the same weight.
user.h: global bandwidth …
h.range: a vector of length 2 giving the range for automatic bandwidth choice (default: quantiles 0.05 and 0.5 of d(i,j) in …)
noh: number of bandwidth …
k.knn: minimum number of observations with positive weight in neighborhood localizing. This avoids runtime errors caused by a bandwidth so small that some neighborhoods contain only one observation. By default k.knn=3.
rel.gvar: relative geometric variability (a real number between 0 and 1). In each …
eff.rank: integer between 1 and the number of observations minus one: the number of Euclidean coordinates used for model fitting in each …
...: arguments passed to or from other methods, and on to the low-level functions.
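The role of rel.gvar can be sketched as follows, under the assumption that the number of Euclidean coordinates is the smallest k whose leading Gram-matrix eigenvalues account for at least a fraction rel.gvar of the total geometric variability; eff.rank, by contrast, states that number directly. The helper choose_rank is hypothetical and not part of dbstats.

```r
# Hypothetical helper: number of Euclidean coordinates needed to reach a
# given relative geometric variability, computed from a Gram matrix G.
choose_rank <- function(G, rel.gvar = 0.95) {
  ev <- eigen(G, symmetric = TRUE, only.values = TRUE)$values  # decreasing
  ev <- ev[ev > 1e-10 * max(ev)]   # keep only the positive eigenvalues
  share <- cumsum(ev) / sum(ev)    # cumulative relative variability
  which(share >= rel.gvar)[1]      # first k reaching the threshold
}

set.seed(3)
n <- 50
Z <- cbind(rnorm(n, sd = 10), rnorm(n, sd = 1))  # one dominant direction
J <- diag(n) - matrix(1 / n, n, n)
G <- -0.5 * J %*% (as.matrix(dist(Z))^2) %*% J

k95 <- choose_rank(G, 0.95)       # the dominant direction alone suffices
k999 <- choose_rank(G, 0.999999)  # both directions are needed
```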
Details
There are two semi-metrics involved in local linear distance-based estimation: dist1, which defines the neighborhoods, and dist2, which is used for prediction. The two semi-metrics may coincide. For instance, when dist1 = ||x_i - x_j|| and dist2 = ||(x_i, x_i^2, x_i^3) - (x_j, x_j^2, x_j^3)||, the estimator for new observations coincides with a local cubic polynomial regression fit.
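The two semi-metrics in this example can be built with base R dist(); the code below is a minimal sketch with a simulated scalar x. Because dist2 includes the linear term, every dist2 value is at least as large as the corresponding dist1 value.

```r
# Sketch: the two (semi)metrics of the cubic example, for a scalar x.
set.seed(4)
x <- runif(30, -2, 2)

# dist1 = |x_i - x_j|: defines neighborhoods (kernel weights)
dist1 <- dist(x)

# dist2 = ||(x_i, x_i^2, x_i^3) - (x_j, x_j^2, x_j^3)||: used for prediction
dist2 <- dist(cbind(x, x^2, x^3))
```

Following the 'dist' method in Usage, these objects would then be passed as ldblm(dist1, dist2, y); that call needs dbstats and a response y and is not run here.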
The set of bandwidth values h checked in automatic bandwidth choice is defined by h.range and noh, together with k.knn. For each h in this set a local linear model is fitted, and the optimal h is chosen according to the statistic specified in method.h.
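The bandwidth search can be sketched for a scalar predictor as follows. This is not the dbstats implementation: it uses an ordinary local linear smoother with an Epanechnikov kernel, a grid of noh values over h.range, a k.knn filter on the grid, and the usual GCV criterion mean((y - Sy)^2) / (1 - tr(S)/n)^2. The helpers epan and smoother_matrix are hypothetical names.

```r
# Sketch (not the dbstats internals): GCV bandwidth choice for a local
# linear smoother on a scalar predictor, Epanechnikov kernel.
epan <- function(u) ifelse(abs(u) < 1, 0.75 * (1 - u^2), 0)

# Hat matrix S of the local linear smoother for bandwidth h
smoother_matrix <- function(x, h) {
  n <- length(x)
  S <- matrix(0, n, n)
  for (i in seq_len(n)) {                     # n local fits per bandwidth
    w <- epan((x - x[i]) / h)                 # weights from |x - x_i| (dist1)
    X <- cbind(1, x - x[i])                   # local linear design at x_i
    A <- solve(crossprod(X, w * X), t(w * X)) # (X'WX)^{-1} X'W
    S[i, ] <- A[1, ]                          # intercept = fit at x_i
  }
  S
}

set.seed(5)
n <- 80
x <- sort(runif(n, -2, 2))
y <- sin(2 * x) + rnorm(n, sd = 0.2)

# Candidate grid: noh values spanning h.range (quantiles of |x_i - x_j|)
d <- as.matrix(dist(x))
h.range <- unname(quantile(d, c(0.05, 0.5)))
noh <- 10
h.grid <- seq(h.range[1], h.range[2], length.out = noh)

# k.knn: every neighborhood needs at least k.knn points with positive
# weight, so h must exceed each point's k.knn-th smallest distance
k.knn <- 3
h.min <- max(apply(d, 1, function(r) sort(r)[k.knn]))
h.grid <- h.grid[h.grid > h.min]

# GCV(h) = mean squared residual / (1 - tr(S)/n)^2; keep the minimiser
gcv <- sapply(h.grid, function(h) {
  S <- smoother_matrix(x, h)
  mean((y - S %*% y)^2) / (1 - sum(diag(S)) / n)^2
})
h.opt <- h.grid[which.min(gcv)]
```

Dropping grid values below h.min is one simple way to honour k.knn; the package may handle small neighborhoods differently.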
kind.of.kernel designates which kernel function is used to determine individual weights from dist1 values. See density for more information.
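For reference, the six kernels listed under kind.of.kernel are written below in their usual textbook forms on [-1, 1] (Normal on the whole line), together with the weights they give to a few hypothetical dist1 values at bandwidth h = 1. The exact scaling used inside dbstats is an assumption here; note that density() rescales its kernels so that the bandwidth is the kernel's standard deviation.

```r
# Textbook forms of the six kernels (scaling is an assumption).
kernels <- list(
  epanechnikov = function(u) ifelse(abs(u) <= 1, 3/4 * (1 - u^2), 0),
  biweight     = function(u) ifelse(abs(u) <= 1, 15/16 * (1 - u^2)^2, 0),
  triweight    = function(u) ifelse(abs(u) <= 1, 35/32 * (1 - u^2)^3, 0),
  normal       = function(u) dnorm(u),
  triangular   = function(u) ifelse(abs(u) <= 1, 1 - abs(u), 0),
  uniform      = function(u) ifelse(abs(u) <= 1, 1/2, 0)
)

# Weights for a neighborhood centred at one observation:
# w_j = K(d1_j / h), with hypothetical dist1 values d1
d1 <- c(0, 0.2, 0.5, 0.9, 1.4)
h <- 1
w <- sapply(kernels, function(K) K(d1 / h))  # 5 x 6 matrix of weights
```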
Value
A list of class ldblm containing the following components:
residuals: the residuals (response minus fitted values).
fitted.values: the fitted mean values.
h.opt: the optimal bandwidth h used in the fitting process (…
S: the smoother hat projector.
weights: the specified weights.
y: the response variable used.
call: the matched call.
dist1: the distance matrix (object of class …
dist2: the distance matrix (object of class …
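As a usage sketch, assuming dbstats is installed, the components listed above can be inspected directly on a fitted object; the simulated data are illustrative only. The last lines check the documented identity residuals = response minus fitted values.

```r
# Sketch: inspecting the returned components (requires dbstats).
if (requireNamespace("dbstats", quietly = TRUE)) {
  library(dbstats)
  set.seed(6)
  n <- 60
  Z <- matrix(rnorm(n), nrow = n)
  y <- sin(2 * Z) + rnorm(n, sd = 0.2)
  fit <- ldblm(y ~ Z, noh = 3, k.knn = 3)

  print(fit$h.opt)    # bandwidth selected by method.h (default "GCV")
  print(dim(fit$S))   # smoother hat projector, expected to be n x n
  # documented identity: residuals = response minus fitted values
  res.err <- max(abs(as.numeric(fit$residuals) -
                     (as.numeric(fit$y) - as.numeric(fit$fitted.values))))
  print(res.err)
}
```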
Note
Model fitting is repeated n times (n = number of observations) for each bandwidth, that is, noh*n times in total. With a large noh or a sample with many observations, this function can take a very long time to run.
Author(s)
Boj, Eva <evaboj@ub.edu>, Caballe, Adria <adria.caballe@upc.edu>, Delicado, Pedro <pedro.delicado@upc.edu> and Fortiana, Josep <fortiana@ub.edu>
References
Boj E, Caballe A, Delicado P, Esteve A, Fortiana J (2016). Global and local distance-based generalized linear models. TEST 25, 170-195.
Boj E, Delicado P, Fortiana J (2010). Distance-based local linear regression for functional predictors. Computational Statistics and Data Analysis 54, 429-437.
Boj E, Grane A, Fortiana J, Claramunt MM (2007). Selection of predictors in distance-based regression. Communications in Statistics B - Simulation and Computation 36, 87-98.
Cuadras CM, Arenas C, Fortiana J (1996). Some computational aspects of a distance-based model for prediction. Communications in Statistics B - Simulation and Computation 25, 593-609.
Cuadras C, Arenas C (1990). A distance-based regression model for prediction with mixed data. Communications in Statistics A - Theory and Methods 19, 2261-2279.
Cuadras CM (1989). Distance analysis in discrimination and classification using both continuous and categorical variables. In: Y. Dodge (ed.), Statistical Data Analysis and Inference. Amsterdam, The Netherlands: North-Holland Publishing Co., pp. 459-473.
See Also
dblm for distance-based linear models.
ldbglm for local distance-based generalized linear models.
summary.ldblm for summary.
plot.ldblm for plots.
predict.ldblm for predictions.
Examples
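# example of use of the ldblm function
library(dbstats)
set.seed(123)  # for reproducibility
n <- 100
p <- 1
k <- 5
Z <- matrix(rnorm(n*p),nrow=n)
b1 <- matrix(runif(p)*k,nrow=p)
b2 <- matrix(runif(p)*k,nrow=p)
b3 <- matrix(runif(p)*k,nrow=p)
s <- 1
e <- rnorm(n)*s
y <- Z%*%b1 + Z^2%*%b2 + Z^3%*%b3 + e
# squared Euclidean distances, tagged with class "D2" for the D2 method
D2 <- as.matrix(dist(Z)^2)
class(D2) <- "D2"
# formula method: bandwidth chosen by GCV over noh=3 candidate values
ldblm1 <- ldblm(y~Z,kind.of.kernel=1,method.h="GCV",noh=3,k.knn=3)
# D2 method: bandwidth fixed at the default user.h
ldblm2 <- ldblm(D2.1=D2,D2.2=D2,y,kind.of.kernel=1,method.h="user.h",k.knn=3)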