NGeDS {GeDS} | R Documentation |
Geometrically Designed Spline regression estimation
Description
NGeDS constructs a geometrically designed variable-knot spline regression
model, referred to as a GeDS model, for a response having a Normal
distribution.
Usage
NGeDS(
formula,
data,
weights,
beta = 0.5,
phi = 0.99,
min.intknots = 0,
max.intknots = 500,
q = 2,
Xextr = NULL,
Yextr = NULL,
show.iters = FALSE,
stoptype = "RD",
higher_order = TRUE,
intknots = NULL,
only_predictions = FALSE
)
Arguments
formula |
a description of the structure of the model to be fitted,
including the dependent and independent variables. See
|
data |
an optional data frame, list or environment containing the
variables of the model. If not found in |
weights |
an optional vector of ‘prior weights’ to be put on the
observations in the fitting process in case the user requires weighted GeDS
fitting. It should be |
beta |
numeric parameter in the interval |
phi |
numeric parameter in the interval |
min.intknots |
optional parameter allowing the user to set a minimum number of internal knots required. By default equal to zero. |
max.intknots |
optional parameter allowing the user to set a maximum number of internal knots to be added by the GeDS estimation algorithm. By default equal to the number of knots for the saturated GeDS model. |
q |
numeric parameter that fine-tunes the stopping rule of stage A of GeDS; by default equal to 2. See details. |
Xextr |
numeric vector of 2 elements representing the left-most and right-most limits of the interval embedding the observations of the first independent variable. See details. |
Yextr |
numeric vector of 2 elements representing the left-most and right-most limits of the interval embedding the observations of the second independent variable (if the bivariate GeDS is run). See details. |
show.iters |
logical variable indicating whether or not to print information at each step. |
stoptype |
a character string indicating the type of GeDS stopping rule
to be used. It should be either one of |
higher_order |
a logical that defines whether to compute the higher
order fits (quadratic and cubic) after stage A is run. Default is
|
intknots |
vector of starting internal knots. Default is |
only_predictions |
logical, if |
Details
The NGeDS function implements the GeDS methodology developed by
Kaishev et al. (2016) and extended in the GGeDS function to the more general
GNM (GLM) context, which allows the response to have any distribution from
the Exponential Family. Under the GeDS approach the (non-)linear predictor is
viewed as a spline with variable knots, which are estimated along with the
regression coefficients and the order of the spline using a two-stage
algorithm. In stage A, a linear variable-knot spline is fitted to the data by
iteratively applying least squares regression (see the lm function). In
stage B, a Schoenberg variation diminishing spline approximation to the fit
from stage A is constructed, thus simultaneously producing spline fits of
order 2, 3 and 4, all of which are included in the output, a GeDS-Class
object.
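The least squares step of stage A can be pictured with a small base R sketch. It fits a linear spline with knots that are already fixed, using a truncated-linear basis and lm. The helper name fit_linear_spline and the choice of basis are assumptions made for illustration; GeDS selects the knots itself and this is not its internal code.

```r
# Illustrative only: one least-squares fit of a linear spline with FIXED knots.
# fit_linear_spline and the truncated-linear basis are assumptions of this
# sketch; they are not GeDS internals. At least one knot is required.
fit_linear_spline <- function(x, y, knots) {
  # One column pmax(x - k, 0) per knot: each allows a change of slope at k
  hinge <- vapply(knots, function(k) pmax(x - k, 0), numeric(length(x)))
  design <- cbind(x, hinge)
  colnames(design) <- c("x", paste0("knot_", seq_along(knots)))
  lm(y ~ ., data = data.frame(y = y, design))
}

set.seed(1)
x <- sort(runif(200, min = -2, max = 2))
y <- abs(x) + rnorm(200, sd = 0.05)          # true mean has a kink at 0
fit0 <- lm(y ~ x)                            # straight line, no internal knot
fit1 <- fit_linear_spline(x, y, knots = 0)   # one internal knot at 0
# Residual sum of squares before and after adding the knot
c(no_knot = deviance(fit0), one_knot = deviance(fit1))
```

Adding a knot can only lower the residual sum of squares here, because the straight-line model is nested in the one-knot model. Stage A repeats fits of this kind while choosing where the next knot goes.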
As noted in formula, the argument formula allows the user to specify models
with two components: a spline regression (non-parametric) component
involving part of the independent variables, identified through the function
f, and an optional parametric component involving the remaining independent
variables. For NGeDS, one or two independent variables are allowed for the
spline component and arbitrarily many independent variables for the
parametric component. Failure to specify the independent variable for the
spline regression component through the function f will return an error. See
formula.
Within the argument formula, as in other R functions, it is possible to
specify one or more offset variables, i.e. known terms with fixed regression
coefficients equal to 1. These terms should be identified via the function
offset.
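The offset convention can be seen in base R with lm, which is used here only as a stand-in. The variable names are made up for the sketch and NGeDS is not called; by analogy, an NGeDS formula with an offset would presumably look like Y ~ f(X) + offset(Z).

```r
# Base R illustration of the offset convention; NGeDS is not called here.
set.seed(1)
x <- runif(100)
z <- runif(100)                        # known term, coefficient fixed at 1
y <- 1 + 2 * x + z + rnorm(100, sd = 0.1)

fit_offset  <- lm(y ~ x + offset(z))   # z enters with coefficient exactly 1
fit_shifted <- lm(I(y - z) ~ x)        # same fit, written by hand

names(coef(fit_offset))                # no coefficient is estimated for z
all.equal(unname(coef(fit_offset)), unname(coef(fit_shifted)))
```

Since no coefficient is estimated for an offset term, it is equivalent to subtracting that term from the response before fitting.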
The parameter beta tunes the placement of a new knot in stage A of the
algorithm. Once a current second-order spline is fitted to the data, the
regression residuals are computed and grouped by their sign. A new knot is
placed at a location defined by the group for which a certain measure
attains its maximum. This measure is a weighted linear combination of the
mean of the absolute residuals within each group and the range of the
group's x-values, with weights beta and 1 - beta respectively. The higher
beta is, the more weight is put on the mean of the residuals and the less on
the range of their corresponding x-values. The default value of beta is 0.5.
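A rough base R sketch of such a measure follows, assuming that beta weights the mean absolute residual and 1 - beta weights the x-range of each group. The function name cluster_measure, the toy data and the scaling of both quantities by their maximum are assumptions of this sketch rather than a transcription of the GeDS code.

```r
# Illustrative sketch of the knot-placement measure; not the GeDS source code.
# cluster_measure and the max-normalisation are assumptions of this sketch.
cluster_measure <- function(x, res, beta = 0.5) {
  # Label consecutive runs of residuals that share the same sign
  runs  <- rle(sign(res))
  group <- rep(seq_along(runs$lengths), runs$lengths)
  mean_abs <- tapply(abs(res), group, mean)                 # residual size
  x_range  <- tapply(x, group, function(v) diff(range(v)))  # group width
  # Scale both quantities to [0, 1] before combining them (assumed)
  beta * mean_abs / max(mean_abs) + (1 - beta) * x_range / max(x_range)
}

x   <- 1:8
res <- c(0.1, 0.2, 0.1, 0.1, -3, -2.5, 0.3, 0.2)   # three sign groups
which.max(cluster_measure(x, res, beta = 0.9))  # group 2: largest residuals
which.max(cluster_measure(x, res, beta = 0.1))  # group 1: widest x-range
```

With beta close to 1 the group with the largest residuals wins, and with beta close to 0 the widest group wins, which matches the trade-off described above.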
The argument stoptype selects one of three alternative stopping rules for
the knot selection in stage A of GeDS: "RD", which stands for Ratio of
Deviances; "SR", which stands for Smoothed Ratio of deviances; and "LR",
which stands for Likelihood Ratio. The latter is based on the difference of
deviances rather than on their ratio, as in the case of "RD" and "SR".
Therefore "LR" can be viewed as a log likelihood ratio test performed at
each iteration of the knot placement. In each of these cases the
corresponding stopping criterion is compared with a threshold value phi (see
below).
The argument phi provides the threshold value required for the stopping rule
to exit the knot placement in stage A of GeDS. Under the "RD" and "SR"
stopping rules, the higher the value of phi, the more knots are added. Under
the "LR" stopping rule the opposite holds: the lower phi is, the more knots
are included in the spline regression. Further details for each of the three
alternative stopping rules can be found in Dimitrova et al. (2023).
The argument q is an input parameter that fine-tunes the stopping rule in
stage A. It identifies the number of consecutive iterations over which the
deviance should exhibit stable convergence before the knot placement in
stage A is terminated. More precisely, under any of the rules "RD", "SR" or
"LR", the deviance at the current iteration is compared to the deviance
computed q iterations before, i.e., before selecting the last q knots.
Setting a higher q will lead to more knots being added before exiting
stage A of GeDS.
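The roles of phi and q under a ratio-of-deviances rule can be illustrated with a short base R sketch. It assumes the rule stops at the first iteration k for which dev[k] / dev[k - q] is at least phi. The function name rd_stop_iteration, the exact form of the inequality and the deviance sequence are assumptions for illustration; the precise rules are given in Dimitrova et al. (2023).

```r
# Illustrative sketch of a ratio-of-deviances check; not the GeDS source code.
# rd_stop_iteration and the exact inequality are assumptions of this sketch.
rd_stop_iteration <- function(dev, phi = 0.99, q = 2) {
  # dev[k] is the deviance after iteration k of the knot placement
  for (k in seq_along(dev)) {
    # Compare with the deviance q iterations earlier, once that exists
    if (k > q && dev[k] / dev[k - q] >= phi) return(k)
  }
  NA_integer_   # the rule never triggered on this sequence
}

dev <- c(100, 40, 20, 19.9, 19.85, 19.84)   # made-up deviance sequence
rd_stop_iteration(dev, phi = 0.99,  q = 2)  # stops at iteration 5
rd_stop_iteration(dev, phi = 0.999, q = 2)  # NA: stricter phi, no stop yet
rd_stop_iteration(dev, phi = 0.99,  q = 3)  # stops at iteration 6
```

In this toy sequence a larger phi or a larger q both delay the exit, so more knots would be added before stage A ends.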
Value
GeDS-Class object, i.e. a list of items that summarizes the main details of
the fitted GeDS regression. See GeDS-Class for details. Some S3 methods are
available in order to make these objects tractable, such as coef, deviance,
knots, predict and print, as well as S4 methods for lines and plot.
References
Kaishev, V. K., Dimitrova, D. S., Haberman, S. and Verrall, R. J. (2016).
Geometrically designed, variable knot regression splines.
Computational Statistics, 31, 1079–1105.
DOI: 10.1007/s00180-015-0621-7
Dimitrova, D. S., Kaishev, V. K., Lattuada, A. and Verrall, R. J. (2023).
Geometrically designed variable knot splines in generalized (non-)linear
models.
Applied Mathematics and Computation, 436.
DOI: 10.1016/j.amc.2022.127493
See Also
GGeDS; GeDS-Class; S3 methods such as coef.GeDS, deviance.GeDS, knots.GeDS, print.GeDS and predict.GeDS; Integrate and Derive; PPolyRep.
Examples
###################################################
# Generate a data sample for the response variable
# Y and the single covariate X
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N, min = -2, max = 2))
# Specify a model for the mean of Y to include only a component
# non-linear in X, defined by the function f_1
means <- f_1(X)
# Add (Normal) noise to the mean of Y
Y <- rnorm(N, means, sd = 0.1)
# Fit a Normal GeDS regression using NGeDS
(Gmod <- NGeDS(Y ~ f(X), beta = 0.6, phi = 0.995, Xextr = c(-2,2)))
# Apply some of the available methods, e.g.
# coefficients, knots and deviance extractions for the
# quadratic GeDS fit
# Note that the first call to the function knots returns
# also the left and right limits of the interval containing
# the data
coef(Gmod, n = 3)
knots(Gmod, n = 3)
knots(Gmod, n = 3, options = "internal")
deviance(Gmod, n = 3)
# Add a covariate, Z, that enters linearly
Z <- runif(N)
Y2 <- Y + 2*Z + 1
# Re-fit the data using NGeDS
(Gmod2 <- NGeDS(Y2 ~ f(X) + Z, beta = 0.6, phi = 0.995, Xextr = c(-2,2)))
coef(Gmod2, n = 3)
coef(Gmod2, onlySpline = FALSE, n = 3)
## Not run:
##########################################
# Real data example
# See Kaishev et al. (2016), section 4.2
data('BaFe2As2')
(Gmod2 <- NGeDS(intensity ~ f(angle), data = BaFe2As2, beta = 0.6, phi = 0.99, q = 3))
plot(Gmod2)
## End(Not run)
#########################################
# bivariate example
# See Dimitrova et al. (2023), section 5
# Generate a data sample for the response variable
# Z and the covariates X and Y assuming Normal noise
set.seed(123)
doublesin <- function(x) {
  sin(2 * x[, 1]) * sin(2 * x[, 2])
}
X <- round(runif(400, min = 0, max = 3), 2)
Y <- round(runif(400, min = 0, max = 3), 2)
Z <- doublesin(cbind(X,Y))
Z <- Z+rnorm(400, 0, sd = 0.1)
# Fit a two dimensional GeDS model using NGeDS
(BivGeDS <- NGeDS(Z ~ f(X, Y), Xextr = c(0, 3), Yextr = c(0, 3)))
# Extract quadratic coefficients/knots/deviance
coef(BivGeDS, n = 3)
knots(BivGeDS, n = 3)
deviance(BivGeDS, n = 3)
# Surface plot of the generating function (doublesin)
plot(BivGeDS, f = doublesin)
# Surface plot of the fitted model
plot(BivGeDS)