alvm.fit {skedastic} | R Documentation
Auxiliary Linear Variance Model
Description
Fits an Auxiliary Linear Variance Model (ALVM) to estimate the error variances of a heteroskedastic linear regression model.
Usage
alvm.fit(
mainlm,
M = NULL,
model = c("cluster", "spline", "linear", "polynomial", "basic", "homoskedastic"),
varselect = c("none", "hettest", "cv.linear", "cv.cluster", "qgcv.linear",
"qgcv.cluster"),
lambda = c("foldcv", "qgcv"),
nclust = c("elbow.swd", "elbow.mwd", "elbow.both", "foldcv"),
clustering = NULL,
polypen = c("L2", "L1"),
d = 2L,
solver = c("auto", "quadprog", "quadprogXT", "roi", "osqp"),
tsk = NULL,
tsm = NULL,
constol = 1e-10,
cvoption = c("testsetols", "partitionres"),
nfolds = 5L,
reduce2homosked = TRUE,
...
)
Arguments
mainlm
Either an object of class "lm" (e.g., generated by lm), or a list of two objects: a response vector and a design matrix.

M
An n \times n annihilator matrix, M=I-X(X'X)^{-1}X'. If NULL (the default), this matrix is computed from the mainlm object.

model
A character corresponding to the type of ALVM to be fitted: "cluster" (clustering ALVM, the default), "spline" (thin-plate spline ALVM), "linear" (linear ALVM), "polynomial" (penalised polynomial ALVM), "basic" (basic ALVM), or "homoskedastic" (homoskedastic error variance estimator).

varselect
Either a character indicating how variable selection should be conducted, or an integer vector giving indices of columns of the predictor matrix (X) to select, with 1 denoting the intercept column. The character options are "none" (no variable selection); "hettest" (selection using a heteroskedasticity test with each feature in turn serving as the deflator); "cv.linear" and "cv.cluster" (best subset selection on the linear or clustering ALVM, scored by squared-error loss under K-fold cross-validation); and "qgcv.linear" and "qgcv.cluster" (best subset selection on the linear or clustering ALVM, scored by squared-error loss under quasi-generalised cross-validation).

lambda
Either a double of length 1 giving the value of the penalty hyperparameter \lambda, or a character specifying the method for tuning \lambda: "foldcv" (K-fold cross-validation, the default) or "qgcv" (quasi-generalised cross-validation). Ignored unless model is "polynomial" or "spline".

nclust
Either an integer of length 1 giving the number of clusters n_c, or a character specifying the method for tuning n_c: "elbow.swd" (elbow method with a sum of within-cluster distances criterion, the default), "elbow.mwd" (elbow method with a maximum within-cluster distance criterion), "elbow.both" (a combination of the two), or "foldcv" (K-fold cross-validation). Ignored unless model is "cluster".

clustering
A list object of class "doclust" representing a clustering of the observations. If NULL (the default), such an object is generated within the function. Ignored unless model is "cluster".

polypen
A character, either "L2" (the default) or "L1", indicating whether an L_2-norm penalty (ridge regression) or L_1-norm penalty (LASSO) should be used in the penalised polynomial ALVM. Ignored unless model is "polynomial".

d
An integer specifying the degree of polynomial to use in the penalised polynomial ALVM; defaults to 2L. Ignored unless model is "polynomial".

solver
A character, indicating which Quadratic Programming solver function to use to estimate \gamma: one of "quadprog", "quadprogXT", "roi", or "osqp", corresponding to solver functions from the eponymous R packages, or "auto" (the default), in which case a solver is chosen automatically for the given model.

tsk
An integer corresponding to the basis dimension k of the thin-plate spline, passed to the s function of the mgcv package. Ignored unless model is "spline".

tsm
An integer corresponding to the order m of the derivative penalty of the thin-plate spline, passed to the s function of the mgcv package. Ignored unless model is "spline".

constol
A double corresponding to the boundary value for the constraint on error variances. Of course, the error variances must be non-negative, but setting the constraint boundary to 0 can result in zero estimates that then result in infinite weights for Feasible Weighted Least Squares. The boundary value should thus be positive, but small enough not to bias estimation of very small variances. Defaults to 1e-10.

cvoption
A character, either "testsetols" (the default) or "partitionres", indicating how the squared residuals used as responses in K-fold cross-validation are obtained: by fitting a separate OLS regression to each test fold ("testsetols"), or by partitioning the squared residuals from the full-sample OLS regression into folds ("partitionres").

nfolds
An integer specifying the number of folds K to use in cross-validation, whether for hyperparameter tuning or for variable selection; defaults to 5L.

reduce2homosked
A logical indicating whether the homoskedastic error variance estimator e'e/(n-p) should be used if the variable selection procedure fails to find evidence of heteroskedasticity; defaults to TRUE.

...
Other arguments that can be passed to (non-exported) helper functions.
Details
The ALVM model equation is

e \circ e = (M \circ M) L \gamma + u,

where e is the Ordinary Least Squares residual vector, M is the annihilator matrix M=I-X(X'X)^{-1}X', L is a linear predictor matrix, u is a random error vector, \gamma is a p-vector of unknown parameters, and \circ denotes the Hadamard (elementwise) product. The construction of L depends on the method used to model or estimate the assumed heteroskedastic function g(\cdot), a continuous, differentiable function that is linear in \gamma and by which the error variances \omega_i of the main linear model are related to the predictors X_{i\cdot}.
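As a minimal sketch (for intuition only; alvm.fit computes these quantities internally), the building blocks of the model equation can be formed directly in R for the regression used in the Examples section:

X <- model.matrix(mpg ~ wt + qsec + am, data = mtcars)  # design matrix
n <- nrow(X)
M <- diag(n) - X %*% solve(crossprod(X)) %*% t(X)  # annihilator matrix M
e <- as.vector(M %*% mtcars$mpg)                   # OLS residual vector e
esq <- e * e                                       # ALVM response, e \circ e
Msq <- M * M                                       # elementwise square, M \circ M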
This method has been developed as part of the author's doctoral research
project.
Depending on the model used, the estimation method is either Inequality-Constrained Least Squares or Inequality-Constrained Ridge Regression; since both are special cases of Quadratic Programming, all of the models are fitted using Quadratic Programming, as in the sketch below.
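For intuition, an Inequality-Constrained Least Squares fit can be sketched as a Quadratic Program with the quadprog package, reusing the quantities from the sketch above and taking L = X (a hypothetical linear specification chosen purely for illustration; the package's internal formulation may differ):

library(quadprog)
L <- X                                  # illustrative choice of linear predictor matrix
A <- Msq %*% L
Dmat <- crossprod(A)                    # quadratic term A'A (positive definite here)
dvec <- as.vector(crossprod(A, esq))    # linear term A'(e o e)
Amat <- t(L)                            # constraints: L %*% gamma >= bvec
bvec <- rep(1e-10, n)                   # small positive bound on each variance
qp <- solve.QP(Dmat, dvec, Amat, bvec)
gamma_hat <- qp$solution                # inequality-constrained estimate of gamma
omega_hat <- as.vector(L %*% gamma_hat) # implied error variance estimates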
Several techniques are available for feature selection within the model. The LASSO-type model handles feature selection via a shrinkage penalty. For this reason, if the user calls the polynomial model with an L_1-norm penalty, it is not necessary to specify a variable selection method, since this is handled automatically. Another feature selection technique is to use a heteroskedasticity test that tests for heteroskedasticity linked to a particular predictor variable (the 'deflator'). This test can be conducted with each feature in turn serving as the deflator. Those features for which the null hypothesis of homoskedasticity is rejected at a specified significance level \alpha are selected. A third feature selection technique is best subset selection, where the model is fitted with all possible subsets of features. The models are scored in terms of some metric, and the best-performing subset of features is selected. The metric could be squared-error loss computed under K-fold cross-validation or under quasi-generalised cross-validation. (The quasi- prefix refers to the fact that generalised cross-validation is, properly speaking, only applicable to a linear fitting method, as defined by Hastie et al. (2009); ALVMs are not linear fitting methods due to the inequality constraint.) Since best subset selection requires fitting 2^{p-1} models (where p-1 is the number of candidate features), it is infeasible for large p. A greedy search can therefore be used as an alternative: one begins with a null model and adds the feature that leads to the best improvement in the metric, stopping when no new feature leads to an improvement. See the illustrative call below.
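For instance (an illustrative call using the model from the Examples section, not part of the package examples), best subset selection scored by quasi-generalised cross-validation can be requested as:

myalvm_vs <- alvm.fit(mtcars_lm, model = "linear", varselect = "qgcv.linear")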
The polynomial and thin-plate spline ALVMs have a penalty hyperparameter \lambda that must either be specified or tuned. K-fold cross-validation or quasi-generalised cross-validation can be used for tuning. The clustering ALVM has a hyperparameter n_c, the number of clusters into which to group the observations (where error variances are assumed to be equal within each cluster). n_c can be specified or tuned. The available tuning methods are an elbow method (using a sum of within-cluster distances criterion, a maximum within-cluster distance criterion, or a combination of the two) and K-fold cross-validation.
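By way of illustration (again assuming the mtcars_lm model from the Examples section), the penalty hyperparameter can be tuned by quasi-generalised cross-validation, or the number of clusters specified directly:

myalvm_poly <- alvm.fit(mtcars_lm, model = "polynomial", lambda = "qgcv")
myalvm_clus <- alvm.fit(mtcars_lm, model = "cluster", nclust = 4L)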
Value
An object of class "alvm.fit", containing the following:

- coef.est, a vector of parameter estimates, \hat{\gamma}
- var.est, a vector of estimates \hat{\omega} of the error variances for all observations
- method, a character corresponding to the model argument
- ols, the lm object corresponding to the original linear regression model
- fitinfo, a list containing four named objects: Msq (the elementwise square of the annihilator matrix M), L (the linear predictor matrix L), clustering (a list object with results of the clustering procedure), and gam.object, an object of class "gam" (see gamObject). The last two are set to NA unless the clustering ALVM or thin-plate spline ALVM is used, respectively.
- hyperpar, a named list of hyperparameter values, lambda, nclust, tsk, and d, and tuning methods, lambdamethod and nclustmethod. Values corresponding to unused hyperparameters are set to NA.
- selectinfo, a list containing two named objects, varselect (the value of the eponymous argument) and selectedcols (a numeric vector with column indices of X that were selected, with 1 denoting the intercept column)
- pentype, a character corresponding to the polypen argument
- solver, a character corresponding to the solver argument (or specifying the QP solver actually used, if solver was set to "auto")
- constol, a double corresponding to the constol argument
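For example, after the call in the Examples section below, the main outputs can be inspected as follows:

head(myalvm$var.est)    # estimated error variances \hat{\omega}
myalvm$coef.est         # parameter estimates \hat{\gamma}
myalvm$hyperpar$nclust  # number of clusters used by the clustering ALVM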
References
Hastie T, Tibshirani R, Friedman JH (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edition. Springer, New York.
Examples
# Fit a linear regression model to the mtcars data
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars)
# Fit a clustering ALVM to estimate the error variances of this model
myalvm <- alvm.fit(mtcars_lm, model = "cluster")