aglm-package {aglm}    R Documentation
aglm: Accurate Generalized Linear Model
Description
Provides functions to fit Accurate Generalized Linear Model (AGLM) models, visualize them, and make predictions for new data. AGLM is a regularized GLM that applies feature transformations based on discretization of numerical features and specific coding methods for dummy variables. For more information on AGLM, see Suguru Fujita, Toyoto Tanaka, Kenji Kondo and Hirokazu Iwasawa (2020).
Details
The functions provided by the aglm package are structured almost identically to those of the well-known glmnet package, so users familiar with glmnet should be able to use it easily.
This structure follows naturally from the implementation: the aglm package applies appropriate transformations to the given data and passes the result to the glmnet package as a backend.
Fitting functions
The aglm package provides three different fitting functions, depending on how users want to handle the hyper-parameters of AGLM models.
Because AGLM is based on regularized GLM, the regularization term of the loss function can be expressed as follows:
\[
R(\lbrace \beta_{jk} \rbrace; \lambda, \alpha)
= \lambda \left\lbrace
(1 - \alpha)\sum_{j=1}^{p} \sum_{k=1}^{m_j}|\beta_{jk}|^2 + \alpha \sum_{j=1}^{p} \sum_{k=1}^{m_j} |\beta_{jk}|
\right\rbrace,
\]
where \(\beta_{jk}\) is the k-th coefficient of the auxiliary variables for the j-th column of the data,
\(\alpha\) is a weight which controls how the L1 and L2 regularization terms are mixed,
and \(\lambda\) determines the strength of the regularization.
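As an illustration of how \(\alpha\) mixes the two terms, the following base R sketch evaluates the regularization term exactly as written above. The function name `penalty_sketch` and the coefficient values are hypothetical and are not part of the aglm package; the package itself delegates the actual penalty computation to its glmnet backend.

```r
# Evaluate the regularization term R({beta_jk}; lambda, alpha) as written above.
# `beta` is a list with one numeric vector per column j (of length m_j).
# NOTE: `penalty_sketch` is a hypothetical helper, not an aglm function.
penalty_sketch <- function(beta, lambda, alpha) {
  b <- unlist(beta)   # flatten all beta_jk into a single vector
  l2 <- sum(b^2)      # sum over j and k of |beta_jk|^2
  l1 <- sum(abs(b))   # sum over j and k of |beta_jk|
  lambda * ((1 - alpha) * l2 + alpha * l1)
}

# Made-up coefficients: p = 2 columns with m_1 = 2 and m_2 = 3.
beta <- list(c(0.5, -1.0), c(0.0, 2.0, -0.5))

lasso_penalty <- penalty_sketch(beta, lambda = 0.1, alpha = 1)    # L1 term only
ridge_penalty <- penalty_sketch(beta, lambda = 0.1, alpha = 0)    # L2 term only
mixed_penalty <- penalty_sketch(beta, lambda = 0.1, alpha = 0.5)  # equal mix
```

With \(\alpha = 1\) only the L1 term remains, and with \(\alpha = 0\) only the L2 term remains.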
Searching over the hyper-parameters \(\alpha\) and \(\lambda\) often gives better results, but is usually time-consuming.
For this reason, the aglm package provides three fitting functions with different strategies for specifying hyper-parameters:
- aglm: A basic fitting function with given \(\alpha\) and \(\lambda\)(s).
- cv.aglm: A fitting function with given \(\alpha\) and cross-validation for \(\lambda\).
- cva.aglm: A fitting function with cross-validation for both \(\alpha\) and \(\lambda\).
Generally speaking, setting an appropriate \(\lambda\) is often important to get meaningful results,
and using cv.aglm() with the default \(\alpha = 1\) (LASSO) is usually enough.
Since cva.aglm() is much more time-consuming than cv.aglm(), it is better to use it only when particularly good results are needed.
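The recommended workflow can be sketched as follows. The example assumes the aglm package is installed, uses synthetic data invented purely for illustration, and assumes that the cross-validated \(\lambda\) is available in the `lambda.min` slot of the returned object.

```r
# Minimal sketch: cross-validate lambda with alpha left at its default.
# Skipped entirely when the aglm package is not installed.
if (requireNamespace("aglm", quietly = TRUE)) {
  library(aglm)

  # Synthetic data (illustrative only): a nonlinear effect of x1 plus noise.
  set.seed(1)
  n <- 200
  x <- data.frame(x1 = runif(n), x2 = runif(n))
  y <- sin(2 * pi * x$x1) + 0.5 * x$x2 + rnorm(n, sd = 0.1)

  # Fit with cross-validation for lambda only.
  model <- cv.aglm(x, y)

  # Assumed slot: the lambda with the smallest cross-validated error.
  lambda_min <- model@lambda.min
  print(lambda_min)
}
```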
The following S4 classes are defined to store results of the fitting functions.
- AccurateGLM-class: A class for results of aglm() and cv.aglm()
- CVA_AccurateGLM-class: A class for results of cva.aglm()
Using the fitted model
Users can use models obtained from the fitting functions in various ways, by passing them to the following functions:
- predict: Make predictions for new data
- plot: Plot contribution of each variable and residuals
- print: Display textual information of the model
- coef: Get coefficients
- deviance: Get deviance
- residuals: Get residuals of various types
We emphasize that plot() is particularly useful for understanding the fitted model,
because it presents a visual representation of how variables in the original data are used by the model.
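A fitted model might be used along the following lines. This is a sketch under assumptions rather than a definitive recipe: it assumes the aglm package is installed, uses synthetic data invented for illustration, and assumes that predict() accepts `newx` and `s` and that plot() accepts `s` and `verbose`, in the style of the glmnet interface.

```r
# Minimal sketch: fit on a training split, then predict, print, and plot.
# Skipped entirely when the aglm package is not installed.
if (requireNamespace("aglm", quietly = TRUE)) {
  library(aglm)

  # Synthetic data (illustrative only).
  set.seed(1)
  n <- 200
  x <- data.frame(x1 = runif(n), x2 = runif(n))
  y <- sin(2 * pi * x$x1) + 0.5 * x$x2 + rnorm(n, sd = 0.1)
  train <- 1:150

  model <- cv.aglm(x[train, ], y[train])
  s <- model@lambda.min  # assumed slot holding the cross-validated lambda

  # Predict for held-out rows at the selected lambda and compute the RMSE.
  y_pred <- predict(model, newx = x[-train, ], s = s)
  rmse <- sqrt(mean((y[-train] - y_pred)^2))

  # Textual summary, then per-variable contribution plots.
  print(model)
  plot(model, s = s, verbose = FALSE)
}
```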
Other functions
The following functions are mainly for internal use, but are exported as utility functions for convenience.
Functions for creating feature vectors
Functions for binning
Author(s)
Kenji Kondo,
Kazuhisa Takahashi and Hikari Banno (worked on L-Variable related features)
References
Suguru Fujita, Toyoto Tanaka, Kenji Kondo and Hirokazu Iwasawa. (2020)
AGLM: A Hybrid Modeling Method of GLM and Data Science Techniques,
https://www.institutdesactuaires.com/global/gene/link.php?doc_id=16273&fg=1
Actuarial Colloquium Paris 2020