aglm-package {aglm} R Documentation

## aglm: Accurate Generalized Linear Model

### Description

Provides functions to fit Accurate Generalized Linear Models (AGLM), visualize them, and make predictions for new data. AGLM is a regularized GLM that applies feature transformations based on a discretization of numerical features and specific coding methods for dummy variables. For more information on AGLM, see Suguru Fujita, Toyoto Tanaka, Kenji Kondo and Hirokazu Iwasawa (2020).

### Details

The functions provided by the aglm package are organized in almost the same way as those of the glmnet package, so users familiar with glmnet should find them easy to use. This structure also follows naturally from the implementation: the aglm package applies appropriate transformations to the given data and passes the result to the glmnet package as a backend.
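To make the idea of "discretize, then code as dummy variables" concrete, here is a minimal base R sketch of one ordinal-style dummy coding for a numeric column. It is only an illustration under stated assumptions: the helper name `ordinal_dummies` and the breakpoints are hypothetical, and the coding that aglm actually applies is the one described in the referenced paper, not necessarily this one.

```r
# Hypothetical helper (not part of aglm): ordinal-style dummy coding.
# Column k of the result is 1 where x is at or above the k-th breakpoint,
# and 0 otherwise, so neighbouring bins share dummy columns.
ordinal_dummies <- function(x, breaks) {
  d <- outer(x, breaks, FUN = ">=") * 1L  # logical matrix turned into 0/1 integers
  colnames(d) <- paste0("ge_", breaks)
  d
}

x <- c(0.5, 1.5, 2.5, 3.5)   # a numeric feature
breaks <- c(1, 2, 3)         # assumed bin boundaries
dummies <- ordinal_dummies(x, breaks)
print(dummies)
```

A matrix of this kind, built for every column, is the sort of transformed input that a regularized GLM backend can then be fitted on.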

### Fitting functions

The aglm package provides three different fitting functions, depending on how users want to handle hyper-parameters of AGLM models.

Because AGLM is based on a regularized GLM, the regularization term of the loss function can be expressed as follows: $R(\lbrace \beta_{jk} \rbrace; \lambda, \alpha) = \lambda \left\lbrace (1 - \alpha)\sum_{j=1}^{p} \sum_{k=1}^{m_j}|\beta_{jk}|^2 + \alpha \sum_{j=1}^{p} \sum_{k=1}^{m_j} |\beta_{jk}| \right\rbrace,$ where β_jk is the k-th coefficient of the auxiliary variables for the j-th column of the data, m_j is the number of auxiliary variables for that column, α is a weight that controls how the L1 and L2 regularization terms are mixed, and λ determines the strength of the regularization.
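The regularization term can be evaluated directly, which helps to see how α and λ interact. The following base R sketch simply transcribes the formula above; the function name `aglm_penalty` and the coefficient values are hypothetical and are not part of the package.

```r
# Hypothetical helper (not part of aglm): evaluates the penalty
# lambda * { (1 - alpha) * sum(beta^2) + alpha * sum(|beta|) }.
# beta is a list with one numeric vector of coefficients per original column j.
aglm_penalty <- function(beta, lambda, alpha) {
  b <- unlist(beta)
  lambda * ((1 - alpha) * sum(b^2) + alpha * sum(abs(b)))
}

# Two columns: the first has two auxiliary variables, the second has one.
beta <- list(c(0.5, -1), c(2))

penalty_l1  <- aglm_penalty(beta, lambda = 0.1, alpha = 1)    # L1 only: 0.35
penalty_l2  <- aglm_penalty(beta, lambda = 0.1, alpha = 0)    # L2 only: 0.525
penalty_mix <- aglm_penalty(beta, lambda = 0.1, alpha = 0.5)  # equal mix: 0.4375
```

With α=1 only the absolute-value term remains, and with α=0 only the squared term remains.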

Searching over the hyper-parameters α and λ often gives better results but is usually time-consuming. The aglm package therefore provides three fitting functions with different strategies for specifying hyper-parameters:

• aglm: A basic fitting function with a given α and one or more given values of λ.

• cv.aglm: A fitting function with given α and cross-validation for λ.

• cva.aglm: A fitting function with cross-validation for both α and λ.

Generally speaking, choosing an appropriate λ is often important for getting meaningful results, and using cv.aglm() with the default α=1 (LASSO) is usually enough. Since cva.aglm() is much more time-consuming than cv.aglm(), it is better to use it only when noticeably better results are needed.
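The three strategies might be used as in the following sketch on synthetic data. It assumes that the fitting functions take a data frame `x` and a numeric response `y`, and that the arguments `family`, `alpha` and `lambda` follow glmnet conventions; the data, the λ value and the α grid are made up for illustration, so check each function's help page for the exact interface.

```r
library(aglm)

# Synthetic data: a nonlinear effect of x1 and a linear effect of x2.
set.seed(1)
n <- 200
x <- data.frame(x1 = runif(n), x2 = rnorm(n))
y <- sin(2 * pi * x$x1) + 0.5 * x$x2 + rnorm(n, sd = 0.1)

# 1) aglm(): alpha and lambda are both given by the user.
fit <- aglm(x, y, family = "gaussian", alpha = 1, lambda = 0.01)

# 2) cv.aglm(): alpha is given, lambda is chosen by cross-validation.
cv_fit <- cv.aglm(x, y, family = "gaussian", alpha = 1)

# 3) cva.aglm(): both alpha and lambda are chosen by cross-validation.
#    A short alpha grid is assumed here to keep the run time down.
cva_fit <- cva.aglm(x, y, family = "gaussian", alpha = c(0, 0.5, 1))
```

Starting with cv.aglm() and moving to cva.aglm() only if the results are unsatisfactory keeps the tuning cost low.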

The results of the fitting functions are stored in S4 classes defined by the package.

### Using the fitted model

Users can use models obtained from the fitting functions in various ways by passing them to the following functions:

• predict: Make predictions for new data

• plot: Plot contribution of each variable and residuals

• print: Display textual information of the model

• coef: Get coefficients

• deviance: Get deviance

• residuals: Get residuals of various types

We emphasize that plot() is particularly useful for understanding the fitted model, because it gives a visual representation of how the variables in the original data are used by the model.
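A typical workflow with a fitted model might look like the sketch below. It assumes that predict() accepts new data through a `newx` argument, as in glmnet, and that the other generics can be called with the model alone; the synthetic data and the train/test split are hypothetical.

```r
library(aglm)

# Synthetic data, split into training and test rows.
set.seed(1)
n <- 200
x <- data.frame(x1 = runif(n), x2 = rnorm(n))
y <- sin(2 * pi * x$x1) + 0.5 * x$x2 + rnorm(n, sd = 0.1)
train <- 1:150

fit <- aglm(x[train, ], y[train], alpha = 1, lambda = 0.01)

# Predictions for the held-out rows.
y_pred <- predict(fit, newx = x[-train, ])

print(fit)                  # textual summary of the model
coefs <- coef(fit)          # fitted coefficients
dev   <- deviance(fit)      # deviance of the fit
res   <- residuals(fit)     # residuals (default type)
plot(fit)                   # contribution of each variable
```

Comparing `y_pred` with `y[-train]` gives a quick check of out-of-sample accuracy, while the plots show which ranges of each variable drive the predictions.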

### Other functions

The following functions are mainly for internal use, but they are exported as utility functions for convenience.

• Functions for creating feature vectors

• Functions for binning

### Author(s)

• Kenji Kondo

• Kazuhisa Takahashi and Hikari Banno (worked on L-Variable related features)

### References

Suguru Fujita, Toyoto Tanaka, Kenji Kondo and Hirokazu Iwasawa. (2020) AGLM: A Hybrid Modeling Method of GLM and Data Science Techniques,