R: Variable selection in joint modeling of mean and dispersion

stepjglm {stepjglm}

R Documentation

Variable selection in joint modeling of mean and dispersion

Description

A procedure for selecting variables in the joint modeling of mean and dispersion (JMMD), including mixture models, based on hypothesis testing and the quality of the model's fit.

Usage

stepjglm(model,alpha1,alpha2,datafram,family,lambda1=1,lambda2=1,startmod=1,
                 interations=FALSE)

Arguments

`model`	an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. If `datafram` contains mixture data, `model` must not include the principal mixture components; these are supplied through `startmod`.
`alpha1`	significance level for testing the addition of new terms to the mean model.
`alpha2`	significance level for testing the addition of new terms to the dispersion model.
`datafram`	a data frame containing the data.
`family`	a description of the family function for the mean model (families implemented by package `stats`): a character string naming a family function, a family function, or the result of a call to a family function. For `glm.fit` only the third option is supported. (See `family` for details of family functions.) For the dispersion model, the Gamma family with log link is assumed.
`lambda1`	some function of the sample size to calculate the `\tilde{R}_m^{2}` (See Pinto and Pereira (in press) and Zhang (2017) for more details). If equal to 1 (default), uses the standard correction for the `\tilde{R}_m^{2}`. If equal to "EAIC", uses the `EAIC` criterion.
`lambda2`	some function of the sample size to calculate the `\tilde{R}_d^{2}` (See Pinto and Pereira (in press) and Zhang (2017) for more details). If equal to 1 (default), uses the standard correction for the `\tilde{R}_d^{2}`. If equal to "AIC", uses the corrected `AIC_c` criterion.
`startmod`	if `datafram` contains mixture data, `startmod` gives the principal mixture components; otherwise, `startmod` must be equal to 1 (default).
`interations`	if `TRUE`, shows the output of the iterative procedure step by step. The default is `FALSE`.
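The forms of the `family` argument can be illustrated with base R alone. The sketch below mirrors how `stats::glm` resolves a family given as a character string, a family function, or a family object; `resolve_family` is a hypothetical helper written for this illustration, and whether `stepjglm` resolves `family` in exactly this way is an assumption.

```r
# Hypothetical helper (not part of stepjglm): resolve a family specification
# the same way stats::glm does.
resolve_family <- function(family) {
  if (is.character(family)) {
    family <- get(family, mode = "function", envir = parent.frame())
  }
  if (is.function(family)) family <- family()
  if (!inherits(family, "family")) stop("'family' not recognized")
  family
}

fam_chr  <- resolve_family("gaussian")   # character string
fam_fun  <- resolve_family(gaussian)     # family function
fam_call <- resolve_family(gaussian())   # result of a call

# All three describe the same mean-model family
fam_chr$family
fam_chr$link

# Family assumed for the dispersion model: Gamma with log link
disp_family <- Gamma(link = "log")
disp_family$link
```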

Details

The function implements a variable selection method for both the mean and dispersion models in the joint modeling of mean and dispersion (JMMD) introduced by Nelder and Lee (1991), using the Adjusted Quasi Extended Likelihood introduced by Lee and Nelder (1998). The method combines hypothesis testing with a measure of the quality of the model's fit: at each iteration of the selection process, a goodness-of-fit criterion acts as a filter that chooses the terms to be evaluated by a hypothesis test. For more details on the selection algorithms, see Pinto and Pereira (in press).
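The idea of fitting a mean model and a dispersion model jointly can be sketched in base R. This is a simplified illustration on simulated data, not the algorithm implemented by `stepjglm`: it alternates a Gaussian fit for the mean with a Gamma (log link) fit for the squared residuals, and it leaves out both the leverage adjustment of the adjusted quasi-likelihood and the variable selection step. The variable names (`x`, `z`, `d`, `w`) are hypothetical.

```r
# Simplified sketch of alternating mean/dispersion fits (assumed setup,
# simulated data; not the stepjglm implementation).
set.seed(1)
n <- 200
x <- runif(n)
z <- runif(n)
# Mean depends on x; log-variance depends on z
y <- 1 + 2 * x + rnorm(n, sd = sqrt(exp(-1 + 2 * z)))
dat <- data.frame(y = y, x = x, z = z)

w <- rep(1, n)  # start from constant dispersion
for (iter in seq_len(20)) {
  # Mean model: Gaussian GLM with prior weights 1 / (fitted dispersion)
  fit_mean <- glm(y ~ x, family = gaussian, data = dat, weights = w)
  # Deviance components of the Gaussian mean model (squared residuals)
  dat$d <- (dat$y - fitted(fit_mean))^2
  # Dispersion model: Gamma GLM with log link for the deviance components
  fit_disp <- glm(d ~ z, family = Gamma(link = "log"), data = dat)
  w_new <- 1 / fitted(fit_disp)
  converged <- max(abs(w_new - w)) < 1e-6
  w <- w_new
  if (converged) break
}

coef(fit_mean)  # coefficients of the mean model
coef(fit_disp)  # coefficients of the dispersion model (log scale)
```

In this sketch the two fits share information only through the weights `w`; a selection procedure such as the one described above would additionally decide, at each step, which terms enter each of the two formulas.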

Value

`model.mean`	a `glm` object with the fitted mean model.

`model.disp`	a `glm` object with the fitted dispersion model.

`EAIC`	a numeric object containing the Extended Akaike Information Criterion.
	For details, see Wang and Zhang (2009).

`EQD`	a numeric object containing the Extended Quasi Deviance.
	For details, see Nelder and Lee (1991).

`R2m`	a numeric object containing the standard correction for the `\tilde{R}_m^{2}`.
	For details, see Pinto and Pereira (in press).

`R2d`	a numeric object containing the standard correction for the `\tilde{R}_d^{2}`.
	For details, see Pinto and Pereira (in press).

Author(s)

Leandro Alves Pereira, Edmilson Rodrigues Pinto.

References

Hu, B. and Shao, J. (2008). Generalized linear model selection using R^2. Journal of Statistical Planning and Inference, 138, 3705-3712.

Lee, Y. and Nelder, J. A. (1998). Generalized linear models for analysis of quality improvement experiments. The Canadian Journal of Statistics, 26(1), 95-105.

Nelder, J. A. and Lee, Y. (1991). Generalized linear models for the analysis of Taguchi-type experiments. Applied Stochastic Models and Data Analysis, 7, 107-120.

Pinto, E. R. and Pereira, L. A. (in press). On variable selection in joint modeling of mean and dispersion. Brazilian Journal of Probability and Statistics. Preprint at https://arxiv.org/abs/2109.07978 (2021).

Wang, D. and Zhang, Z. (2009). Variable selection in joint generalized linear models. Chinese Journal of Applied Probability and Statistics, 25, 245-256.

Zhang, D. (2017). A coefficient of determination for generalized linear models. The American Statistician, 71, 310-316.

Examples


# Application to the bread-making problem:

library(stepjglm)
data(bread_mixture)

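# Candidate terms; the principal mixture components x1, x2, x3 are
# passed separately through startmod
Form = as.formula(y ~ x1:x2 + x1:x3 + x2:x3 + x1:x2:(x1-x2) + x1:x3:(x1-x3)
            + x1:z1 + x2:z1 + x3:z1 + x1:x2:z1
            + x1:x3:z1 + x1:x2:(x1-x2):z1
            + x1:x3:(x1-x3):z1
            + x1:z2 + x2:z2 + x3:z2 + x1:x2:z2
            + x1:x3:z2 + x1:x2:(x1-x2):z2
            + x1:x3:(x1-x3):z2)

# alpha1 = alpha2 = 0.1, lambda1 = sqrt(90), lambda2 = "AIC",
# startmod = "-1+x1+x2+x3"
object = stepjglm(Form, 0.1, 0.1, bread_mixture, gaussian, sqrt(90), "AIC",
                  "-1+x1+x2+x3")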

summary(object$modelo.mean)
summary(object$modelo.disp)

object$EAIC  # Print the EAIC for the final model



# Application to the injection molding data:

form = as.formula(Y ~ A*M+A*N+A*O+B*M+B*N+B*O+C*M+C*N+C*O+D*M+D*N+D*O+
                      E*M+E*N+E*O+F*M+F*N+F*O+G*M+G*N+G*O)

data(injection_molding)

obj.dt = stepjglm(form, 0.05, 0.05, injection_molding, gaussian,
                  sqrt(nrow(injection_molding)), "AIC")

summary(obj.dt$modelo.mean)
summary(obj.dt$modelo.disp)

obj.dt$EAIC  # Print the EAIC for the final model
obj.dt$EQD   # Print the EQD for the final model
obj.dt$R2m   # Print the R2m for the final model
obj.dt$R2d   # Print the R2d for the final model
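The positional call used for the injection molding data can also be written with named arguments, which makes the role of each value explicit. The sketch below takes the argument names from the Usage section and assumes that the `stepjglm` package is installed and that the returned object is a named list; it is skipped otherwise. `names()` is used to list the returned components rather than assuming their spelling.

```r
# Same call as above with named arguments (runs only if stepjglm is installed)
if (requireNamespace("stepjglm", quietly = TRUE)) {
  library(stepjglm)
  data(injection_molding)

  form <- as.formula(Y ~ A*M + A*N + A*O + B*M + B*N + B*O + C*M + C*N + C*O +
                         D*M + D*N + D*O + E*M + E*N + E*O + F*M + F*N + F*O +
                         G*M + G*N + G*O)

  fit <- stepjglm(model = form,
                  alpha1 = 0.05,                  # test level, mean model
                  alpha2 = 0.05,                  # test level, dispersion model
                  datafram = injection_molding,
                  family = gaussian,              # family of the mean model
                  lambda1 = sqrt(nrow(injection_molding)),
                  lambda2 = "AIC",
                  startmod = 1,                   # not mixture data
                  interations = FALSE)

  print(names(fit))  # list the components of the returned object
}
```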

[Package stepjglm version 0.0.1 Index]