stepjglm {stepjglm}    R Documentation

Variable selection in joint modeling of mean and dispersion

Description

A procedure for selecting variables in joint modeling of mean and dispersion (JMMD), including mixture models, based on hypothesis testing and the quality of the model's fit.

Usage

stepjglm(model,alpha1,alpha2,datafram,family,lambda1=1,lambda2=1,startmod=1,
                 interations=FALSE)

Arguments

model

an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. If datafram contains mixture data, the formula must not contain the principal mixture components; these are supplied through startmod.

alpha1

significance level for testing the addition of new terms to the mean model.

alpha2

significance level for testing the addition of new terms to the dispersion model.

datafram

a data frame containing the data.

family

the family function for the mean model (families implemented by package stats), given as a character string naming a family function or as the result of a call to a family function. For glm.fit only the third option is supported. (See family for details of family functions.) For the dispersion model, the Gamma family with log link is assumed.

lambda1

a function of the sample size used to calculate \tilde{R}_m^{2} (see Pinto and Pereira (in press) and Zhang (2017) for more details). If equal to 1 (default), the standard correction for \tilde{R}_m^{2} is used. If equal to "EAIC", the EAIC criterion is used.

lambda2

a function of the sample size used to calculate \tilde{R}_d^{2} (see Pinto and Pereira (in press) and Zhang (2017) for more details). If equal to 1 (default), the standard correction for \tilde{R}_d^{2} is used. If equal to "AIC", the corrected AIC_c criterion is used.

startmod

if datafram contains mixture data, startmod gives the principal mixture components (for example "-1+x1+x2+x3"); otherwise, startmod must be equal to 1 (default).

interations

if TRUE, shows the output of each iteration of the selection procedure, step by step. The default is FALSE.
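The role of startmod for mixture data can be illustrated with a small sketch. It assumes a hypothetical data frame mix whose columns x1, x2 and x3 are proportions summing to 1 in every row; the data are simulated and are not taken from the package. Under that assumption an intercept is redundant, which is one reason a start model such as "-1+x1+x2+x3" omits it.

```r
# Hypothetical mixture data: three proportions per row that sum to 1.
set.seed(42)
raw <- matrix(rexp(30), ncol = 3)
mix <- as.data.frame(raw / rowSums(raw))
names(mix) <- c("x1", "x2", "x3")

# With an intercept the design matrix has 4 columns, but the intercept
# equals x1 + x2 + x3, so one column is linearly redundant.
X_int <- model.matrix(~ x1 + x2 + x3, data = mix)

# Dropping the intercept (as in "-1+x1+x2+x3") leaves 3 independent columns.
X_mix <- model.matrix(~ -1 + x1 + x2 + x3, data = mix)

rank_int <- qr(X_int)$rank
rank_mix <- qr(X_mix)$rank

# Compare the number of columns with the numerical rank of each design.
c(columns_int = ncol(X_int), rank_int = rank_int,
  columns_mix = ncol(X_mix), rank_mix = rank_mix)
```

The comparison uses only base R (model.matrix and qr) and does not call stepjglm.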

Details

The function implements a method for selecting variables for both the mean and dispersion models in the JMMD introduced by Nelder and Lee (1991), using the Adjusted Quasi Extended Likelihood introduced by Lee and Nelder (1998). The procedure is based on hypothesis testing and the quality of the model's fit: in each iteration of the selection process, a goodness-of-fit criterion acts as a filter that chooses the terms to be evaluated by a hypothesis test. For more details on the selection algorithms, see Pinto and Pereira (in press).
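The idea of filtering by a fit criterion and then testing can be sketched for a single forward step of a mean model. This is only an illustration under stated assumptions and not the package's algorithm: forward_step() is a hypothetical helper, AIC stands in for the \tilde{R}^{2}-based criterion, an F test stands in for the package's hypothesis test, and the data are simulated.

```r
# One forward step: filter candidate terms by a fit criterion, then test the
# best candidate at level alpha. forward_step() is a hypothetical helper.
forward_step <- function(current, candidates, data, alpha) {
  fit_with <- function(term) {
    new_formula <- update(formula(current), as.formula(paste(". ~ . +", term)))
    glm(new_formula, family = family(current), data = data)
  }
  fits <- lapply(candidates, fit_with)
  best <- which.min(sapply(fits, AIC))      # filter: smallest AIC (stand-in criterion)
  tab <- anova(current, fits[[best]], test = "F")
  p_value <- tab[2, ncol(tab)]              # last column of the table is the p-value
  list(term = candidates[best], p_value = p_value, added = p_value < alpha,
       model = if (p_value < alpha) fits[[best]] else current)
}

# Simulated data: y depends on x1 only (assumed setup for illustration).
set.seed(123)
sim <- data.frame(x1 = runif(100), x2 = runif(100))
sim$y <- 1 + 3 * sim$x1 + rnorm(100, sd = 0.5)

null_fit <- glm(y ~ 1, family = gaussian, data = sim)
step1 <- forward_step(null_fit, c("x1", "x2"), sim, alpha = 0.05)
step1$term    # candidate chosen by the filter
step1$added   # whether the test admitted it at level alpha
```

In stepjglm the analogous decisions are governed by alpha1 and alpha2 for the mean and dispersion models, respectively.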

Value

model.mean   a glm object with the adjustments for the mean model.
model.disp   a glm object with the adjustments for the dispersion model.
EAIC         a numeric object containing the Extended Akaike Information Criterion. For details, see Wang and Zhang (2009).
EQD          a numeric object containing the Extended Quasi Deviance. For details, see Nelder and Lee (1991).
R2m          a numeric object containing the standard correction for the \tilde{R}_m^{2}. For details, see Pinto and Pereira (in press).
R2d          a numeric object containing the standard correction for the \tilde{R}_d^{2}. For details, see Pinto and Pereira (in press).
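How a mean fit and a dispersion fit relate to each other can be sketched with two alternating glm calls. This is a simplified illustration of joint modeling in general and is not the code used by stepjglm: the data are simulated, the names sim, fit_mean and fit_disp are hypothetical, and the leverage adjustment of the deviance components used in the adjusted quasi extended likelihood is omitted for brevity.

```r
# Simulated data (assumed setup): the mean depends on x, the log-dispersion on z.
set.seed(1)
n <- 200
x <- runif(n)
z <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = sqrt(exp(-1 + 2 * z)))
sim <- data.frame(y = y, x = x, z = z)

phi <- rep(1, n)                       # start from a constant dispersion
for (iter in 1:20) {
  # Mean model: prior weights are the inverse of the current dispersion.
  fit_mean <- glm(y ~ x, family = gaussian, data = sim, weights = 1 / phi)

  # Deviance components; for the gaussian family these are squared residuals.
  d <- (sim$y - fitted(fit_mean))^2

  # Dispersion model: Gamma GLM with log link fitted to the deviance components.
  fit_disp <- glm(d ~ z, family = Gamma(link = "log"), data = sim)

  phi_new <- fitted(fit_disp)
  converged <- max(abs(phi_new - phi)) < 1e-6
  phi <- phi_new
  if (converged) break
}

coef(fit_mean)   # mean-model coefficients
coef(fit_disp)   # dispersion-model coefficients (log scale)
```

Both fitted objects are ordinary glm objects, so summary and the other glm methods apply to them.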

Author(s)

Leandro Alves Pereira, Edmilson Rodrigues Pinto.

References

Hu, B. and Shao, J. (2008). Generalized linear model selection using R^2. Journal of Statistical Planning and Inference, v. 138, pp. 3705-3712.

Lee, Y. and Nelder, J. A. (1998). Generalized linear models for analysis of quality improvement experiments. The Canadian Journal of Statistics, v. 26, n. 1, pp. 95-105.

Nelder, J. A. and Lee, Y. (1991). Generalized linear models for the analysis of Taguchi-type experiments. Applied Stochastic Models and Data Analysis, v. 7, pp. 107-120.

Pinto, E. R. and Pereira, L. A. (in press). On variable selection in joint modeling of mean and dispersion. Brazilian Journal of Probability and Statistics. Preprint at https://arxiv.org/abs/2109.07978 (2021).

Wang, D. and Zhang, Z. (2009). Variable selection in joint generalized linear models. Chinese Journal of Applied Probability and Statistics, v. 25, pp. 245-256.

Zhang, D. (2017). A coefficient of determination for generalized linear models. The American Statistician, v. 71, pp. 310-316.

See Also

glm

summary.glm

Examples


# Application to the bread-making problem:

data(bread_mixture)

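# Candidate terms; the principal mixture components x1, x2, x3 are not
# included here because they are passed through startmod.
Form <- as.formula(y ~ x1:x2 + x1:x3 + x2:x3 + x1:x2:(x1-x2) + x1:x3:(x1-x3)
            + x1:z1 + x2:z1 + x3:z1 + x1:x2:z1
            + x1:x3:z1 + x1:x2:(x1-x2):z1
            + x1:x3:(x1-x3):z1
            + x1:z2 + x2:z2 + x3:z2 + x1:x2:z2
            + x1:x3:z2 + x1:x2:(x1-x2):z2
            + x1:x3:(x1-x3):z2)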

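# alpha1 = alpha2 = 0.1; the start model holds the principal mixture components
object <- stepjglm(Form, 0.1, 0.1, bread_mixture, gaussian,
                   lambda1 = sqrt(90), lambda2 = "AIC",
                   startmod = "-1+x1+x2+x3")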

summary(object$modelo.mean)
summary(object$modelo.disp)

object$EAIC  # Print the EAIC for the final model



# Application to the injection molding data:

form = as.formula(Y ~ A*M+A*N+A*O+B*M+B*N+B*O+C*M+C*N+C*O+D*M+D*N+D*O+
                      E*M+E*N+E*O+F*M+F*N+F*O+G*M+G*N+G*O)

data(injection_molding)

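# alpha1 = alpha2 = 0.05; startmod keeps its default because the data are not mixture data
obj.dt <- stepjglm(form, 0.05, 0.05, injection_molding, gaussian,
                   lambda1 = sqrt(nrow(injection_molding)), lambda2 = "AIC")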

summary(obj.dt$modelo.mean)
summary(obj.dt$modelo.disp)

obj.dt$EAIC  # Print the EAIC for the final model
obj.dt$EQD   # Print the EQD for the final model
obj.dt$R2m   # Print the R2m for the final model
obj.dt$R2d   # Print the R2d for the final model


[Package stepjglm version 0.0.1 Index]