R: Robust Beta Regression

robustbetareg {robustbetareg}

R Documentation

Robust Beta Regression

Description

Fit robust beta regression models for rates and proportions via LSMLE, LMDPDE, SMLE and MDPDE. Both mean and precision of the response variable are modeled through parametric functions.
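As a minimal sketch of what modeling "mean and precision" means, the snippet below assumes the usual beta regression parameterization, in which a response with mean mu and precision phi follows a beta distribution with shape parameters mu * phi and (1 - mu) * phi. The helpers `dbeta_mp` and `rbeta_mp` are hypothetical names used for illustration only; they are not part of robustbetareg.

```r
# Beta density and sampler in the mean/precision parameterization
# (assumed: shape1 = mu * phi, shape2 = (1 - mu) * phi).
# dbeta_mp() and rbeta_mp() are illustrative helpers, not package functions.
dbeta_mp <- function(y, mu, phi) {
  dbeta(y, shape1 = mu * phi, shape2 = (1 - mu) * phi)
}
rbeta_mp <- function(n, mu, phi) {
  rbeta(n, shape1 = mu * phi, shape2 = (1 - mu) * phi)
}

mu  <- 0.3
phi <- 20

# Variance implied by this parameterization: mu * (1 - mu) / (1 + phi)
var_theory <- mu * (1 - mu) / (1 + phi)

set.seed(1)
y <- rbeta_mp(1e5, mu, phi)
print(c(sample_mean = mean(y), sample_var = var(y), var_theory = var_theory))
```

Larger values of phi give a smaller variance for the same mean, which is why phi is referred to as a precision parameter.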

Usage

robustbetareg(
  formula,
  data,
  alpha,
  type = c("LSMLE", "LMDPDE", "SMLE", "MDPDE"),
  link = c("logit", "probit", "cloglog", "cauchit", "loglog"),
  link.phi = NULL,
  control = robustbetareg.control(...),
  model = TRUE,
  ...
)

LMDPDE.fit(y, x, z, alpha = NULL, link = "logit",
  link.phi = "log", control = robustbetareg.control(...), ...)

LSMLE.fit(y, x, z, alpha = NULL, link = "logit",
  link.phi = "log", control = robustbetareg.control(...), ...)

MDPDE.fit(y, x, z, alpha = NULL, link = "logit",
  link.phi = "log", control = robustbetareg.control(...), ...)

SMLE.fit(y, x, z, alpha = NULL, link = "logit",
  link.phi = "log", control = robustbetareg.control(...), ...)

Arguments

`formula`	symbolic description of the model. See Details for further information.
`data`	dataset to be used.
`alpha`	numeric in `[0,1)` giving the value of the tuning constant alpha. `alpha = 0` leads to the maximum likelihood estimator; robust procedures require `alpha` greater than zero. If this argument is omitted, the tuning constant is selected automatically through the data-driven algorithm proposed by Ribeiro and Ferrari (2022).
`type`	character specifying the type of robust estimator to be used in the estimation process. Supported estimators are "`LSMLE`", "`LMDPDE`", "`SMLE`", and "`MDPDE`"; for details, see Maluf et al. (2022). The default is "`LSMLE`".
`link`	an optional character that specifies the link function of the mean submodel (mu). The "`logit`", "`probit`", "`cloglog`", "`cauchit`", "`loglog`" functions are supported. The `logit` function is the default.
`link.phi`	an optional character that specifies the link function of the precision submodel (phi). The "`identity`", "`log`", "`sqrt`" functions are supported. The default is "`log`", unless `formula` has the form `y ~ x` (constant precision), in which case the default is "`identity`".
`control`	a list of control arguments specified via `robustbetareg.control`.
`model`	logical. If `TRUE`, the corresponding components of the fit (model frame, response, model matrix) are returned.
`...`	arguments to be passed to `robustbetareg.control`.
`y`, `x`, `z`	`y` must be a numeric response vector (with values in `(0,1)`), `x` must be a numeric regressor matrix for the mean submodel, and `z` must be a numeric regressor matrix for the precision submodel.
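
To make the `link` and `link.phi` options concrete, the following sketch evaluates the listed link functions with base R. It relies on `stats::make.link()`, which does not provide "loglog"; the hand-written version below assumes the form -log(-log(mu)), and the package's own definition should be checked before relying on it.

```r
# Mean-submodel links available through stats::make.link()
mu    <- c(0.1, 0.5, 0.9)
links <- c("logit", "probit", "cloglog", "cauchit")
eta   <- sapply(links, function(l) make.link(l)$linkfun(mu))

# "loglog" written by hand (assumed form, not taken from the package source)
loglog <- list(
  linkfun = function(mu) -log(-log(mu)),
  linkinv = function(eta) exp(-exp(-eta))
)
eta <- cbind(eta, loglog = loglog$linkfun(mu))
print(round(eta, 3))

# Round trip: each inverse link maps the linear predictor back to mu
back <- sapply(links, function(l) {
  lk <- make.link(l)
  lk$linkinv(lk$linkfun(mu))
})

# Precision-submodel links applied to an illustrative phi value
phi     <- 25
eta_phi <- sapply(c("identity", "log", "sqrt"),
                  function(l) make.link(l)$linkfun(phi))
print(eta_phi)
```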

Details

Beta regression models are used for continuous response variables in the unit interval, such as rates and proportions. Maximum likelihood-based inference lacks robustness in the presence of outliers. Based on the density power divergence, Ghosh (2019) proposed the minimum density power divergence estimator (MDPDE). Ribeiro and Ferrari (2022) proposed an estimator based on the maximization of a reparameterized Lq-likelihood, called the SMLE. Both estimators require suitable restrictions in the parameter space. Maluf et al. (2022) proposed robust estimators based on the MDPDE and the SMLE that overcome this drawback; they are called LMDPDE and LSMLE. For details, see the cited works.

The four estimators are implemented in the robustbetareg function. They depend on a tuning constant, alpha. When the tuning constant is fixed at 0, all of the estimators coincide with the maximum likelihood estimator. Ribeiro and Ferrari (2022) and Maluf et al. (2022) suggest using a data-driven algorithm to select the optimum value of alpha. This algorithm is used by robustbetareg by default when the argument alpha is omitted.
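
One way to see why power-based estimators may need restrictions in the parameter space is to look at the integral of a power of the beta density, a quantity that appears in density power divergence objectives. The sketch below is an illustration only, assuming shape parameters a = mu * phi and b = (1 - mu) * phi; it is not the package's internal code, and `int_f_power` is a hypothetical helper. The integral of f^(1 + alpha) over (0, 1) is finite only when a * (1 + alpha) - alpha > 0 and b * (1 + alpha) - alpha > 0.

```r
# Integral of dbeta(y, a, b)^(1 + alpha) over (0, 1), in closed form.
# Returns Inf when the integrability condition fails.
int_f_power <- function(a, b, alpha) {
  a_star <- a * (1 + alpha) - alpha
  b_star <- b * (1 + alpha) - alpha
  if (a_star <= 0 || b_star <= 0) return(Inf)
  exp(lbeta(a_star, b_star) - (1 + alpha) * lbeta(a, b))
}

a     <- 0.3 * 20   # mu * phi
b     <- 0.7 * 20   # (1 - mu) * phi
alpha <- 0.1

# Closed form versus numerical integration
closed    <- int_f_power(a, b, alpha)
numerical <- integrate(function(y) dbeta(y, a, b)^(1 + alpha), 0, 1)$value
print(c(closed = closed, numerical = numerical))

# alpha = 0 reduces to the integral of the density itself
print(int_f_power(a, b, 0))

# A small shape parameter violates the condition: 0.05 * 1.1 - 0.1 < 0
print(int_f_power(0.05, 2, 0.1))
```

The exact restrictions required by the MDPDE and the SMLE are given in the cited works; this example only shows the kind of condition that can arise.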

The model is specified in the same way as in the functions glm and betareg. The argument formula can comprise three parts (separated by the symbols "~" and "|"), namely: the observed response variable in the unit interval; the predictor of the mean submodel, with link function link; and the predictor of the precision submodel, with link function link.phi. If the model has constant precision, the third part may be omitted, and the link function for phi is then "identity" by default. The tuning constant alpha may be fixed or chosen by the data-driven algorithm. If alpha is fixed, its value must be specified in the alpha argument.
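
The three-part structure can be inspected with base R alone. The sketch below only illustrates the syntax; it is not how robustbetareg parses formulas internally, and `split_formula` is a hypothetical helper.

```r
# Split response ~ mean predictors | precision predictors into its parts.
# split_formula() is an illustrative helper, not a package function.
split_formula <- function(f) {
  rhs     <- f[[3]]
  has_bar <- is.call(rhs) && identical(rhs[[1]], as.name("|"))
  list(
    response  = all.vars(f[[2]]),
    mean      = all.vars(if (has_bar) rhs[[2]] else rhs),
    precision = if (has_bar) all.vars(rhs[[3]]) else character(0)
  )
}

# Varying precision: the part after "|" feeds the precision submodel
parts_full <- split_formula(y ~ x1 + x2 | z1)
str(parts_full)

# Constant precision: the third part is omitted
parts_const <- split_formula(y ~ x1 + x2)
str(parts_const)
```

In the constant-precision case the precision part is empty, which corresponds to the situation in which the default link for phi is "identity".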

Several methods are available for objects of class "robustbetareg"; see plot.robustbetareg, summary.robustbetareg, coef.robustbetareg, and residuals.robustbetareg for details and other methods.

Value

robustbetareg returns an object of class "robustbetareg", a list with the following components:

`coefficients`	a list with the "`mean`" and "`precision`" coefficients.

`vcov`	covariance matrix.

`converged`	logical indicating successful convergence of the iterative process.

`fitted.values`	a vector with the fitted values of the mean submodel.

`start`	a vector with the starting values used in the iterative process.

`weights`	the weights of each observation in the estimation process.

`Tuning`	value of the tuning constant (automatically chosen or fixed) used in the estimation process.

`residuals`	a vector of standardized weighted residuals 2 (see Espinheira et al. (2008)).

`n`	number of observations.

`link`	link function used in the mean submodel.

`link.phi`	link function used in the precision submodel.

`Optimal.Tuning`	logical indicating whether the data-driven algorithm was used.

`pseudo.r.squared`	pseudo R-squared value.

`control`	the control arguments passed to the data-driven algorithm and `optim` call.

`std.error`	the standard errors.

`method`	type of estimator used.

`call`	the original function call.

`formula`	the formula used.

`model`	the full model frame.

`terms`	a list with elements "`mean`", "`precision`" and "`full`" containing the term objects for the respective models.

`y`	the response variable.

`data`	the dataset used.
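
The components listed above can be read directly from the fitted object. The following sketch assumes the robustbetareg package and its HIC dataset are installed; it is guarded so that it does nothing otherwise. The tuning constant is fixed here only to keep the example fast.

```r
# Guarded so the snippet is a no-op when the package is not installed
has_pkg <- requireNamespace("robustbetareg", quietly = TRUE)

if (has_pkg) {
  library(robustbetareg)
  data("HIC", package = "robustbetareg")

  # Fixed alpha, so the data-driven selection is not needed
  fit <- robustbetareg(HIC ~ URB + GDP | GDP, data = HIC,
                       type = "LSMLE", alpha = 0.06)

  print(fit$coefficients$mean)       # mean-submodel coefficients
  print(fit$coefficients$precision)  # precision-submodel coefficients
  print(fit$Tuning)                  # tuning constant used in the fit
  print(fit$Optimal.Tuning)          # was the data-driven algorithm used?
  print(fit$converged)               # convergence flag

  # Observations with the smallest weights are the most downweighted
  print(head(order(fit$weights), 3))
}
```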

Author(s)

Yuri S. Maluf (yurimaluf@gmail.com), Francisco F. Queiroz (ffelipeq@outlook.com) and Silvia L. P. Ferrari.

References

Maluf, Y.S., Ferrari, S.L.P., and Queiroz, F.F. (2022). Robust beta regression through the logit transformation. arXiv:2209.11315.

Ribeiro, T.K.A. and Ferrari, S.L.P. (2022). Robust estimation in beta regression via maximum Lq-likelihood. Statistical Papers. DOI: 10.1007/s00362-022-01320-0.

Ghosh, A. (2019). Robust inference under the beta regression model with application to health care studies. Statistical Methods in Medical Research, 28:871–888.

Espinheira, P.L., Ferrari, S.L.P., and Cribari-Neto, F. (2008). On beta regression residuals. Journal of Applied Statistics, 35:407–419.

Examples

#### Risk Manager Cost data
data("Firm")

# MLE fit (fixed alpha equal to zero)
fit_MLE <- robustbetareg(FIRMCOST ~ SIZELOG + INDCOST,
                         data = Firm, type = "LMDPDE", alpha = 0)
summary(fit_MLE)

# MDPDE with alpha = 0.04
fit_MDPDE <- robustbetareg(FIRMCOST ~ SIZELOG + INDCOST,
                           data = Firm, type = "MDPDE",
                           alpha = 0.04)
summary(fit_MDPDE)

# Choosing alpha via data-driven algorithm
fit_MDPDE2 <- robustbetareg(FIRMCOST ~ SIZELOG + INDCOST,
                            data = Firm, type = "MDPDE")
summary(fit_MDPDE2)

# Similar result for the LMDPDE fit:
fit_LMDPDE2 <- robustbetareg(FIRMCOST ~ SIZELOG + INDCOST,
                             data = Firm, type = "LMDPDE")
summary(fit_LMDPDE2)

# Diagnostic plots


#### HIC data
data("HIC")

# MLE (fixed alpha equal to zero)
fit_MLE <- robustbetareg(HIC ~ URB + GDP |
                         GDP, data = HIC, type = "LMDPDE",
                         alpha = 0)
summary(fit_MLE)

# SMLE and MDPDE with alpha selected via data-driven algorithm
fit_SMLE <- robustbetareg(HIC ~ URB + GDP |
                          GDP, data = HIC, type = "SMLE")
summary(fit_SMLE)
fit_MDPDE <- robustbetareg(HIC ~ URB + GDP |
                           GDP, data = HIC, type = "MDPDE")
summary(fit_MDPDE)
# SMLE and MDPDE return MLE because of the lack of stability

# LSMLE and LMDPDE with alpha selected via data-driven algorithm
fit_LSMLE <- robustbetareg(HIC ~ URB + GDP |
                           GDP, data = HIC, type = "LSMLE")
summary(fit_LSMLE)
fit_LMDPDE <- robustbetareg(HIC ~ URB + GDP |
                            GDP, data = HIC, type = "LMDPDE")
summary(fit_LMDPDE)
# LSMLE and LMDPDE return robust estimates with alpha = 0.06


# Plotting the weights against the residuals - LSMLE fit.
plot(fit_LSMLE$residuals, fit_LSMLE$weights, pch = "+", xlab = "Residuals",
    ylab = "Weights")

# Excluding outlier observation.
fit_LSMLEwo1 <- robustbetareg(HIC ~ URB + GDP |
                              GDP, data = HIC[-1,], type = "LSMLE")
summary(fit_LSMLEwo1)

# Normal probability plot with simulated envelope
plotenvelope(fit_LSMLE)

[Package robustbetareg version 0.3.0 Index]