bbreg {bbreg} | R Documentation |

## bbreg

### Description

Function to fit, via Expectation-Maximization (EM) algorithm, the bessel or the beta regression to a given data set with a bounded continuous response variable.

### Usage

```
bbreg(
formula,
data,
link.mean = c("logit", "probit", "cauchit", "cloglog"),
link.precision = c("identity", "log", "sqrt", "inverse"),
model = NULL,
residual = NULL,
envelope = 0,
prob = 0.95,
predict = 0,
ptest = 0.25,
epsilon = 10^(-5)
)
```

### Arguments

`formula` |
symbolic description of the model (examples: |

`data` |
elements expressed in formula. This is usually a data frame composed by:
(i) the bounded continuous observations in |

`link.mean` |
optionally, a string containing the link function for the mean. If omitted, the 'logit' link function will be used. The possible link functions for the mean are "logit","probit", "cauchit", "cloglog". |

`link.precision` |
optionally, a string containing the link function the precision parameter. If omitted and the only precision covariate is the intercept, the identity link function will be used, if omitted and there is a precision covariate other than the intercept, the 'log' link function will be used. The possible link functions for the precision parameter are "identity", "log", "sqrt", "inverse". |

`model` |
character ("bessel" or "beta") indicating the type of model to be fitted. The default is NULL, meaning that a discrimination test must be applied to select the model. |

`residual` |
character indicating the type of residual to be evaluated ("pearson", "score" or "quantile"). The default is "pearson". |

`envelope` |
number of simulations (synthetic data sets) to build envelopes for residuals (with |

`prob` |
probability indicating the confidence level for the envelopes (default: |

`predict` |
number of partitions (training set to fit the model and a test set to calculate residuals) to be evaluated in a predictive accuracy
study related to the |

`ptest` |
proportion of the sample size to be considered in the test set for the |

`epsilon` |
tolerance value to control the convergence criterion in the EM-algorithm (default = 10^(-5)). |

### Details

The bessel regression originates from a class of normalized inverse-Gaussian (N-IG) process introduced in *Lijoi et al. (2005)*
as an alternative to the widely used Dirichlet process in the Bayesian context. These authors consider a ratio of inverse-Gaussian random variables
to define the new process. In the particular univariate case, the N-IG is obtained from the representation "Z = Y1/(Y1+Y2)", with "Y1" and "Y2" being
independent inverse-Gaussian random variables having scale = 1 and shape parameters "a1 > 0" and "a2 > 0", respectively.
Denote "Y1 ~ IG(a1)" and "Y2 ~ IG(a2)". The density of "Z" has support in the interval (0,1) and it depends on the modified Bessel function of third
kind with order 1, named here as "*K1(-)*". The presence of "*K1(-)*" in the structure of the p.d.f. establishes the name of the new distribution;
consider Z ~ Bessel(a1,a2). Note that the name "beta distribution" is also an analogy to the presence of a function (the beta function)
in its p.d.f. structure. The bessel regression model is defined by assuming "Z_1,...,Z_n" as a random sample of continuous bounded responses with
"Z_i ~ Bessel(mu_i,phi_i)" for "i = 1,...,n". Using this parameterization, one can write: "E(Z_i) = mu_i" and "Var(Z_i) = mu_i(1-mu_i) g(phi_i)",
where "*g(-)*" is a function depending on the exponential integral of "phi_i". The following link functions are assumed "logit(mu_i) = x_i^T kappa" and
"log(phi_i) = v_i^T lambda", where "kappa' = (kappa_1,...,kappa_p)" and "lambda' = (lambda_1,...,lambda[q])" are real valued vectors.
The terms "x_i^T" and "v_i^T" represent, respectively, the i-th row of the matrices "x" (*nxp*) and "v" (*nxq*) containing covariates in their columns
("x_i,1" and "v_i,1" may be 1 to handle intercepts). As it can be seen, this regression model has two levels with covariates explaining the mean
"mu_i" and the parameter "phi_i". For more details about the bessel regression see *Barreto-Souza, Mayrink and Simas (2022)* and *Barreto-Souza, Mayrink and Simas (2020)*.

This package implements an Expectation Maximization (EM) algorithm to fit the bessel regression. The full EM approach proposed in *Barreto-Souza and Simas (2017)* for the beta
regression is also available here. Fitting the beta regression via EM-algorithm is a major difference between the present package bbreg and the
well known `betareg`

created by Alexandre B. Simas and currently maintained by Achim Zeileis. The estimation procedure on the betareg packages
is given by maximizing the beta model likelihood via `optim`

.
In terms of initial values, bbreg uses quasi-likelihood estimates as the starting points for
the EM-algorithms. The formulation of the target model also has the same structure as in the standard functions `lm`

, `glm`

and betareg,
with also the same structure as the latter when precision covariates are being used. The user is supposed to
write a formula object describing elements of the regression (response, covariates for the mean submodel,
covariates for the precision submodel, presence of intercepts, and interactions). As an example, the description
"z ~ x" indicates: "response = z" (continuous and bounded by 0 and 1), "covariates = columns of x" (mean submodel) and
precision submodel having only an intercept. On the other hand, the configuration "z ~ x | v" establishes that the covariates given
in the columns of "v" must be used in the precision submodel. Intercepts may be removed by setting
"z ~ 0 + x | 0 + v" or "z ~ x - 1|v - 1". Absence of intercept and covariates is not allowed in any submodel.
The type of model to be fitted ("bessel" or "beta") can be specified through the argument "model" of
bbreg. If the user does not specify the model, the package will automatically apply a discrimination
test (DBB - Discrimination between Bessel and Beta),
developed in *Barreto-Souza, Mayrink and Simas (2022)* and *Barreto-Souza, Mayrink and Simas (2020)*, to select the most appropriate model for the given
data set. In this case, some quantities related to the DBB are included in the final output; they are:
"sum(Z2/n)" = mean of z_i^2, "sum(quasi_mu)" = sum (for i = 1,...,n) of muq_i + muq_i(1-muq_i)/2,
with muq_i being the quasi-likelihood estimator of mu_i and, finally, the quantities "|D_bessel|" and
"|D_beta|" depending on muq_i and the EM-estimates of phi_i under bessel or beta.

In the current version, three types of residuals are available for analysis ("Pearson", "Score" and "Quantile").
The user may choose one of them via the argument "residual". The score residual is computed empirically, based
on 100 artificial data sets generated from the fitted model. The sample size
is the same of the original data and the simulations are used to estimate the mean and s.d. required in the score
residual formulation. The user
may also choose to build envelopes for the residuals with confidence level in "prob". This feature also requires simulations of synthetic data
("envelope" is the number of replications). Residuals are obtained for each data set and confronted against the quantiles of the N(0,1). Predictive
accuracy of the fitted model is also explored by setting "predict" as a positive integer (this value represents the number of random partitions to be evaluated).
In this case, the full data set is separated in a training (partition to fit the model) and a test set (to evaluate residuals) for which the
RSS (Residual Sum of Squares) is computed. The default partition is 75% (training) and 25% (test); this can be modified by choosing the
proportion `ptest`

for the test set (large `ptest`

is not recommended).

### Value

Object of class bbreg containing the outputs from the model fit (bessel or beta regression).

### References

DOI:10.1111/anzs.12354 (Barreto-Souza, Mayrink and Simas; 2022)

arXiv:2003.05157 (Barreto-Souza, Mayrink and Simas; 2020)

DOI:10.1080/00949655.2017.1350679 (Barreto-Souza and Simas; 2017)

DOI:10.18637/jss.v034.i02 (Cribari-Neto and Zeileis; 2010)

DOI:10.1198/016214505000000132 (Lijoi et al.; 2005)

### See Also

`summary.bbreg`

, `plot.bbreg`

, `simdata_bes`

, `dbessel`

, `dbbtest`

, `simdata_bet`

, `Formula`

### Examples

```
# Example with artificial data.
n = 100; x = cbind(rbinom(n, 1, 0.5), runif(n, -1, 1)); v = runif(n, -1, 1);
z = simdata_bes(kap = c(1, -1, 0.5), lam = c(0.5, -0.5), x, v,
repetition = 1, link.mean = "logit", link.precision = "log")
z = unlist(z)
fit1 = bbreg(z ~ x | v)
summary(fit1)
plot(fit1)
# Examples using the Weather Task (WT) data available in bbreg.
fit2 = bbreg(agreement ~ priming + eliciting, data = WT)
summary(fit2)
fit3 = bbreg(agreement ~ priming + eliciting, envelope = 30, predict = 10, data = WT)
summary(fit3)
# Example with precision covariates
fit4 = bbreg(agreement ~ priming + eliciting|eliciting, data = WT)
summary(fit4)
# Example with different link functions:
fit5 = bbreg(agreement ~ priming + eliciting|eliciting, data = WT,
link.mean = "cloglog", link.precision = "sqrt")
summary(fit5)
```

*bbreg*version 2.0.2 Index]