R: Power Logit Regression Models for Bounded Variables

PLreg {PLreg}

R Documentation

Power Logit Regression Models for Bounded Variables

Description

PLreg is used to fit power logit regression model for continuous and bounded variables via maximum likelihood approach. Both median and dispersion of the response variable are modeled through parametric functions.

Usage

PLreg(
  formula,
  data,
  subset,
  na.action,
  family = c("NO", "LO", "TF", "PE", "SN", "SLASH", "Hyp"),
  zeta = NULL,
  link = c("logit", "probit", "cloglog", "cauchit", "loglog"),
  link.sigma = NULL,
  type = c("pML", "ML"),
  control = PLreg.control(...),
  model = TRUE,
  y = TRUE,
  x = FALSE,
  ...
)

PLreg.fit(
  X,
  y,
  S = NULL,
  family,
  type = "pML",
  zeta = zeta,
  link = "logit",
  link.sigma = "log",
  control = PLreg.control()
)

Arguments

`formula`	a symbolic description of the model. See details for further information.
`data`, `subset`, `na.action`	arguments controlling formula processing via `model.frame`.
`family`	a description of the symmetric distribution to be used for generating the power logit model. Supported families include "`NO`", "`LO`", "`TF`", "`PE`", "`Hyp`", `SHN`" and "`SLASH`", which correspond to the power logit normal, type II logistic, Student-t, power exponential, hyperbolic, sinh-normal, and slash distributions, respectively.
`zeta`	a numeric value or numeric vector that represents the extra parameter of the distribution. For the PL-NO and PL-LO models, no extra parameter is needed.
`link`	an optional character that specifies the link function of the median submodel (mu). The "`logit`", "`probit`", "`cloglog`", "`cauchit`", "`loglog`" functions are supported. The `logit` function is the default.
`link.sigma`	an optional character that specifies the link function of the dispersion submodel (sigma). The "`log`", "`sqrt`" functions are supported. The default is `log`.
`type`	character specifying the type of estimator for the skewness parameter. Currently, penalized maximum likelihood ("`pML`") and maximum likelihood ("`ML`") are supported. If the skewness parameter is fixed, `ML` type is used.
`control`	a list of control arguments specified via `PLreg.control`.
`model`, `y`, `x`	logicals. If `TRUE` the corresponding components of the fit (model frame, response, model matrix) are returned. For `PLreg.fit`, `y` must be the numeric response vector (with values in (0,1)).
`...`	arguments passed to `PLreg.control`.
`X`	numeric regressor matrix for the median submodel.
`S`	numeric regressor matrix for the dispersion submodel.

Details

The power logit regression models, proposed by Queiroz and Ferrari (2022), is useful in situations when the response variable is continuous and bounded on the unit interval (0, 1). The median and the dispersion parameters are modeled through parametric link functions. The models depend on a skewness parameter (called \lambda). When the skewness parameter is fixed and equal to 1, the power logit models coincide with the GJS regression models (Lemonte and Bazan, 2016). Queiroz and Ferrari (2022) suggest using a penalized maximum likelihood method to estimate the parameters. This method is implemented in PLreg by default when \lambda is not fixed. If convergence is not reached, maximum likelihood estimation is performed. The estimation process uses optim. If no starting values are specified, the PLreg function uses those suggested by Queiroz and Ferrari (2022). This function also fits the log-log regression models by setting \lambda at zero (\lambda = 0 represents \lambda \rightarrow 0^+).

The formulation of the model has the same structure as in the usual functions lm and glm. The argument formula could comprise of three parts (separated by the symbols "~" and "|"), namely: observed response variable in the unit interval, predictor of the median submodel, with link function link and predictor of the dispersion submodel, with link.sigma link function. If the model has constant dispersion, the third part may be omitted and the link function for sigma is "log" by default. The skewness parameter lambda may be treated as fixed or not (default). If lambda is fixed, its value must be specified in the control argument.

Some methods are available for objects of class "PLreg", see plot.PLreg, summary.PLreg, coef.PLreg, vcov.PLreg, and residuals.PLreg, for details and other methods.

Value

PLreg returns an object of class "PLreg" with the following components (the PLreg.fit returns elements up to v).

`coefficients`	a list with the "`median`", "`dispersion`" and "`skewness`" (if `lambda = NULL`) coefficients.
`residuals`	a vector of the raw residuals (the difference between the observed and the fitted response).
`fitted.values`	a vector with the fitted values of the median submodel.
`optim`	a list with the output from `optim`. When lambda is not fixed, if `type = "pML"`, the output refers to the iterative process of the median and dispersion parameters only and, if `type = "ML"`, on the maximization of the likelihood for all the parameters.
`family`	a character specifying the `family` used.
`method`	the method argument passed to the optim call.
`control`	the control arguments passed to the optim call.
`start`	a vector with the starting values used in the iterative process.
`nobs`	number of observations.
`df.null`	residual degrees of freedom in the null model (constant median and dispersion), i.e., `n-3`.
`df.residual`	residual degrees of freedom in the fitted model.
`lambda`	value of the skewness parameter lambda (`NULL` when lambda is not fixed).
`loglik`	log-likelihood of the fitted model.
`loglikp`	penalized profile log-likelihood for lambda. If lambda is equal to zero, `loglikp` returns 1.
`vcov`	covariance matrix of all the parameters.
`pseudo.r.squared`	pseudo R-squared value.
`Upsilon.zeta`	an overall goodness-of-fit measure.
`link`	a list with elements "`median`" and "`dispersion`" containing the link objects for the respective models.
`converged`	logical indicating successful convergence of the iterative process.
`zeta`	a numeric specifying the value of zeta used in the estimation process.
`type`	a character specifying the estimation method used.
`v`	a vector with the v(z) values for all the observations (see Queiroz and Ferrari(2021)).
`call`	the original function call.
`formula`	the formula used.
`terms`	a list with elements "`median`", "`dispersion`" and "`full`" containing the term objects for the respective models.
`levels`	a list with elements "`median`", "`dispersion`" and "`full`" containing the levels of the categorical regressors.
`contrasts`	a list with elements "`median`" and "`dispersion`" containing the contrasts corresponding to levels from the respective models.
`model`	the full model frame (if `y = TRUE`).
`y`	the response variable (if `y = TRUE`).
`x`	a list with elements "`median`" and "`dispersion`" with the matrices from the median and dispersion submodels (if `x = TRUE`).

Author(s)

Francisco Felipe de Queiroz (ffelipeq@outlook.com) and Silvia L. P. Ferrari.

References

Queiroz, F. F. and Ferrari, S. L. P. (2022). Power logit regression for modeling bounded data. arXiv:2202.01697.

Lemonte, A. J. and Bazan, J. L. (2015). New class of Johnson SB distributions and its associated regression model for rates and proportions. Biometrical Journal. 58:727-746.

Examples

#### Body fat data
data("bodyfat_Aeolus")

#Initial model with zeta = 2
fit1 <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus,
             family = "PE", zeta = 2)
summary(fit1)
# Choosing the best value for zeta
# extra.parameter(fit1, lower = 1, upper = 4, grid = 15)

# Using zeta = 1.7
fit2 <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus,
             family = "PE", zeta = 1.7)
summary(fit2)

# Fixing lambda = 1
fit3 <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus,
             family = "PE", zeta = 1.7,
             control = PLreg.control(lambda = 1))
summary(fit3)

# Comparing the AIC and Upsilon values between fit2 and fit3
AIC(fit2) < AIC(fit3) # TRUE
fit2$Upsilon.zeta < fit3$Upsilon.zeta #TRUE

#### Firm cost data
data("Firm")

fitPL <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost,
              data = Firm,
              family = "SLASH",
              zeta = 2.13)
summary(fitPL)
#extra.parameter(fitPL, lower = 1.2, upper = 4, grid = 10)
#plot(fitPL, type = "standardized")
#envelope(fitPL, type = "standardized")

fitPL_wo72 <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost,
                   data = Firm[-72,],
                   family = "SLASH",
                   zeta = 2.13)
fitPL_wo15 <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost,
                   data = Firm[-15,],
                   family = "SLASH",
                   zeta = 2.13)
fitPL_wo16 <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost,
                   data = Firm[-16,],
                   family = "SLASH",
                   zeta = 2.13)

coef.mu      <- coef(fitPL)[1:3]
coef.mu_wo72 <- coef(fitPL_wo72)[1:3]
coef.mu_wo15 <- coef(fitPL_wo15)[1:3]
coef.mu_wo16 <- coef(fitPL_wo16)[1:3]

plot(Firm$indcost, Firm$firmcost,
    pch = "+",
    xlab = "indcost",
    ylab = "firmcost")
#identify(Firm$indcost, Firm$firmcost)
covariate = matrix(c(rep.int(1, 1000),
                    rep(median(Firm$sizelog), 1000),
                    seq(0, 1.22, length.out = 1000)),
                  ncol = 3)
lines(covariate[,3],
     as.vector(fitPL$link$median$linkinv(covariate%*%coef.mu)),
     type = "l")
lines(covariate[,3],
     as.vector(fitPL$link$median$linkinv(covariate%*%coef.mu_wo72)),
     type = "l", lty = 2, col = "blue")
lines(covariate[,3],
     as.vector(fitPL$link$median$linkinv(covariate%*%coef.mu_wo15)),
     type = "l", lty = 3, col = "red")
lines(covariate[,3],
     as.vector(fitPL$link$median$linkinv(covariate%*%coef.mu_wo16)),
     type = "l", lty = 4, col = "green")
parameters = c("pML",
              "pML w/o 72",
              "pML w/o 15",
              "pML w/o 16")
legend(x = 0.5,
      y = 0.8,
      legend = parameters,
      col = c("black", "blue", "red", "green"),
      lty = c(1, 2, 3, 4),
      cex = 0.6)

[Package PLreg version 0.4.1 Index]