PLreg {PLreg}R Documentation

Power Logit Regression Models for Bounded Variables

Description

PLreg is used to fit power logit regression model for continuous and bounded variables via maximum likelihood approach. Both median and dispersion of the response variable are modeled through parametric functions.

Usage

PLreg(
  formula,
  data,
  subset,
  na.action,
  family = c("NO", "LO", "TF", "PE", "SN", "SLASH", "Hyp"),
  zeta = NULL,
  link = c("logit", "probit", "cloglog", "cauchit", "loglog"),
  link.sigma = NULL,
  type = c("pML", "ML"),
  control = PLreg.control(...),
  model = TRUE,
  y = TRUE,
  x = FALSE,
  ...
)

PLreg.fit(
  X,
  y,
  S = NULL,
  family,
  type = "pML",
  zeta = zeta,
  link = "logit",
  link.sigma = "log",
  control = PLreg.control()
)

Arguments

formula

a symbolic description of the model. See details for further information.

data, subset, na.action

arguments controlling formula processing via model.frame.

family

a description of the symmetric distribution to be used for generating the power logit model. Supported families include "NO", "LO", "TF", "PE", "Hyp", SHN" and "SLASH", which correspond to the power logit normal, type II logistic, Student-t, power exponential, hyperbolic, sinh-normal, and slash distributions, respectively.

zeta

a numeric value or numeric vector that represents the extra parameter of the distribution. For the PL-NO and PL-LO models, no extra parameter is needed.

link

an optional character that specifies the link function of the median submodel (mu). The "logit", "probit", "cloglog", "cauchit", "loglog" functions are supported. The logit function is the default.

link.sigma

an optional character that specifies the link function of the dispersion submodel (sigma). The "log", "sqrt" functions are supported. The default is log.

type

character specifying the type of estimator for the skewness parameter. Currently, penalized maximum likelihood ("pML") and maximum likelihood ("ML") are supported. If the skewness parameter is fixed, ML type is used.

control

a list of control arguments specified via PLreg.control.

model, y, x

logicals. If TRUE the corresponding components of the fit (model frame, response, model matrix) are returned. For PLreg.fit, y must be the numeric response vector (with values in (0,1)).

...

arguments passed to PLreg.control.

X

numeric regressor matrix for the median submodel.

S

numeric regressor matrix for the dispersion submodel.

Details

The power logit regression models, proposed by Queiroz and Ferrari (2022), is useful in situations when the response variable is continuous and bounded on the unit interval (0, 1). The median and the dispersion parameters are modeled through parametric link functions. The models depend on a skewness parameter (called \lambda). When the skewness parameter is fixed and equal to 1, the power logit models coincide with the GJS regression models (Lemonte and Bazan, 2016). Queiroz and Ferrari (2022) suggest using a penalized maximum likelihood method to estimate the parameters. This method is implemented in PLreg by default when \lambda is not fixed. If convergence is not reached, maximum likelihood estimation is performed. The estimation process uses optim. If no starting values are specified, the PLreg function uses those suggested by Queiroz and Ferrari (2022). This function also fits the log-log regression models by setting \lambda at zero (\lambda = 0 represents \lambda \rightarrow 0^+).

The formulation of the model has the same structure as in the usual functions lm and glm. The argument formula could comprise of three parts (separated by the symbols "~" and "|"), namely: observed response variable in the unit interval, predictor of the median submodel, with link function link and predictor of the dispersion submodel, with link.sigma link function. If the model has constant dispersion, the third part may be omitted and the link function for sigma is "log" by default. The skewness parameter lambda may be treated as fixed or not (default). If lambda is fixed, its value must be specified in the control argument.

Some methods are available for objects of class "PLreg", see plot.PLreg, summary.PLreg, coef.PLreg, vcov.PLreg, and residuals.PLreg, for details and other methods.

Value

PLreg returns an object of class "PLreg" with the following components (the PLreg.fit returns elements up to v).

coefficients

a list with the "median", "dispersion" and "skewness" (if lambda = NULL) coefficients.

residuals

a vector of the raw residuals (the difference between the observed and the fitted response).

fitted.values

a vector with the fitted values of the median submodel.

optim

a list with the output from optim. When lambda is not fixed, if type = "pML", the output refers to the iterative process of the median and dispersion parameters only and, if type = "ML", on the maximization of the likelihood for all the parameters.

family

a character specifying the family used.

method

the method argument passed to the optim call.

control

the control arguments passed to the optim call.

start

a vector with the starting values used in the iterative process.

nobs

number of observations.

df.null

residual degrees of freedom in the null model (constant median and dispersion), i.e., n-3.

df.residual

residual degrees of freedom in the fitted model.

lambda

value of the skewness parameter lambda (NULL when lambda is not fixed).

loglik

log-likelihood of the fitted model.

loglikp

penalized profile log-likelihood for lambda. If lambda is equal to zero, loglikp returns 1.

vcov

covariance matrix of all the parameters.

pseudo.r.squared

pseudo R-squared value.

Upsilon.zeta

an overall goodness-of-fit measure.

link

a list with elements "median" and "dispersion" containing the link objects for the respective models.

converged

logical indicating successful convergence of the iterative process.

zeta

a numeric specifying the value of zeta used in the estimation process.

type

a character specifying the estimation method used.

v

a vector with the v(z) values for all the observations (see Queiroz and Ferrari(2021)).

call

the original function call.

formula

the formula used.

terms

a list with elements "median", "dispersion" and "full" containing the term objects for the respective models.

levels

a list with elements "median", "dispersion" and "full" containing the levels of the categorical regressors.

contrasts

a list with elements "median" and "dispersion" containing the contrasts corresponding to levels from the respective models.

model

the full model frame (if y = TRUE).

y

the response variable (if y = TRUE).

x

a list with elements "median" and "dispersion" with the matrices from the median and dispersion submodels (if x = TRUE).

Author(s)

Francisco Felipe de Queiroz (ffelipeq@outlook.com) and Silvia L. P. Ferrari.

References

Queiroz, F. F. and Ferrari, S. L. P. (2022). Power logit regression for modeling bounded data. arXiv:2202.01697.

Lemonte, A. J. and Bazan, J. L. (2015). New class of Johnson SB distributions and its associated regression model for rates and proportions. Biometrical Journal. 58:727-746.

See Also

summary.PLreg, PLreg.control, residuals.PLreg

Examples

#### Body fat data
data("bodyfat_Aeolus")

#Initial model with zeta = 2
fit1 <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus,
             family = "PE", zeta = 2)
summary(fit1)
# Choosing the best value for zeta
# extra.parameter(fit1, lower = 1, upper = 4, grid = 15)

# Using zeta = 1.7
fit2 <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus,
             family = "PE", zeta = 1.7)
summary(fit2)

# Fixing lambda = 1
fit3 <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus,
             family = "PE", zeta = 1.7,
             control = PLreg.control(lambda = 1))
summary(fit3)

# Comparing the AIC and Upsilon values between fit2 and fit3
AIC(fit2) < AIC(fit3) # TRUE
fit2$Upsilon.zeta < fit3$Upsilon.zeta #TRUE

#### Firm cost data
data("Firm")

fitPL <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost,
              data = Firm,
              family = "SLASH",
              zeta = 2.13)
summary(fitPL)
#extra.parameter(fitPL, lower = 1.2, upper = 4, grid = 10)
#plot(fitPL, type = "standardized")
#envelope(fitPL, type = "standardized")

fitPL_wo72 <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost,
                   data = Firm[-72,],
                   family = "SLASH",
                   zeta = 2.13)
fitPL_wo15 <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost,
                   data = Firm[-15,],
                   family = "SLASH",
                   zeta = 2.13)
fitPL_wo16 <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost,
                   data = Firm[-16,],
                   family = "SLASH",
                   zeta = 2.13)

coef.mu      <- coef(fitPL)[1:3]
coef.mu_wo72 <- coef(fitPL_wo72)[1:3]
coef.mu_wo15 <- coef(fitPL_wo15)[1:3]
coef.mu_wo16 <- coef(fitPL_wo16)[1:3]

plot(Firm$indcost, Firm$firmcost,
    pch = "+",
    xlab = "indcost",
    ylab = "firmcost")
#identify(Firm$indcost, Firm$firmcost)
covariate = matrix(c(rep.int(1, 1000),
                    rep(median(Firm$sizelog), 1000),
                    seq(0, 1.22, length.out = 1000)),
                  ncol = 3)
lines(covariate[,3],
     as.vector(fitPL$link$median$linkinv(covariate%*%coef.mu)),
     type = "l")
lines(covariate[,3],
     as.vector(fitPL$link$median$linkinv(covariate%*%coef.mu_wo72)),
     type = "l", lty = 2, col = "blue")
lines(covariate[,3],
     as.vector(fitPL$link$median$linkinv(covariate%*%coef.mu_wo15)),
     type = "l", lty = 3, col = "red")
lines(covariate[,3],
     as.vector(fitPL$link$median$linkinv(covariate%*%coef.mu_wo16)),
     type = "l", lty = 4, col = "green")
parameters = c("pML",
              "pML w/o 72",
              "pML w/o 15",
              "pML w/o 16")
legend(x = 0.5,
      y = 0.8,
      legend = parameters,
      col = c("black", "blue", "red", "green"),
      lty = c(1, 2, 3, 4),
      cex = 0.6)


[Package PLreg version 0.4.1 Index]