PLreg {PLreg} | R Documentation |
Power Logit Regression Models for Bounded Variables
Description
PLreg
is used to fit power logit regression model for continuous and bounded variables via maximum likelihood approach.
Both median and dispersion of the response variable are modeled through
parametric functions.
Usage
PLreg(
formula,
data,
subset,
na.action,
family = c("NO", "LO", "TF", "PE", "SN", "SLASH", "Hyp"),
zeta = NULL,
link = c("logit", "probit", "cloglog", "cauchit", "loglog"),
link.sigma = NULL,
type = c("pML", "ML"),
control = PLreg.control(...),
model = TRUE,
y = TRUE,
x = FALSE,
...
)
PLreg.fit(
X,
y,
S = NULL,
family,
type = "pML",
zeta = zeta,
link = "logit",
link.sigma = "log",
control = PLreg.control()
)
Arguments
formula |
a symbolic description of the model. See details for further information. |
data , subset , na.action |
arguments controlling formula processing via |
family |
a description of the symmetric distribution to be used for generating the power logit model.
Supported families include " |
zeta |
a numeric value or numeric vector that represents the extra parameter of the distribution. For the PL-NO and PL-LO models, no extra parameter is needed. |
link |
an optional character that specifies the link function of the median submodel (mu).
The " |
link.sigma |
an optional character that specifies the link function of the dispersion submodel (sigma).
The " |
type |
character specifying the type of estimator for the skewness parameter.
Currently, penalized maximum likelihood (" |
control |
a list of control arguments specified via |
model , y , x |
logicals. If |
... |
arguments passed to |
X |
numeric regressor matrix for the median submodel. |
S |
numeric regressor matrix for the dispersion submodel. |
Details
The power logit regression models, proposed by Queiroz and Ferrari (2022), is useful in
situations when the response variable is continuous and bounded on the unit interval (0, 1).
The median and the dispersion parameters are modeled through parametric link
functions. The models depend on a skewness parameter (called \lambda
). When the skewness parameter is fixed
and equal to 1, the power logit models coincide with the GJS regression models
(Lemonte and Bazan, 2016). Queiroz and Ferrari (2022) suggest using a penalized maximum
likelihood method to estimate the parameters. This method is implemented in
PLreg
by default when \lambda
is not fixed. If convergence is not reached,
maximum likelihood estimation is performed. The estimation
process uses optim
. If no starting values are specified,
the PLreg
function uses those suggested by Queiroz and Ferrari (2022).
This function also fits the log-log regression models by setting \lambda
at zero (\lambda = 0
represents \lambda \rightarrow 0^+
).
The formulation of the model has the same structure as in the usual functions
lm
and glm
. The argument
formula
could comprise of three parts (separated by the symbols "~
" and "|
"),
namely: observed response variable in the unit interval, predictor of the median submodel,
with link function link
and predictor of the dispersion submodel, with link.sigma
link function. If the model has constant dispersion, the third part may be omitted and the link function for sigma
is "log
" by default. The skewness parameter lambda
may be
treated as fixed or not (default). If lambda
is fixed, its value
must be specified in the control
argument.
Some methods are available for objects of class "PLreg
",
see plot.PLreg
, summary.PLreg
,
coef.PLreg
, vcov.PLreg
, and
residuals.PLreg
, for details and other methods.
Value
PLreg
returns an object of class "PLreg
" with the following
components (the PLreg.fit
returns elements up to v
).
coefficients |
a list with the " |
residuals |
a vector of the raw residuals (the difference between the observed and the fitted response). |
fitted.values |
a vector with the fitted values of the median submodel. |
optim |
a list with the output from |
family |
a character specifying the |
method |
the method argument passed to the optim call. |
control |
the control arguments passed to the optim call. |
start |
a vector with the starting values used in the iterative process. |
nobs |
number of observations. |
df.null |
residual degrees of freedom in the null model
(constant median and dispersion), i.e., |
df.residual |
residual degrees of freedom in the fitted model. |
lambda |
value of the skewness parameter lambda
( |
loglik |
log-likelihood of the fitted model. |
loglikp |
penalized profile log-likelihood for lambda. If lambda is
equal to zero, |
vcov |
covariance matrix of all the parameters. |
pseudo.r.squared |
pseudo R-squared value. |
Upsilon.zeta |
an overall goodness-of-fit measure. |
link |
a list with elements " |
converged |
logical indicating successful convergence of the iterative process. |
zeta |
a numeric specifying the value of zeta used in the estimation process. |
type |
a character specifying the estimation method used. |
v |
a vector with the v(z) values for all the observations (see Queiroz and Ferrari(2021)). |
call |
the original function call. |
formula |
the formula used. |
terms |
a list with elements " |
levels |
a list with elements " |
contrasts |
a list with elements " |
model |
the full model frame (if |
y |
the response variable (if |
x |
a list with elements " |
Author(s)
Francisco Felipe de Queiroz (ffelipeq@outlook.com) and Silvia L. P. Ferrari.
References
Queiroz, F. F. and Ferrari, S. L. P. (2022). Power logit regression
for modeling bounded data. arXiv:2202.01697.
Lemonte, A. J. and Bazan, J. L. (2015). New class of Johnson SB distributions
and its associated regression model for rates and proportions. Biometrical Journal. 58:727-746.
See Also
summary.PLreg
, PLreg.control
, residuals.PLreg
Examples
#### Body fat data
data("bodyfat_Aeolus")
#Initial model with zeta = 2
fit1 <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus,
family = "PE", zeta = 2)
summary(fit1)
# Choosing the best value for zeta
# extra.parameter(fit1, lower = 1, upper = 4, grid = 15)
# Using zeta = 1.7
fit2 <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus,
family = "PE", zeta = 1.7)
summary(fit2)
# Fixing lambda = 1
fit3 <- PLreg(percentfat ~ days + sex + year, data = bodyfat_Aeolus,
family = "PE", zeta = 1.7,
control = PLreg.control(lambda = 1))
summary(fit3)
# Comparing the AIC and Upsilon values between fit2 and fit3
AIC(fit2) < AIC(fit3) # TRUE
fit2$Upsilon.zeta < fit3$Upsilon.zeta #TRUE
#### Firm cost data
data("Firm")
fitPL <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost,
data = Firm,
family = "SLASH",
zeta = 2.13)
summary(fitPL)
#extra.parameter(fitPL, lower = 1.2, upper = 4, grid = 10)
#plot(fitPL, type = "standardized")
#envelope(fitPL, type = "standardized")
fitPL_wo72 <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost,
data = Firm[-72,],
family = "SLASH",
zeta = 2.13)
fitPL_wo15 <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost,
data = Firm[-15,],
family = "SLASH",
zeta = 2.13)
fitPL_wo16 <- PLreg(firmcost ~ sizelog + indcost | sizelog + indcost,
data = Firm[-16,],
family = "SLASH",
zeta = 2.13)
coef.mu <- coef(fitPL)[1:3]
coef.mu_wo72 <- coef(fitPL_wo72)[1:3]
coef.mu_wo15 <- coef(fitPL_wo15)[1:3]
coef.mu_wo16 <- coef(fitPL_wo16)[1:3]
plot(Firm$indcost, Firm$firmcost,
pch = "+",
xlab = "indcost",
ylab = "firmcost")
#identify(Firm$indcost, Firm$firmcost)
covariate = matrix(c(rep.int(1, 1000),
rep(median(Firm$sizelog), 1000),
seq(0, 1.22, length.out = 1000)),
ncol = 3)
lines(covariate[,3],
as.vector(fitPL$link$median$linkinv(covariate%*%coef.mu)),
type = "l")
lines(covariate[,3],
as.vector(fitPL$link$median$linkinv(covariate%*%coef.mu_wo72)),
type = "l", lty = 2, col = "blue")
lines(covariate[,3],
as.vector(fitPL$link$median$linkinv(covariate%*%coef.mu_wo15)),
type = "l", lty = 3, col = "red")
lines(covariate[,3],
as.vector(fitPL$link$median$linkinv(covariate%*%coef.mu_wo16)),
type = "l", lty = 4, col = "green")
parameters = c("pML",
"pML w/o 72",
"pML w/o 15",
"pML w/o 16")
legend(x = 0.5,
y = 0.8,
legend = parameters,
col = c("black", "blue", "red", "green"),
lty = c(1, 2, 3, 4),
cex = 0.6)