R: Robust Logistic Linear Quantile Regression

Log.lqr {lqr}

R Documentation

Robust Logistic Linear Quantile Regression

Description

It performs the logistic transformation in Galarza et.al.(2020) (see references) for estimating quantiles for a bounded response. Once the response is transformed, it uses the lqr function.

Usage

Log.lqr(formula,data = NULL,subset = NULL,
                   p=0.5,a=0,b=1,
                   dist = "normal",
                   nu=NULL,
                   gamma=NULL,
                   precision = 10^-6,
                   epsilon = 0.001,
                   CI=0.95,
                   silent = FALSE)

Arguments

We will detail first the only three arguments that differ from lqr function.

`a`	lower bound for the response (default = 0)
`b`	upper bound for the response (default = 1)
`epsilon`	a small quantity `\epsilon>0` that ensures that the logistic transform is defined for all values of the response.
`formula`	an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted.
`data`	an optional data frame, list or environment (or object coercible by `as.data.frame` to a data frame) containing the variables in the model. If not found in `data`, the variables are taken from `environment(formula)`.
`subset`	an optional string specifying a subset of observations to be used in the fitting process. Be aware of the use of double quotes in a proper way when necessary, e.g., in `"(sex=='F')"`.
`p`	An unique quantile or a set of quantiles related to the quantile regression.
`dist`	represents the distribution to be used for the error term. The values are `normal` for Normal distribution, `t` for Student's t distribution, `laplace` for Laplace distribution, `slash` for Slash distribution and `cont` for the Contaminated normal distribution.
`nu`	It represents the degrees of freedom when `dist = t`. For the Slash distribution (`dist = slash`) it is a shape parameter `\nu>0`. For the Contaminated Normal distribution, `\nu` is the parameter that represents the percentage of outliers. When is not provided, we use the MLE.
`gamma`	It represents a scale factor for the contaminated normal distribution. When is not provided, we use the MLE.
`precision`	The convergence maximum error permitted. By default is 10^-6.
`CI`	Confidence to be used for the Confidence Interval when a grid of quantiles is provided. Default = 0.95.
`silent`	if `FALSE` (by default), the function prints some output.

Details

We follow the transformation in Bottai et.al. (2009) defined as

h(y)=logit(y)=log(\frac{y-a}{b-y})

that implies

Q_{y}(p)=\frac{b\,exp(X\beta) + a}{1 + exp(X\beta)}

where Q_{y}(p) represents the conditional quantile of the response. Once estimates for the regression coefficients \beta_p are obtained, inference on Q_{y}(p) can then be made through the inverse transform above. This equation (as function) is provided in the output. See example.

The interpretation of the regression coefficients is analogous to the interpretation of the coefficients of a logistic regression for binary outcomes.

For example, let x_1 be the gender (male = 0, female=1). Then exp(\beta_{0.5,1}) represents the odds ratio of median score in males vs females, where the odds are defined using the score instead of a probability, (y-a)/(b-y). When the covariate is continous, the respective \beta coeficient can be interpretated as the increment (or decrement) over the log(odd ratio) when the covariate increases one unit.

Value

`iter`	number of iterations.
`criteria`	attained criteria value.
`beta`	fixed effects estimates.
`sigma`	scale parameter estimate for the error term.
`nu`	Estimate of `nu` parameter detailed above.
`gamma`	Estimate of `gamma` parameter detailed above.
`SE`	Standard Error estimates.
`table`	Table containing the inference for the fixed effects parameters.
`loglik`	Log-likelihood value.
`AIC`	Akaike information criterion.
`BIC`	Bayesian information criterion.
`HQ`	Hannan-Quinn information criterion.
`fitted.values`	vector containing the fitted values.
`residuals`	vector containing the residuals.

Note

When a grid of quantiles is provided, a graphical summary with point estimates and Confidence Intervals for model parameters is shown. Also, the result will be a list of the same dimension where each element corresponds to each quantile as detailed above.

Author(s)

Christian E. Galarza <cgalarza88@gmail.com>, Luis Benites <lsanchez@ime.usp.br> and Victor H. Lachos <hlachos@ime.unicamp.br>

Maintainer: Christian E. Galarza <cgalarza88@gmail.com>

References

Galarza, C.M., Zhang P. and Lachos, V.H. (2020). Logistic Quantile Regression for Bounded Outcomes Using a Family of Heavy-Tailed Distributions. Sankhya B: The Indian Journal of Statistics. doi:10.1007/s13571-020-00231-0

Galarza, C., Lachos, V. H., Cabral, C. R. B., & Castro, C. L. (2017). Robust quantile regression using a generalized class of skewed distributions. Stat, 6(1), 113-130.

Examples



##Load the data
data(resistance)
attach(resistance)

#EXAMPLE 1.1

#Comparing the resistence to death of two types of tumor-cells.
#The response is a score in [0,4].

boxplot(score~type,ylab="score",xlab="type")

#Student't median logistic quantile regression
res = Log.lqr(score~type,data = resistance,a=0,b=4,dist="t")

# The odds ratio of median score in type B vs type A
exp(res$beta[2])

#Proving that exp(res$beta[2])  is approx median odd ratio
medA  = median(score[type=="A"])
medB  = median(score[type=="B"])
rateA = (medA - 0)/(4 - medA)
rateB = (medB - 0)/(4 - medB)
odd   = rateB/rateA

round(c(exp(res$beta[2]),odd),3)


#EXAMPLE 1.2
############

#Comparing the resistence to death depending of dose.

#descriptive
plot(dose,score,ylim=c(0,4),col="dark gray");abline(h=c(0,4),lty=2)
dosecat<-cut(dose, 6, ordered = TRUE)
boxplot(score~dosecat,ylim=c(0,4))
abline(h=c(0,4),lty=2)

#(Non logistic) Best quantile regression for quantiles
# 0.05, 0.50 and 0.95
p05 = best.lqr(score~poly(dose,3),data = resistance,p = 0.05)
p50 = best.lqr(score~poly(dose,3),data = resistance,p = 0.50)
p95 = best.lqr(score~poly(dose,3),data = resistance,p = 0.95)
res3  = list(p05,p50,p95)

plot(dose,score,ylim=c(-1,5),col="gray");abline(h=c(0,4),lty=2)
lines(sort(dose), p05$fitted.values[order(dose)], col='red', type='l')
lines(sort(dose), p50$fitted.values[order(dose)], col='blue', type='l')
lines(sort(dose), p95$fitted.values[order(dose)], col='red', type='l')

#Using Student's t logistic quantile regression for obtaining preditypeBions inside bounds

logp05 = Log.lqr(score~poly(dose,3),data = resistance,p = 0.05,b = 4,dist = "t") #a = 0 by default
logp50 = Log.lqr(score~poly(dose,3),data = resistance,p = 0.50,b = 4,dist = "t")
logp95 = Log.lqr(score~poly(dose,3),data = resistance,p = 0.95,b = 4,dist = "t")
res4  = list(logp05,logp50,logp95)

#No more predited curves out-of-bounds
plot(dose,score,ylim=c(-1,5),col="gray");abline(h=c(0,4),lty=2)
lines(sort(dose), logp05$fitted.values[order(dose)], col='red', type='l')
lines(sort(dose), logp50$fitted.values[order(dose)], col='blue', type='l')
lines(sort(dose), logp95$fitted.values[order(dose)], col='red', type='l')

#EXAMPLE 1.3
############

#A full model using dose and type for a grid of quantiles

res5 = Log.lqr(formula = score ~ poly(dose,3)*type,data = resistance,
               a = 0,b = 4,
               p = seq(from = 0.05,to = 0.95,by = 0.05),dist = "t",
               silent = TRUE)

#A nice plot

if(TRUE){
  par(mfrow=c(1,2))
  typeB = (resistance$type == "B")
  
  plot(dose,score,
       ylim=c(0,4),
       col=c(8*typeB + 1*!typeB),main="Type A")
  abline(h=c(0,4),lty=2)
  
  lines(sort(dose[!typeB]),
        res5[[2]]$fitted.values[!typeB][order(dose[!typeB])],
        col='red')
  
  lines(sort(dose[!typeB]),
        res5[[5]]$fitted.values[!typeB][order(dose[!typeB])],
        col='green')
  
  lines(sort(dose[!typeB]),
        res5[[10]]$fitted.values[!typeB][order(dose[!typeB])],
        col='blue',lwd=2)
  
  lines(sort(dose[!typeB]),
        res5[[15]]$fitted.values[!typeB][order(dose[!typeB])],
        col='green')
  
  lines(sort(dose[!typeB]),
        res5[[18]]$fitted.values[!typeB][order(dose[!typeB])],
        col='red')
  
  plot(dose,score,
       ylim=c(0,4),
       col=c(1*typeB + 8*!typeB),main="Type B")
  abline(h=c(0,4),lty=2)
  
  lines(sort(dose[typeB]),
        res5[[2]]$fitted.values[typeB][order(dose[typeB])],
        col='red')
  
  lines(sort(dose[typeB]),
        res5[[5]]$fitted.values[typeB][order(dose[typeB])],
        col='green')
  
  lines(sort(dose[typeB]),
        res5[[10]]$fitted.values[typeB][order(dose[typeB])],
        col='blue',lwd=2)
  
  lines(sort(dose[typeB]),
        res5[[15]]$fitted.values[typeB][order(dose[typeB])],
        col='green')
  
  lines(sort(dose[typeB]),
        res5[[18]]$fitted.values[typeB][order(dose[typeB])],
        col='red')
}

[Package lqr version 5.2 Index]