R: Robust Linear Quantile Regression

lqr {lqr}

R Documentation

Robust Linear Quantile Regression

Description

Fits a robust linear quantile regression model using a new family of zero-quantile distributions for the error term. This family includes skewed versions of the Normal, Student's t, Laplace, Slash and Contaminated Normal distributions. The function provides estimates and full inference. It also provides envelope plots for assessing the fit, and confidence bands when several quantiles are provided simultaneously.
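
As general background on what a `p`-th quantile is, the base-R sketch below recovers sample quantiles by minimizing the check (pinball) loss. It is not the estimation method used by `lqr`, which fits the model through the skewed error family described above. The helper names `check_loss` and `quantile_by_check_loss` and the toy data are hypothetical.

```r
# Check (pinball) loss: rho_p(u) = u * (p - I(u < 0))
check_loss <- function(u, p) u * (p - (u < 0))

# Hypothetical helper: the sample p-th quantile as the minimizer of the
# total check loss, searching only over the observed values.
quantile_by_check_loss <- function(y, p) {
  total <- sapply(y, function(q) sum(check_loss(y - q, p)))
  y[which.min(total)]
}

y <- c(1, 2, 3, 4, 100)   # toy data with one large outlier

q50 <- quantile_by_check_loss(y, p = 0.50)   # 3, the sample median
q25 <- quantile_by_check_loss(y, p = 0.25)   # 2
```

The outlier at 100 does not move the median, which is the kind of robustness that motivates modelling quantiles rather than the mean.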

Usage

lqr(formula,data = NULL,subset = NULL,
               p=0.5,dist = "normal",
               nu=NULL,gamma=NULL,
               precision = 10^-6,envelope=FALSE,
               CI=0.95,silent = FALSE
)

#lqr(y~x, data, p = 0.5, dist = "normal")
#lqr(y~x, data, p = 0.5, dist = "t")
#lqr(y~x, data, p = 0.5, dist = "laplace")
#lqr(y~x, data, p = 0.5, dist = "slash")
#lqr(y~x, data, p = 0.5, dist = "cont")

#lqr(y~x, p = c(0.25,0.50,0.75), dist = "normal")

Arguments

`formula`	an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted.
`data`	an optional data frame, list or environment (or object coercible by `as.data.frame` to a data frame) containing the variables in the model. If not found in `data`, the variables are taken from `environment(formula)`.
`subset`	an optional string specifying a subset of observations to be used in the fitting process. When the condition compares against a character value, quote that value with single quotes and wrap the whole condition in double quotes, e.g., `"(sex=='F')"`.
`p`	A single quantile or a set of quantiles for the quantile regression.
`dist`	represents the distribution to be used for the error term. The values are `normal` for Normal distribution, `t` for Student's t distribution, `laplace` for Laplace distribution, `slash` for Slash distribution and `cont` for the Contaminated normal distribution.
`nu`	It represents the degrees of freedom when `dist = "t"`. For the Slash distribution (`dist = "slash"`) it is a shape parameter `\nu>0`. For the Contaminated Normal distribution, `\nu` is the parameter that represents the percentage of outliers. When it is not provided, the MLE is used.
`gamma`	It represents a scale factor for the Contaminated Normal distribution. When it is not provided, the MLE is used.
`precision`	The maximum error permitted for convergence. The default is 10^-6.
`envelope`	if `TRUE`, it will show a confidence envelope for a curve based on bootstrap replicates. By default it is `TRUE` when just one quantile is provided. If a grid of quantiles is provided it will be `FALSE` by default.
`CI`	Confidence level used for the confidence intervals when a grid of quantiles is provided. Default = 0.95.
`silent`	if `FALSE` (by default), the function prints some output.
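
To illustrate the quoting convention for `subset`, the base-R sketch below evaluates a condition string against a data frame. How `lqr` processes the string internally is not stated here, so the `eval(parse(...))` approach is an assumption used only to show what such a string selects. The data frame `toy` is hypothetical.

```r
# Hypothetical toy data with a character grouping variable
toy <- data.frame(sex = c("F", "M", "F", "M"),
                  FL  = c(8.1, 9.2, 10.0, 11.5))

# Double quotes delimit the whole condition; single quotes delimit 'F'
cond <- "(sex=='F')"

# Evaluate the condition inside the data frame (illustrative assumption)
keep <- eval(parse(text = cond), envir = toy)

toy_F <- toy[keep, ]   # rows 1 and 3
```

A numeric condition such as `"(Sex==1)"` needs no inner quotes.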

Details

When a grid of quantiles is provided, a graphical summary with point estimates and Confidence Intervals for model parameters is shown.

Value

`iter`	number of iterations.
`criteria`	attained criteria value.
`beta`	fixed effects estimates.
`sigma`	scale parameter estimate for the error term.
`nu`	Estimate of `nu` parameter detailed above.
`gamma`	Estimate of `gamma` parameter detailed above.
`SE`	Standard Error estimates.
`table`	Table containing the inference for the fixed effects parameters.
`loglik`	Log-likelihood value.
`AIC`	Akaike information criterion.
`BIC`	Bayesian information criterion.
`HQ`	Hannan-Quinn information criterion.
`fitted.values`	vector containing the fitted values.
`residuals`	vector containing the residuals.
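
For orientation, the sketch below computes the three information criteria from a log-likelihood using their usual textbook definitions. Whether `lqr` counts parameters in exactly this way is an assumption. The helper `info_criteria` and the input values are hypothetical.

```r
# Textbook definitions, with loglik the maximized log-likelihood,
# k the number of estimated parameters and n the sample size.
info_criteria <- function(loglik, k, n) {
  c(AIC = -2 * loglik + 2 * k,
    BIC = -2 * loglik + k * log(n),
    HQ  = -2 * loglik + 2 * k * log(log(n)))
}

# Made-up inputs, for illustration only
ic <- info_criteria(loglik = -250, k = 3, n = 100)
ic
```

Smaller values indicate a better trade-off between fit and complexity under each criterion.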

Note

If a grid of quantiles is provided, the result is a list with one element per quantile, where each element has the structure detailed above.
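
When working with such a list, it helps to know which position corresponds to which quantile. The base-R sketch below locates selected quantiles in a grid, assuming the list elements follow the order of `p`, as the examples on this page do. The names `p_grid`, `wanted` and `idx` are hypothetical.

```r
# Grid of quantiles as passed to the p argument
p_grid <- seq(from = 0.10, to = 0.90, by = 0.05)

# Quantiles of interest
wanted <- c(0.10, 0.25, 0.50, 0.75, 0.90)

# Compare with a tolerance, since seq() produces floating-point values
idx <- sapply(wanted, function(q) which(abs(p_grid - q) < 1e-8))
idx   # 1 4 9 14 17
```

With a fitted grid stored in `fit`, the median model would then be `fit[[idx[3]]]`.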

Author(s)

Christian E. Galarza <cgalarza88@gmail.com>, Luis Benites <lsanchez@ime.usp.br> and Victor H. Lachos <hlachos@ime.unicamp.br>

Maintainer: Christian E. Galarza <cgalarza88@gmail.com>

References

Galarza, C., Lachos, V. H., Cabral, C. R. B., & Castro, C. L. (2017). Robust quantile regression using a generalized class of skewed distributions. Stat, 6(1), 113-130.

Wichitaksorn, N., Choy, S. T., & Gerlach, R. (2014). A generalized class of skew distributions and associated robust quantile regression models. Canadian Journal of Statistics, 42(4), 579-596.

Examples


#Example 1

##Load the data
data(ais)
attach(ais)

## Fitting a median regression with Normal errors (by default)

modelF = lqr(BMI~LBM,data = ais,subset = "(Sex==1)")
modelM = lqr(BMI~LBM,data = ais,subset = "(Sex==0)")

plot(LBM,BMI,col=Sex*2+1,
     xlab="Lean Body Mass",
     ylab="Body Mass Index",
     main="Quantile Regression")

abline(a = modelF$beta[1],b = modelF$beta[2],lwd=2,col=3)
abline(a = modelM$beta[1],b = modelM$beta[2],lwd=2,col=1)
legend(x = "topleft",legend = c("Male","Female"),lwd = 2,col = c(1,3))

#COMPARING SOME MODELS for median regression
modelN  = lqr(BMI~LBM,dist = "normal")
modelT  = lqr(BMI~LBM,dist = "t")
modelL  = lqr(BMI~LBM,dist = "laplace")

#Comparing AIC criteria
modelN$AIC;modelT$AIC;modelL$AIC

#This could be automatically done using best.lqr()
best.model = best.lqr(BMI~LBM,data = ais,
                      p = 0.75, #third quartile
                      criterion = "AIC")

#Let's use a grid of quantiles (no output)
modelfull = lqr(BMI~LBM,data = ais,
                p = seq(from = 0.10,to = 0.90,by = 0.05),
                dist = "normal",silent = TRUE)

#Plotting quantiles 0.10,0.25,0.50,0.75 and 0.90

if(TRUE){
  plot(LBM,BMI,xlab = "Lean Body Mass"
       ,ylab = "Body Mass Index", main = "Quantile Regression",pch=16)
  
  colvec = c(2,2,3,3,4)
  imodel = c(1,17,4,14,9)
  for(i in 1:5){
    abline(a = modelfull[[imodel[i]]]$beta[1],
           b = modelfull[[imodel[i]]]$beta[2],
           lwd=2,col=colvec[i])  
  }
  legend(x = "topleft",
         legend = rev(c("0.10","0.25","0.50","0.75","0.90")),
         lwd = 2,col = c(2,3,4,3,2))
}

#Example 2
##Load the data

data(crabs,package = "MASS")
attach(crabs)

## Fitting a median regression with Normal errors (by default) #Note the double quotes
crabsF = lqr(BD~FL,data = crabs,subset = "(sex=='F')")
crabsM = lqr(BD~FL,data = crabs,subset = "(sex=='M')")

if(TRUE){
  plot(FL,BD,col=as.numeric(sex)+1,
       xlab="Frontal lobe size",ylab="Body depth",main="Quantile Regression")
  abline(a = crabsF$beta[1],b = crabsF$beta[2],lwd=2,col=2)
  abline(a = crabsM$beta[1],b = crabsM$beta[2],lwd=2,col=3)
  legend(x = "topleft",legend = c("Male","Female"),
         lwd = 2,col = c(3,2))
}

#Median regression for different distributions

modelN  = lqr(BD~FL,dist = "normal")
modelT  = lqr(BD~FL,dist = "t")
modelL  = lqr(BD~FL,dist = "laplace")
modelS  = lqr(BD~FL,dist = "slash")
modelC  = lqr(BD~FL,dist = "cont" )

#Comparing AIC criteria
modelN$AIC;modelT$AIC;modelL$AIC;modelS$AIC;modelC$AIC

# best model based on BIC
best.lqr(BD~FL,criterion = "BIC")

#Let's use a grid of quantiles for the Student's t distribution
modelfull = lqr(BD~FL,data = crabs,
                p = seq(from = 0.10,to = 0.90,by = 0.05),
                dist = "t") # silent = FALSE

#Plotting quantiles 0.10,0.25,0.50,0.75 and 0.90
if(TRUE){
  plot(FL,BD,xlab = "Frontal lobe size"
       ,ylab = "Body depth", main = "Quantile Regression",pch=16)
  colvec = c(2,2,3,3,4)
  imodel = c(1,17,4,14,9)
  for(i in 1:5){
    abline(a = modelfull[[imodel[i]]]$beta[1],
           b = modelfull[[imodel[i]]]$beta[2],
           lwd=2,col=colvec[i])  
  }
  legend(x = "topleft",
         legend = rev(c("0.10","0.25","0.50","0.75","0.90")),
         lwd = 2,col = c(2,3,4,3,2))
}

[Package lqr version 5.2 Index]