Qest {Qest} | R Documentation |
Q-Estimation
Description
An implementation of the quantile-based estimators described in Sottile and Frumento (2022).
Usage
Qest(formula, Q, weights, start, data, ntau = 199, wtau = NULL,
control = Qest.control(), ...)
Arguments
formula |
a two-sided formula of the form |
Q |
a parametric quantile function of the form |
weights |
an optional vector of weights to be used in the fitting process. The weights will always be normalized to sum to the sample size. This implies that, for example, using double weights will not halve the standard errors. |
start |
a vector of starting values. |
data |
an optional data frame, list or environment (or object coercible by |
ntau |
the number of points for numerical integration (see “Details”). Default |
wtau |
an optional function that assigns a different weight to each quantile. By default, all quantiles in (0,1) have the same weight. Please check the documentation of |
control |
a list of operational parameters. This is usually passed through |
... |
additional arguments for |
Details
A parametric model, Q(\tau | \theta, x)
, is used to describe the conditional quantile function of an outcome Y
, given a vector x
of covariates. The model parameters, \theta
, are estimated by minimizing the (weighted) integral, with respect to \tau
, of the loss function of standard quantile regression. If the data are censored or truncated, \theta
is estimated by solving a set of estimating equations. In either case, numerical integration is required to calculate the objective function: a grid of ntau
points in (0,1)
is used. The estimation algorithm is briefly described in the documentation of Qest.control
.
The optional argument wtau
can be used to attribute a different weight to each quantile. Although it is possible to choose wtau
to be a discontinuous function (e.g., wtau = function(tau){tau < 0.95}
), this may occasionally result in poorly estimated standard errors.
The quantile function Q
must have at least the following three arguments: theta, tau, data
, in this order. The first argument, theta
, is a vector (not a matrix) of parameters' values. The second argument, tau
, is the order of the quantile. When Q
receives a n*ntau
matrix of tau
values, it must return a n*ntau
matrix of quantiles. The third argument, data
, is a data frame that includes the predictors used by Q
.
If Q
is identified by one Qfamily
, everything becomes much simpler. It is not necessary to implement your own quantile function, and the starting points are not required. Note that ntau
is ignored if Q = Qnorm
or Q = Qunif
.
Please check the documentation of Qfamily
to see the available built-in distributions. A convenient Q-based implementation of the standard linear regression model is offered by Qlm
. Proportional hazards models are implemented in Qcoxph
.
Value
a list with the following elements:
coefficients |
a named vector of coefficients. |
std.errs |
a named vector of estimated standard errors. |
covar |
the estimated covariance matrix of the estimators. |
obj.function |
the value of the minimized loss function. If the data are censored or truncated, a meaningful loss function which, however, is not the function being minimized (see “Note”). |
ee |
the values of the estimating equations at the solution. If the data are neither censored nor truncated, the partial derivatives of the loss function. |
jacobian |
the jacobian at the solution. If the data are neither censored nor truncated, the matrix of second derivatives of the loss function. |
CDF , PDF |
the fitted values of the cumulative distribution function (CDF) and the probability density function (PDF). |
converged |
logical. The convergence status. |
n.it |
the number of iterations. |
internal |
internal elements. |
call |
the matched call. |
Note
NOTE 1. If the data are censored or truncated, estimation is carried out by solving estimating equations, and no associated loss is present. In this case, a meaningful value of obj.function
is the integrated loss [equation 1 of Sottile and Frumento (2022)] in which the indicator function I(y \le Q(\tau | \theta, x))
has been replaced with one of the expressions presented in equations 6 and 7 of the paper. The resulting loss, however, is not the function being minimized.
NOTE 2. To prevent computational problems, avoid situations in which some of the estimated parameters are expected to be very small or very large. For example, standardize the predictors, and normalize the response. Avoid as much as possible parameters with bounded support. For example, model a variance/rate/shape parameter on the log scale, e.g., \sigma = exp(\theta)
. Carefully select the starting points, and make sure that Q(start, ...)
is well-defined. If Q
is identified by one Qfamily
, all these recommendations can be ignored.
NOTE 3. You should not use Qest
to fit parametric models describing discrete distributions, where the quantile function is piecewise constant. You can try, but the optimization algorithm will most likely fail. The predefined family Qpois
allows to fit a Poisson distribution by using a continuous version of its quantile function (see Qfamily
).
Author(s)
Paolo Frumento <paolo.frumento@unipi.it>, Gianluca Sottile <gianluca.sottile@unipa.it>
References
Sottile G, and Frumento P (2022). Robust estimation and regression with parametric quantile functions. Computational Statistics and Data Analysis. <doi:10.1016/j.csda.2022.107471>
See Also
Qest.control
, for operational parameters, and summary.Qest
, for model summary. Qfamily
, for the available built-in distributions. wtrunc
for built-in weighting functions (wtau
argument). Qlm
, for Q-estimation of the standard normal (linear) regression model; Qcoxph
, for proportional hazards models.
Examples
# Ex1. Normal model
# Quantile function of a linear model
Qlinmod <- function(theta, tau, data){
sigma <- exp(theta[1])
beta <- theta[-1]
X <- model.matrix( ~ x1 + x2, data = data)
qnorm(tau, X %*% beta, sigma)
}
n <- 100
x1 <- rnorm(n)
x2 <- runif(n,0,3)
theta <- c(1,4,1,2)
y <- Qlinmod(theta, runif(n), data.frame(x1,x2)) # generate the data
m1 <- Qest(y ~ x1 + x2, Q = Qlinmod, start = c(NA,NA,NA,NA)) # User-defined quantile function
summary(m1)
m2 <- Qest(y ~ x1 + x2, Q = Qnorm) # Qfamily
summary(m2)
m3 <- Qlm(y ~ x1 + x2)
summary(m3) # using 'Qlm' is much simpler and faster, with identical results
# Ex2. Weibull model with proportional hazards
# Quantile function
QWeibPH <- function(theta, tau, data){
shape <- exp(theta[1])
beta <- theta[-1]
X <- model.matrix(~ x1 + x2, data = data)
qweibull(tau, shape = shape, scale = (1/exp(X %*% beta))^(1/shape))
}
n <- 100
x1 <- rbinom(n,1,0.5)
x2 <- runif(n,0,3)
theta <- c(2,-0.5,1,1)
t <- QWeibPH(theta, runif(n), data.frame(x1,x2)) # time-to-event
c <- runif(n,0.5,1.5) # censoring variable
y <- pmin(t,c) # observed response
d <- (t <= c) # event indicator
m1 <- Qest(Surv(y,d) ~ x1 + x2, Q = QWeibPH, start = c(NA,NA,NA,NA))
summary(m1)
m2 <- Qcoxph(Surv(y,d) ~ x1 + x2)
summary(m2) # using 'Qcoxph' is much simpler and faster (but not identical)
# Ex3. A Gamma model
# Quantile function
Qgm <- function(theta, tau, data){
a <- exp(theta[1])
b <- exp(theta[2])
qgamma(tau, shape = a, scale = b)
}
n <- 100
theta <- c(2,-1)
y <- rgamma(n, shape = exp(theta[1]), scale = exp(theta[2]))
m1 <- Qest(y ~ 1, Q = Qgm, start = c(NA, NA)) # User-defined quantile function
m2 <- Qest(y ~ 1, Q = Qgamma) # Qfamily
m3 <- Qest(y ~ 1, Q = Qgamma, wtau = function(tau, h) dnorm((tau - 0.5)/h), h = 0.2)
# In m3, more weight is assigned to quantiles around the median
# Ex4. A Poisson model
# Quantile function
n <- 100
x1 <- runif(n)
x2 <- rbinom(n,1,0.5)
y <- rpois(n, exp(1.5 -0.5*x1 + x2))
m1 <- Qest(y ~ x1 + x2, Q = Qpois) # Use a Qfamily! See "Note"
m2 <- Qest(y + runif(n) ~ x1 + x2, Q = Qpois) # Use jittering! See the documentation of "Qfamily"