| cqcheck {qgam} | R Documentation |
Visually checking a fitted quantile model
Description
Given an additive quantile model, fitted using qgam, cqcheck provides some plots
that allow the user to check what proportion of responses, y, falls below the fitted quantile.
Usage
cqcheck(
obj,
v,
X = NULL,
y = NULL,
nbin = c(10, 10),
bound = NULL,
lev = 0.05,
scatter = FALSE,
...
)
Arguments
obj |
the output of a |
v |
if a 1D plot is required, |
X |
a dataframe containing the data used to obtain the conditional quantiles. By default it is NULL, in which
case predictions are made using the model matrix in |
y |
vector of responses. Its i-th entry corresponds to the i-th row of X. By default it is NULL, in which
case it is internally set to |
nbin |
a vector of integers of length one (1D case) or two (2D case) indicating the number of bins to be used
in each direction. Used only if |
bound |
in the 1D case it is a numeric vector whose increasing entries represent the bounds of each bin.
In the 2D case a list of two vectors should be provided. |
lev |
the significance level used in the plots; this determines the width of the confidence intervals. Default is 0.05. |
scatter |
if TRUE a scatterplot is added (using the |
... |
extra graphical parameters to be passed to |
Details
Having fitted an additive model for, say, quantile qu=0.4 one would expect that about 40% of the
responses fall below the fitted quantile. This function allows the user to visually compare the empirical
proportion of responses (qu_hat) falling below the fit with its theoretical value (qu). In particular,
the responses are binned, with the bins being constructed along one or two variables (given by argument
v). Let qu_hat[i] be the proportion of responses below the fitted quantile in the ith bin.
This should be approximately equal to qu, for every i. In the 1D case, when v is a single
character string or a numeric vector, cqcheck provides a plot where: the horizontal line is qu,
the dots correspond to qu_hat[i] and the grey lines are confidence intervals for qu. The
confidence intervals are based on qbinom(lev/2, siz, qu); if the dots fall outside them, then
qu_hat[i] might be deviating too much from qu. In the 2D case, when v is a vector of two
character strings or a matrix with two columns, we plot a grid of bins. The responses are divided between
the bins as before, but now we don't plot the confidence intervals. Instead we report the empirical
proportions qu_hat[i] for the non-empty bins, and we colour the bins in red if qu_hat[i]<qu and in green
otherwise. If qu_hat[i] falls outside the confidence intervals we put an * next to the numeric qu_hat[i]
and we use more intense colours.
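The 1D check described above can be sketched in base R. This is only an illustration under stated assumptions, not the code used inside cqcheck: the "fitted quantile" is replaced by the true 0.4 quantile of a standard normal, the bins are equal-width, siz is taken to be the number of responses in each bin, and the bounds are rescaled to the proportion scale by dividing by siz. All variable names other than qu, lev and nbin are hypothetical.

```r
# Illustrative sketch only; not the qgam internals.
set.seed(1)
n    <- 2000
qu   <- 0.4
lev  <- 0.05
nbin <- 10

x <- runif(n)                 # variable along which we bin
y <- rnorm(n)                 # responses, independent of x here
fit_q <- rep(qnorm(qu), n)    # assumed stand-in for the fitted quantile

# Equal-width bins on (0, 1) (an assumption of this sketch)
breaks <- seq(0, 1, length.out = nbin + 1)
bin <- cut(x, breaks = breaks, include.lowest = TRUE)

below  <- y < fit_q                             # TRUE if response is below the fit
siz    <- as.vector(table(bin))                 # number of responses per bin
qu_hat <- as.vector(tapply(below, bin, mean))   # empirical proportion per bin

# Binomial bounds, converted from counts to proportions
lower <- qbinom(lev / 2, siz, qu) / siz
upper <- qbinom(1 - lev / 2, siz, qu) / siz

# Bins whose empirical proportion lies outside the interval
outside <- qu_hat < lower | qu_hat > upper
print(data.frame(siz, qu_hat, lower, upper, outside))
```

Because the stand-in quantile is correct by construction, most bins should fall inside their intervals; a misspecified fit would instead show a systematic pattern in qu_hat along x.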
Value
Simply produces a plot.
Author(s)
Matteo Fasiolo <matteo.fasiolo@gmail.com>.
Examples
#######
# Bivariate additive model y~1+x+x^2+z+x*z/2+e, e~N(0, 1)
#######
## Not run:
library(qgam)
set.seed(15560)
n <- 500
x <- rnorm(n, 0, 1); z <- rnorm(n)
X <- cbind(1, x, x^2, z, x*z)
beta <- c(0, 1, 1, 1, 0.5)
y <- drop(X %*% beta) + rnorm(n)
dataf <- data.frame(cbind(y, x, z))
names(dataf) <- c("y", "x", "z")
#### Fit a constant model for median
qu <- 0.5
fit <- qgam(y~1, qu = qu, data = dataf)
# Look at what happens along x: clearly there is a non-linear pattern here
cqcheck(obj = fit, v = c("x"), X = dataf, y = y)
#### Add a smooth for x
fit <- qgam(y~s(x), qu = qu, data = dataf)
cqcheck(obj = fit, v = c("x"), X = dataf, y = y) # Better!
# Let's look across x and z. As we move along z (x2 in the plot)
# the colour changes from green to red
cqcheck(obj = fit, v = c("x", "z"), X = dataf, y = y, nbin = c(5, 5))
# The effect looks pretty linear
cqcheck(obj = fit, v = c("z"), X = dataf, y = y, nbin = c(10))
#### Let's add a linear effect for z
fit <- qgam(y~s(x)+z, qu = qu, data = dataf)
# Looks better!
cqcheck(obj = fit, v = c("z"))
# Let's look across x and z again: green prevails on the top-left to bottom-right
# diagonal, while the other diagonal is mainly red.
cqcheck(obj = fit, v = c("x", "z"), nbin = c(5, 5))
### Maybe adding an interaction would help?
fit <- qgam(y~s(x)+z+I(x*z), qu = qu, data = dataf)
# It does! The real model is: y ~ 1 + x + x^2 + z + x*z/2 + e, e ~ N(0, 1)
cqcheck(obj = fit, v = c("x", "z"), nbin = c(5, 5))
## End(Not run)