compContourM1/2u {modQR} | R Documentation |
Directional Regression Quantile Computation
Description
The functions compContourM1u and compContourM2u may be used to obtain not only directional regression quantiles for all directions, but also some related overall statistics. Their output may also be used for the evaluation of the corresponding regression quantile regions by means of evalContour. The functions use different methods and algorithms: compContourM1u is based on [01] and [06], and compContourM2u results from [03] and [07]. The corresponding regression quantile regions are nevertheless virtually the same. See all the references below for further details and possible applications.
Usage
compContourM1u(Tau = 0.2, YMat = NULL, XMat = NULL, CTechST = NULL)
compContourM2u(Tau = 0.2, YMat = NULL, XMat = NULL, CTechST = NULL)
Arguments
Tau |
the quantile level in (0, 0.5). |
YMat |
the N x M response matrix with two to six
columns, |
XMat |
the N x P design matrix including the (first) intercept column. The default NULL value corresponds to the unit vector of the right length. Each row corresponds to one observation. |
CTechST |
the (optional) list with some parameters
influencing the computation and its output.
Its default value can be generated by
method-dependent |
Details
Generally, the performance of the functions deteriorates with increasing Tau, N, M, and P, in terms of both reliability and running time. Nevertheless, they should work fine at least for two-dimensional problems up to N = 10000 and P = 10, for three-dimensional problems up to N = 500 and P = 5, and for four-dimensional problems up to N = 150 and P = 3.
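The size limits quoted above can be checked before starting a long run. The following is a minimal sketch in base R. The helper name withinTestedSize is hypothetical (it is not part of modQR), and the sketch assumes that the problem dimension refers to M, the number of response columns.

```r
# Hypothetical helper (not part of modQR): checks whether a problem falls
# within the sizes quoted in the text. Assumes "dimension" means M = ncol(YMat).
withinTestedSize <- function(N, M, P) {
  # Limits quoted in the text for M = 2, 3, 4
  Limits <- data.frame(M    = c(2, 3, 4),
                       MaxN = c(10000, 500, 150),
                       MaxP = c(10, 5, 3))
  Lim <- Limits[Limits$M == M, ]
  if (nrow(Lim) == 0) return(NA)  # the text gives no figures for this M
  N <= Lim$MaxN && P <= Lim$MaxP
}

withinTestedSize(N = 199, M = 2, P = 1)   # the setting used in the Examples
withinTestedSize(N = 1000, M = 3, P = 2)  # N exceeds the quoted limit for M = 3
```

A result of FALSE or NA does not mean that the computation must fail, only that the problem lies outside the range the text describes as safe.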
Furthermore, common problems related to the computation can fortunately be prevented or overcome easily.
Bad data - the computation may fail if the processed data points are in a bad configuration (i.e., if they are not in general position or if they would lead to a quantile hyperplane with at least one zero coefficient), which mostly happens when discrete-valued/rounded/repeated observations, dummy variables or bad random number generators are employed. Such problems can often be prevented if one perturbs the data with random noise of a reasonably small magnitude before the computation, splits the model into separate or independent submodels, cleverly uses affine equivariance, or replaces identical observations with a single copy weighted by the total number of their occurrences.
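The perturbation remedy mentioned above can be sketched in base R as follows. The noise magnitude NoiseMag and the use of uniform noise are assumptions made for illustration only; a suitable magnitude depends on the scale and precision of the data.

```r
# Sketch: break ties in rounded responses with tiny uniform noise.
set.seed(1)                                  # for reproducibility only
# Toy responses rounded to one decimal place, so that many rows coincide
YMat <- matrix(round(runif(2 * 199, -0.5, 0.5), 1), 199, 2)

NoiseMag <- 1e-6                             # assumed magnitude; tune to the data
Noise <- matrix(runif(length(YMat), -NoiseMag, NoiseMag),
                nrow(YMat), ncol(YMat))
YPert <- YMat + Noise

anyDuplicated(YMat) > 0    # are there repeated rows before the perturbation?
anyDuplicated(YPert) > 0   # and after it?
```

The perturbed matrix YPert would then be passed to the computation in place of YMat.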
Bad Tau - the computation may fail for a finite number of problematic quantile levels, e.g., if Tau is an integer multiple of 1/N in the location case with unit weights (when the sample quantiles are not uniquely defined). Such a situation may occur easily for Tau's with only a few decimal digits or in a fractional form, especially when the number of observations changes automatically during the computation. The problem can be fixed easily by perturbing Tau with a sufficiently small number in the right direction, which should not affect the resulting regression quantile contours although it may slightly change the other output. The strategy is also adopted by compContourM1/2u, but only in the location case and with a warning output message explaining it.
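A manual check for the problematic quantile levels described above might look as follows. This is only a sketch: the helper perturbTau is hypothetical and does not reproduce what compContourM1/2u do internally, and the shift size Eps, the tolerance Tol and the default direction are assumptions. The text does not state which direction is the right one, so it is left as an argument.

```r
# Hypothetical helper (not part of modQR): nudges Tau when Tau * N is
# numerically an integer, i.e., when Tau is an integer multiple of 1/N.
# Direction = -1 shifts Tau down, Direction = +1 shifts it up (caller's choice).
perturbTau <- function(Tau, N, Eps = 1e-8, Direction = -1, Tol = 1e-9) {
  Prod <- Tau * N
  if (abs(Prod - round(Prod)) < Tol) Tau + Direction * Eps else Tau
}

perturbTau(0.15, 200)  # 0.15 * 200 is an integer, so Tau is shifted
perturbTau(0.15, 199)  # 0.15 * 199 = 29.85, so Tau is returned unchanged
```

Comparing against a tolerance rather than testing for exact equality guards against floating-point rounding in the product Tau * N.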
Bad scale - the computation may fail easily for badly scaled data. The functionality has been heavily tested only for observations coming from a centered unit hypercube. Nevertheless, you can always change the units of measurement or employ full affine equivariance to avoid all the troubles. Similar problems may also arise when properly scaled data are used with highly non-uniform weights, which frequently happens in local(ly) polynomial regression. Then the weights can be rescaled in a suitable way and the observations with virtually zero weights can be excluded from the computation.
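One simple way to change the units of measurement is to map each response column affinely onto the interval [-0.5, 0.5], so that the data fit into a centered unit hypercube. The following base R sketch uses made-up toy data and illustrative variable names. It shows only the coordinate-wise rescaling and its inverse, not how the resulting contours are transformed back.

```r
# Sketch: coordinate-wise affine rescaling of responses to [-0.5, 0.5].
set.seed(1)                                  # for reproducibility only
# Toy responses on very different scales (illustrative data)
YRaw <- cbind(rnorm(199, 1000, 250), rnorm(199, 0.01, 0.002))

ColMin <- apply(YRaw, 2, min)
ColMax <- apply(YRaw, 2, max)
Mid  <- (ColMin + ColMax) / 2                # column midpoints
Span <- ColMax - ColMin                      # column ranges

# Center each column at its midpoint and divide by its range
YScaled <- sweep(sweep(YRaw, 2, Mid, "-"), 2, Span, "/")
range(YScaled)

# The transformation is invertible, so results can be mapped back
YBack <- sweep(sweep(YScaled, 2, Span, "*"), 2, Mid, "+")
```

Keeping Mid and Span makes it possible to express any output obtained for YScaled in the original units.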
Bad expectations - the computation and its output need not meet false expectations. Every user should be aware of the facts that the computation may take a long time or fail even for moderately sized three-dimensional data sets, that the HypMat component is not always present in the list COutST$CharST by default, and that the sample regression quantile contours can be not only empty, but also unbounded and crossing one another in the general regression case.
Bad interpretation - the output can easily be misinterpreted. In particular, the quantile level Tau is not linked to the probability content of the sample (regression) Tau-quantile region in any straightforward way. Furthermore, any meaningful parametric quantile regression model should include as regressors not only the variables influencing the trend, but also all those affecting the dispersion of the multivariate responses. Even then the cuts of the resulting regression quantile contours parallel to the response space cannot be safely interpreted as conditional multivariate quantiles except for some very special cases. Nevertheless, such a conclusion could somehow be warranted in the case of nonparametric multiple-output quantile regression; see [09].
Value
Both compContourM1u and compContourM2u may display some auxiliary information regarding the computation on the screen (if CTechST$ReportI = 1) and may store their in-depth output (determined by CTechST$BriefOutputI) in output files (if CTechST$OutSaveI = 1). The filenames begin with the string contained in CTechST$OutFilePrefS, followed by the file number padded with zeros to form six digits and by the extension ‘.dqo’. The first output file produced by compContourM1u would thus be named ‘DQOutputM1_000001.dqo’.
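The naming pattern of the output files can be reproduced in base R, for instance to locate the files after a run. In this sketch, the prefix value is an assumption inferred from the example file name given in the text, not a documented default.

```r
# Sketch: build output file names following the described pattern
# (prefix, six-digit zero-padded file number, extension .dqo).
OutFilePrefS <- "DQOutputM1_"   # assumed prefix, inferred from the example name
FileNames <- sprintf("%s%06d.dqo", OutFilePrefS, 1:3)
FileNames
```

The same pattern could be used with file.exists to check which output files are present in the working directory.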
Both compContourM1u and compContourM2u always return a list with the same components. Their interpretation is also the same (except for CharST, which itself contains some method-specific components):
CharST |
the list with some default or
user-defined output.
The default one is provided
by function |
CTechSTMsgS |
the (possibly empty) string that informs
about the problems with input |
ProbSizeMsgS |
the (possibly empty) string that warns if the input problem is very large. |
TauMsgS |
the (possibly empty) string that announces
an internal perturbation of |
CompErrMsgS |
the (possibly empty) string that describes the error interrupting the computation. |
NDQFiles |
the counter of (possible) output files,
i.e., as if |
NumB |
the counter of (not necessarily distinct) optimal bases considered. |
PosVec |
the vector of length N that describes the position of individual (regression) observations with respect to the exact (regression) Tau-quantile contour. The identification is reliable only after a successful computation. |
MaxLWidth |
the maximum width of one layer of the internal algorithm. |
NIniNone |
the number of trials when the initial solution could not be found at all. |
NIniBad |
the number of trials when the found initial solution did not have the right number of clearly nonzero coordinates. |
NSkipCone |
the number of skipped cones (where an interior point could not be found). |
If CTechST$CubRegWiseI = 1, then the last four components are calculated over all the individual orthants.
References
[01] Hallin, M., Paindaveine, D. and Šiman, M. (2010) Multivariate quantiles and multiple-output regression quantiles: from L1 optimization to halfspace depth. Annals of Statistics 38, 635–669.
[02] Hallin, M., Paindaveine, D. and Šiman, M. (2010) Rejoinder (to [01]). Annals of Statistics 38, 694–703.
[03] Paindaveine, D. and Šiman, M. (2011) On directional multiple-output quantile regression. Journal of Multivariate Analysis 102, 193–212.
[04] Šiman, M. (2011) On exact computation of some statistics based on projection pursuit in a general regression context. Communications in Statistics - Simulation and Computation 40, 948–956.
[05] McKeague, I. W., López-Pintado, S., Hallin, M. and Šiman, M. (2011) Analyzing growth trajectories. Journal of Developmental Origins of Health and Disease 2, 322–329.
[06] Paindaveine, D. and Šiman, M. (2012) Computing multiple-output regression quantile regions. Computational Statistics & Data Analysis 56, 840–853.
[07] Paindaveine, D. and Šiman, M. (2012) Computing multiple-output regression quantile regions from projection quantiles. Computational Statistics 27, 29–49.
[08] Šiman, M. (2014) Precision index in the multivariate context. Communications in Statistics - Theory and Methods 43, 377–387.
[09] Hallin, M., Lu, Z., Paindaveine, D. and Šiman, M. (2015) Local bilinear multiple-output quantile/depth regression. Bernoulli 21, 1435–1466.
Examples
##computing all directional 0.15-quantiles of 199 random points
##uniformly distributed in the unit square centered at zero
##- preparing the input
Tau <- 0.15
XMat <- matrix(1, 199, 1)
YMat <- matrix(runif(2*199, -0.5, 0.5), 199, 2)
##- Method 1:
COutST <- compContourM1u(Tau, YMat, XMat)
##- Method 2:
COutST <- compContourM2u(Tau, YMat, XMat)