compContourM1/2u {modQR}    R Documentation

Directional Regression Quantile Computation

Description

The functions compContourM1u and compContourM2u may be used to obtain not only directional regression quantiles for all directions, but also some related overall statistics. Their output may also be used for the evaluation of the corresponding regression quantile regions by means of evalContour. The functions use different methods and algorithms, namely compContourM1u is based on [01] and [06] and compContourM2u results from [03] and [07]. The corresponding regression quantile regions are nevertheless virtually the same. See all the references below for further details and possible applications.

Usage

compContourM1u(Tau = 0.2, YMat = NULL, XMat = NULL, CTechST = NULL)
compContourM2u(Tau = 0.2, YMat = NULL, XMat = NULL, CTechST = NULL)

Arguments

Tau

the quantile level in (0, 0.5).

YMat

the N x M response matrix with two to six columns, N > M+P-1. Each row corresponds to one observation.

XMat

the N x P design matrix including the (first) intercept column. The default NULL value corresponds to the unit vector of the right length. Each row corresponds to one observation.

CTechST

the (optional) list with some parameters influencing the computation and its output. Its default value can be generated by method-dependent getCTechSTM1/2u and then modified by the user before its use in compContourM1/2u.
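The constraints on the arguments listed above can be checked before a long computation is started. The following is only a minimal sketch in base R; checkContourInput is a hypothetical helper that is not part of modQR, and it encodes nothing beyond the conditions stated in this section.

```r
# Hypothetical helper (not part of modQR): checks the documented input constraints.
checkContourInput <- function(Tau, YMat, XMat = NULL) {
  # Tau must be a single quantile level in (0, 0.5)
  stopifnot(is.numeric(Tau), length(Tau) == 1, Tau > 0, Tau < 0.5)
  stopifnot(is.matrix(YMat), is.numeric(YMat))
  N <- nrow(YMat)
  M <- ncol(YMat)
  # The default NULL design corresponds to the intercept-only model
  if (is.null(XMat)) XMat <- matrix(1, N, 1)
  stopifnot(is.matrix(XMat), nrow(XMat) == N)
  P <- ncol(XMat)
  stopifnot(M >= 2, M <= 6)        # two to six response columns
  stopifnot(N > M + P - 1)         # enough observations
  stopifnot(all(XMat[, 1] == 1))   # the first column is the intercept
  invisible(list(N = N, M = M, P = P))
}

dims <- checkContourInput(0.15, matrix(runif(2 * 199, -0.5, 0.5), 199, 2))
```

The helper stops with an error at the first violated condition and otherwise returns the problem dimensions N, M, and P.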

Details

Generally, both the reliability and the speed of the functions deteriorate with increasing Tau, N, M, and P. Nevertheless, they should work fine at least for two-dimensional problems up to N = 10000 and P = 10, for three-dimensional problems up to N = 500 and P = 5, and for four-dimensional problems up to N = 150 and P = 3.

Furthermore, common problems related to the computation can fortunately be prevented or overcome easily.

Bad data - the computation may fail if the processed data points are in a bad configuration (i.e., if they are not in general position or if they would lead to a quantile hyperplane with at least one zero coefficient), which mostly happens when discrete-valued/rounded/repeated observations, dummy variables or bad random number generators are employed. Such problems can often be prevented if one perturbs the data with random noise of a reasonably small magnitude before the computation, splits the model into separate or independent submodels, cleverly uses affine equivariance, or replaces a group of identical observations with a single copy weighted by the total number of their occurrences.
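Two of the remedies for bad data, the small random perturbation and the collapsing of identical observations into weighted copies, can be sketched in base R as follows. The noise magnitude Eps and the toy data are assumptions chosen only for illustration; a suitable magnitude depends on the scale of the data, and passing the resulting weights on to the computation is outside the scope of this sketch.

```r
set.seed(1)
# Toy responses rounded to one decimal place, so ties are likely
YMat <- round(matrix(runif(2 * 50, -0.5, 0.5), 50, 2), 1)

# (a) Perturb the data with uniform noise of a small magnitude (Eps is an assumption)
Eps <- 1e-6
YPert <- YMat + matrix(runif(length(YMat), -Eps, Eps), nrow(YMat), ncol(YMat))

# (b) Replace identical rows with a single copy and count their occurrences
Key   <- apply(YMat, 1, paste, collapse = "_")   # one string key per row
First <- !duplicated(Key)                        # first occurrence of each row
YUniq <- YMat[First, , drop = FALSE]             # distinct observations
WVec  <- as.vector(table(Key)[Key[First]])       # occurrence counts, same order as YUniq
```

Here WVec[i] is the number of times the i-th row of YUniq occurs in YMat, so the counts always sum to the original number of observations.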

Bad Tau - the computation may fail for a finite number of problematic quantile levels, e.g., if Tau is an integer multiple of 1/N in the location case with unit weights (when the sample quantiles are not uniquely defined). Such a situation may occur easily for values of Tau with only a few decimal digits or in a fractional form, especially when the number of observations changes automatically during the computation. The problem can be fixed easily by perturbing Tau with a sufficiently small number in the right direction, which should not affect the resulting regression quantile contours although it may slightly change the other output. This strategy is also adopted by compContourM1/2u, but only in the location case and with a warning output message explaining it.
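The check for a problematic quantile level in the location case with unit weights can be sketched as follows. The helper perturbTau is hypothetical and does not reproduce the internal rule used by compContourM1/2u; in particular, the perturbation size Eps, the tolerance, and the default direction Dir are placeholders, because the appropriate direction is not specified here.

```r
# Hypothetical helper (not part of modQR): shifts Tau slightly if Tau * N is
# numerically an integer, i.e., if Tau is an integer multiple of 1/N.
# Eps and the default direction Dir = -1 are assumptions for illustration only.
perturbTau <- function(Tau, N, Eps = 1e-8, Dir = -1) {
  K <- round(Tau * N)
  IsMultiple <- abs(Tau * N - K) < sqrt(.Machine$double.eps)
  if (IsMultiple) Tau + Dir * Eps else Tau
}

TauA <- perturbTau(0.20, 200)   # 0.20 * 200 = 40, so Tau is shifted
TauB <- perturbTau(0.15, 199)   # 0.15 * 199 = 29.85, so Tau is kept
```

With these inputs, TauA differs from 0.2 by Eps, whereas TauB is returned unchanged.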

Bad scale - the computation may fail easily for badly scaled data, because the functionality has been heavily tested only for observations coming from a centered unit hypercube. Nevertheless, you can always change the units of measurement or employ full affine equivariance to avoid such trouble. Similar problems may also arise when properly scaled data are used with highly non-uniform weights, which frequently happens in local(ly) polynomial regression. Then the weights can be rescaled in a suitable way and the observations with virtually zero weights can be excluded from the computation.
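A change of units that maps the responses into the centered unit hypercube can be sketched in base R as below. The helper rescaleToUnitCube is hypothetical (not part of modQR) and uses a simple coordinate-wise linear map, which is only one special case of the affine transformations mentioned above; the stored centers and scales allow any results to be mapped back to the original units.

```r
# Hypothetical helper (not part of modQR): maps each column linearly onto [-0.5, 0.5]
rescaleToUnitCube <- function(YMat) {
  Lo <- apply(YMat, 2, min)
  Hi <- apply(YMat, 2, max)
  stopifnot(all(Hi > Lo))                 # constant columns cannot be rescaled
  Center <- (Lo + Hi) / 2
  Scale  <- Hi - Lo
  YScaled <- sweep(sweep(YMat, 2, Center, "-"), 2, Scale, "/")
  list(YMat = YScaled, Center = Center, Scale = Scale)
}

set.seed(1)
YRaw <- cbind(rnorm(100, 1000, 250), rnorm(100, 0.01, 0.002))   # badly scaled toy data
Res  <- rescaleToUnitCube(YRaw)

# Mapping points back to the original units
YBack <- sweep(sweep(Res$YMat, 2, Res$Scale, "*"), 2, Res$Center, "+")
```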

Bad expectations - the computation and its output need not meet false expectations. Every user should be aware of the facts that the computation may take a long time or fail even for moderately sized three-dimensional data sets, that the HypMat component is not always present in the list COutST$CharST by default, and that the sample regression quantile contours can be not only empty, but also unbounded and crossing one another in the general regression case.

Bad interpretation - the output results may easily be misinterpreted. In particular, the quantile level Tau is not linked to the probability content of the sample (regression) Tau-quantile region in any straightforward way. Furthermore, any meaningful parametric quantile regression model should include as regressors not only the variables influencing the trend, but also all those affecting the dispersion of the multivariate responses. Even then the cuts of the resulting regression quantile contours parallel to the response space cannot be safely interpreted as conditional multivariate quantiles except for some very special cases. Nevertheless, such a conclusion could somehow be warranted in the case of nonparametric multiple-output quantile regression; see [09].

Value

Both compContourM1u and compContourM2u may display some auxiliary information regarding the computation on the screen (if CTechST$ReportI = 1) and store their in-depth output (determined by CTechST$BriefOutputI) in output files (if CTechST$OutSaveI = 1). The file names begin with the string contained in CTechST$OutFilePrefS, followed by the file number padded with zeros to six digits and by the extension ‘.dqo’. The first output file produced by compContourM1u would thus be named ‘DQOutputM1_000001.dqo’.
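The naming scheme of the output files can be reproduced with sprintf, e.g., to locate the files after the computation. This is only a sketch; the prefix below is inferred from the example file name given above, and the variable names are hypothetical.

```r
OutFilePrefS <- "DQOutputM1_"   # prefix inferred from the example file name
FileNumber   <- 1L              # number of the output file
FileName <- sprintf("%s%06d.dqo", OutFilePrefS, FileNumber)
FileName
# [1] "DQOutputM1_000001.dqo"
```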

Both compContourM1u and compContourM2u always return a list with the same components. Their interpretation is also the same (except for CharST that itself contains some components that are method-specific):

CharST

the list with some default or user-defined output. The default one is provided by function getCharSTM1u for compContourM1u and by function getCharSTM2u for compContourM2u. A user-defined function generating its own output can be employed instead by changing CTechST$getCharST.

CTechSTMsgS

the (possibly empty) string that informs about the problems with input CTechST.

ProbSizeMsgS

the (possibly empty) string that warns if the input problem is very large.

TauMsgS

the (possibly empty) string that announces an internal perturbation of Tau.

CompErrMsgS

the (possibly empty) string that describes the error interrupting the computation.

NDQFiles

the counter of (possible) output files, i.e., as if CTechST$OutSaveI = 1.

NumB

the counter of (not necessarily distinct) optimal bases considered.

PosVec

the vector of length N that describes the position of individual (regression) observations with respect to the exact (regression) Tau-quantile contour. The identification is reliable only after a successful computation. PosVec[i] = 0/1/2 if the i-th observation is in/on/out of the contour. If compContourM2u is used with CTechST$SkipRedI = 1, then PosVec correctly detects only all the outer observations.

MaxLWidth

the maximum width of one layer of the internal algorithm.

NIniNone

the number of trials when the initial solution could not be found at all.

NIniBad

the number of trials when the found initial solution did not have the right number of clearly nonzero coordinates.

NSkipCone

the number of skipped cones (where an interior point could not be found).

If CTechST$CubRegWiseI = 1, then the last four components are calculated over all the individual orthants.
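The message strings and PosVec in the returned list can be inspected along the following lines. To keep the sketch runnable without the package, COutST below is a mock list with hypothetical values that mimics only a few of the components described above.

```r
# Mock of a returned list (hypothetical values, only some components)
COutST <- list(CTechSTMsgS = "", ProbSizeMsgS = "", TauMsgS = "", CompErrMsgS = "",
               PosVec = c(0, 0, 1, 2, 0, 1, 2, 2))

# Collect the non-empty messages, if any
MsgVec <- unlist(COutST[c("CTechSTMsgS", "ProbSizeMsgS", "TauMsgS", "CompErrMsgS")])
MsgVec <- MsgVec[nzchar(MsgVec)]

# A non-empty CompErrMsgS signals an interrupted computation
Failed <- nzchar(COutST$CompErrMsgS)

# PosVec is reliable only after a successful computation
if (!Failed) {
  PosTab   <- table(factor(COutST$PosVec, levels = 0:2, labels = c("in", "on", "out")))
  OuterIdx <- which(COutST$PosVec == 2)   # indices of the outer observations
}
```

For the mock values, PosTab counts three observations in, two on, and three out of the contour.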

References

[01] Hallin, M., Paindaveine, D. and Šiman, M. (2010) Multivariate quantiles and multiple-output regression quantiles: from L1 optimization to halfspace depth. Annals of Statistics 38, 635–669.

[02] Hallin, M., Paindaveine, D. and Šiman, M. (2010) Rejoinder (to [01]). Annals of Statistics 38, 694–703.

[03] Paindaveine, D. and Šiman, M. (2011) On directional multiple-output quantile regression. Journal of Multivariate Analysis 102, 193–212.

[04] Šiman, M. (2011) On exact computation of some statistics based on projection pursuit in a general regression context. Communications in Statistics - Simulation and Computation 40, 948–956.

[05] McKeague, I. W., López-Pintado, S., Hallin, M. and Šiman, M. (2011) Analyzing growth trajectories. Journal of Developmental Origins of Health and Disease 2, 322–329.

[06] Paindaveine, D. and Šiman, M. (2012) Computing multiple-output regression quantile regions. Computational Statistics & Data Analysis 56, 840–853.

[07] Paindaveine, D. and Šiman, M. (2012) Computing multiple-output regression quantile regions from projection quantiles. Computational Statistics 27, 29–49.

[08] Šiman, M. (2014) Precision index in the multivariate context. Communications in Statistics - Theory and Methods 43, 377–387.

[09] Hallin, M., Lu, Z., Paindaveine, D. and Šiman, M. (2015) Local bilinear multiple-output quantile/depth regression. Bernoulli 21, 1435–1466.

Examples

##computing all directional 0.15-quantiles of 199 random points
##uniformly distributed in the unit square centered at zero
##- preparing the input
Tau  <- 0.15
XMat <- matrix(1, 199, 1)
YMat <- matrix(runif(2*199, -0.5, 0.5), 199, 2)
##- Method 1:
COutST <- compContourM1u(Tau, YMat, XMat)
##- Method 2:
COutST <- compContourM2u(Tau, YMat, XMat)

[Package modQR version 0.1.3 Index]