Robust_Mahalanobis_regression {RobRegression} | R Documentation |
Robust_Mahalanobis_regression
Description
We propose here a function which provides a robust estimation of the parameters of multivariate Gaussian linear models of the form Y = X \beta + \epsilon, where \epsilon is a zero-mean Gaussian vector with variance \Sigma. In addition, one can also consider a low-rank variance of the form \Sigma = C + \sigma I, where \sigma is a positive scalar and C is a matrix of rank d. More precisely, the aim is to minimize the functional

G_\lambda(\hat{\beta}) = \mathbb{E}\left[\| Y-X\hat{\beta} \|_{\Sigma^{-1}}\right] + \lambda \|\hat{\beta}\|^{\text{ridge}}.
Usage
Robust_Mahalanobis_regression(X, Y, alphaRM=0.66, alphareg=0.66, w=2, lambda=0,
creg='default', K=2:30, par=TRUE, epsilon=10^(-8),
method_regression='Offline', niter_regression=50,
cRM='default', mc_sample_size='default',
method_MCM='Weiszfeld', methodMC='Robbins',
niterMC=50, ridge=1, eps_vp=10^(-4), nlambda=50,
scale='none', tol=10^(-3))
Arguments
X
A (n,p)-matrix of covariates.
Y
A (n,q)-matrix of responses.
method_regression
The method used for estimating the parameter. Should be 'Offline' or 'Online'. Default is 'Offline'.
niter_regression
The maximum number of iterations of the fixed-point algorithm, i.e. if method_regression='Offline'. Default is 50.
epsilon
Stopping condition for the fixed-point algorithm, used if method_regression='Offline'. Default is 10^(-8).
scale
If a scaling of the data is used. Default is 'none'.
ridge
The power of the penalty: i.e. should be 1 or 2. Default is 1.
lambda
A vector giving the different penalizations studied. If lambda=0 (default), a sequence of nlambda penalizations is chosen automatically.
par
Is equal to TRUE if the computations are parallelized, and FALSE otherwise. Default is TRUE.
nlambda
The number of penalizations tested if lambda is not specified. Default is 50.
alphaRM
A scalar between 1/2 and 1 used in the step sequence if the Robbins-Monro algorithm is used, i.e. if methodMC='Robbins'. Default is 0.66.
alphareg
A scalar between 1/2 and 1 used in the step sequence of the stochastic gradient algorithm if method_regression='Online'. Default is 0.66.
w
The power of the weights for the weighted averaged algorithms if method_regression='Online'. Default is 2.
creg
The constant in the step sequence if the averaged stochastic gradient algorithm is used, i.e. if method_regression='Online'. Default is 'default'.
K
A vector containing the possible values of the rank d of the matrix C. Default is 2:30.
mc_sample_size
The number of data generated for the Monte-Carlo method for estimating robustly the eigenvalues of the variance. Default is 'default'.
method_MCM
The method chosen to estimate the Median Covariation Matrix. Default is 'Weiszfeld'; see Robust_Variance for the available methods.
methodMC
The method chosen to estimate the variance robustly. Default is 'Robbins' (a Robbins-Monro algorithm); see Robust_Variance for the available methods.
niterMC
The number of iterations for estimating the variance robustly. Default is 50.
eps_vp
The minimum value the estimates of the eigenvalues of the variance can take. Default is 10^(-4).
cRM
The constant in the step sequence if the Robbins-Monro algorithm is used to estimate the variance robustly, i.e. if methodMC='Robbins'. Default is 'default'.
tol
A scalar that avoids numerical problems if method_regression='Offline'. Default is 10^(-3).
Value
A list with:
beta
A (p,q)-matrix giving the robust estimate of the parameter \beta.
Residual_Variance
A (q,q)-matrix giving the robust estimate of the variance of the noise.
criterion
A vector giving the loss for the different penalizations lambda considered.
all_beta
A list containing the different estimates of the parameters (with respect to the different choices of lambda).
lambda_opt
A scalar giving the selected penalization lambda.
variance_results
A list giving the results on the variance of the noise obtained with the help of the function Robust_Variance.
Details of the list variance_results:
Sigma
The robust estimation of the variance.
invSigma
The robust estimation of the inverse of the variance.
MCM
The Median Covariation Matrix.
eigenvalues
A vector containing the estimates of the eigenvalues of \Sigma.
MCM_eigenvalues
A vector containing the estimates of the eigenvalues of the Median Covariation Matrix.
cap
The result given by capushe for selecting the rank d.
reduction_results
A list containing the results for all possible values of d.
References
Cardot, H., Cenac, P. and Zitt, P-A. (2013). Efficient and fast estimation of the geometric median in Hilbert spaces with an averaged stochastic gradient algorithm. Bernoulli, 19, 18-43.
Cardot, H. and Godichon-Baggioni, A. (2017). Fast Estimation of the Median Covariation Matrix with Application to Online Robust Principal Components Analysis. Test, 26(3), 461-480.
Vardi, Y. and Zhang, C.-H. (2000). The multivariate L1-median and associated data depth. Proc. Natl. Acad. Sci. USA, 97(4):1423-1426.
See Also
See also Robust_Variance, Robust_regression and RobRegression-package.
Examples
p <- 5
q <- 10
n <- 2000
## Noise with one dominant eigenvalue, i.e. a low-rank-plus-sigma*I variance
mu <- rep(0, q)
Sigma <- diag(c(q, rep(0.1, q - 1)))
epsilon <- mvtnorm::rmvnorm(n = n, mean = mu, sigma = Sigma)
## Covariates and true parameter
X <- mvtnorm::rmvnorm(n = n, mean = rep(0, p))
beta <- matrix(rnorm(p * q), ncol = q)
Y <- X %*% beta + epsilon
## Robust estimation and squared estimation error
Res_reg <- Robust_Mahalanobis_regression(X, Y, par = FALSE)
sum((Res_reg$beta - beta)^2)
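As a complementary sketch (not part of the package's documented examples), one can contaminate a small fraction of the responses with outliers and compare the robust estimate to ordinary least squares; the variable names below are illustrative and the snippet assumes the RobRegression and mvtnorm packages are available.

```r
## Hypothetical robustness check: 1% of the rows of Y are shifted by a
## large constant, then both estimators are fitted on the contaminated data.
set.seed(1)
p <- 5; q <- 10; n <- 2000
Sigma <- diag(c(q, rep(0.1, q - 1)))
epsilon <- mvtnorm::rmvnorm(n = n, mean = rep(0, q), sigma = Sigma)
X <- mvtnorm::rmvnorm(n = n, mean = rep(0, p))
beta <- matrix(rnorm(p * q), ncol = q)
Y <- X %*% beta + epsilon
Y[1:20, ] <- Y[1:20, ] + 50          # heavy outliers in 1% of the rows
## Robust fit on contaminated data
Res_rob <- Robust_Mahalanobis_regression(X, Y, par = FALSE)
## Ordinary least squares on the same data, for comparison
beta_ols <- solve(crossprod(X), crossprod(X, Y))
c(robust = sum((Res_rob$beta - beta)^2),
  ols    = sum((beta_ols - beta)^2))
```

With such contamination, the squared error of the robust estimate is typically much smaller than that of ordinary least squares.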