R: Undertakes a Lorenz regression

Lorenz.Reg {LorenzRegression}

R Documentation

Undertakes a Lorenz regression

Description

Lorenz.Reg performs the Lorenz regression of a response with respect to several covariates.

Usage

Lorenz.Reg(
  formula,
  data,
  standardize = TRUE,
  weights = NULL,
  parallel = FALSE,
  penalty = c("none", "SCAD", "LASSO"),
  h.grid = c(0.1, 0.2, 1, 2, 5) * nrow(data)^(-1/5.5),
  eps = 0.005,
  sel.choice = c("BIC", "CV", "Boot")[1],
  nfolds = 10,
  seed.CV = NULL,
  foldID = NULL,
  Boot.inference = FALSE,
  B = 500,
  bootID = NULL,
  seed.boot = NULL,
  LR = NULL,
  LR.boot = NULL,
  ...
)

Arguments

`formula`	A formula object of the form response ~ other_variables.
`data`	A data frame containing the variables displayed in the formula.
`standardize`	Should the variables be standardized before the estimation process? Default value is TRUE.
`weights`	Vector of sample weights. By default, each observation is given the same weight.
`parallel`	Whether parallel computing should be used to distribute the computations over several CPUs. Either a logical value determining whether parallel computing is used (TRUE) or not (FALSE, the default value), or a numerical value giving the number of cores to use.
`penalty`	Should the regression include a penalty on the size of the coefficients? If "none" is chosen, a non-penalized Lorenz regression is computed using function `Lorenz.GA`. If "SCAD" is chosen, a penalized Lorenz regression with SCAD penalty is computed using function `Lorenz.SCADFABS`. If "LASSO" is chosen, a penalized Lorenz regression with LASSO penalty is computed using function `Lorenz.FABS`.
`h.grid`	Only used if penalty="SCAD" or penalty="LASSO". Grid of values for the bandwidth of the kernel, determining the smoothness of the approximation of the indicator function. Default value is (0.1,0.2,1,2,5)*n^(-1/5.5), where n is the sample size.
`eps`	Only used if penalty="SCAD" or penalty="LASSO". Step size in the FABS or SCADFABS algorithm. Default value is 0.005.
`sel.choice`	Only used if penalty="SCAD" or penalty="LASSO". Determines which method is used to select the optimal regularization parameter. Possible values are any subvector of c("BIC","CV","Boot"). Default is "BIC". Note that "Boot" is necessarily added if Boot.inference is set to TRUE.
`nfolds`	Only used if sel.choice contains "CV". Number of folds in the cross-validation.
`seed.CV`	Only used if sel.choice contains "CV". Should a specific seed be used in the definition of the folds? Default value is NULL, in which case no seed is imposed.
`foldID`	Vector taking values from 1 to nfolds, specifying the fold index of each observation. Default value is NULL, in which case the folds are defined internally.
`Boot.inference`	Should bootstrap inference be produced? Default is FALSE. It is automatically set to TRUE if sel.choice contains "Boot".
`B`	Only used if Boot.inference is TRUE. Number of bootstrap resamples. Default is 500.
`bootID`	Only used if Boot.inference is TRUE. Matrix where each row provides the IDs of the observations selected in one bootstrap resample. Default is NULL, in which case these are defined internally.
`seed.boot`	Only used if Boot.inference is TRUE. Should a specific seed be used in the definition of the bootstrap resamples? Default value is NULL, in which case no seed is imposed.
`LR`	Estimation on the original sample. Output of a call to `Lorenz.GA` or `PLR.wrap`.
`LR.boot`	Estimation on the bootstrap resamples. In the non-penalized case, it is the output of a call to `Lorenz.boot`. In the penalized case, it is a list of size length(h.grid), where each element is the output of a call to `Lorenz.boot` and uses a different value of the bandwidth.
`...`	Additional parameters corresponding to arguments passed to `Lorenz.GA`, `Lorenz.SCADFABS` or `Lorenz.FABS`, depending on the value chosen for penalty.
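
As a rough illustration of the shapes expected by h.grid, foldID and bootID, the base R sketch below builds all three for a hypothetical sample of size 50. The values of n, nfolds, B and the seed are illustrative assumptions, and the sketch assumes that each bootstrap resample has the same size as the original sample. It does not call the package.

```r
# Hypothetical sample size and tuning values (assumptions for illustration)
n <- 50
nfolds <- 5
B <- 20

# Bandwidth grid following the documented default: (0.1, 0.2, 1, 2, 5) * n^(-1/5.5)
h.grid <- c(0.1, 0.2, 1, 2, 5) * n^(-1/5.5)

set.seed(1)
# foldID: one fold index in 1..nfolds per observation, with folds of equal size here
foldID <- sample(rep(seq_len(nfolds), length.out = n))

# bootID: one row per bootstrap resample, holding the indices of the drawn observations
# (assumes each resample has n observations drawn with replacement)
bootID <- t(replicate(B, sample.int(n, n, replace = TRUE)))

dim(bootID)    # B rows, n columns
table(foldID)  # number of observations per fold
```

Objects built this way could then be supplied through the foldID and bootID arguments so that folds and resamples are reproducible across calls.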

Value

For the non-penalized Lorenz regression, a list with the following elements:

theta: the estimated vector of parameters.
pval.theta: Only returned if Boot.inference is TRUE. The p-values associated with each element of the parameter vector.
summary: a vector including the estimated explained Gini coefficient and the Lorenz-R^2.
Gi.expl: the estimated explained Gini coefficient.
LR2: the Lorenz-R^2 of the regression.
MRS: the matrix of estimated marginal rates of substitution. More precisely, the MRS of X1 (numerator) with respect to X2 (denominator) is found in the row corresponding to X1 and the column corresponding to X2.
Fit: A data frame containing the response (first column) and the estimated index (second column).
Gi.star: Only returned if Boot.inference is TRUE. A vector gathering the bootstrap estimators of the explained Gini coefficient.
LR2.star: Only returned if Boot.inference is TRUE. A vector gathering the bootstrap estimators of the Lorenz-R^2.
theta.star: Only returned if Boot.inference is TRUE. A matrix gathering the bootstrap estimators of theta (rows refer to bootstrap iterations and columns refer to the different coefficients).
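
To make the quantities in this list more concrete, here is a minimal base R sketch on simulated data. It rests on several assumptions that are not stated on this page: the explained Gini coefficient is taken to be a covariance-based Gini of the response with respect to the ranks of the index, the Lorenz-R^2 is taken to be its ratio to the Gini coefficient of the response, and the MRS is taken to be the ratio of two coefficients. The coefficient vector theta is fixed by hand rather than estimated, and gini_wrt is a hypothetical helper, not a package function.

```r
set.seed(42)
n <- 200
X <- cbind(X1 = rnorm(n), X2 = rnorm(n))
theta <- c(X1 = 0.8, X2 = 0.6)   # hypothetical coefficients, not an estimate
index <- drop(X %*% theta)       # single index combining the covariates
y <- exp(index + rnorm(n, sd = 0.5))

# Covariance-based Gini-type coefficient of y with respect to a ranking variable
# (hypothetical helper; empirical ranks stand in for the distribution function)
gini_wrt <- function(y, ranking) {
  2 * cov(y, rank(ranking) / length(y)) / mean(y)
}

Gi.y    <- gini_wrt(y, y)        # Gini coefficient of the response
Gi.expl <- gini_wrt(y, index)    # part of the inequality explained by the index
LR2     <- Gi.expl / Gi.y        # assumed definition of the Lorenz-R^2

# MRS matrix: row = numerator variable, column = denominator variable
MRS <- outer(theta, theta, "/")
MRS["X1", "X2"]                  # ratio theta["X1"] / theta["X2"]
```

Reading the matrix by row and column, as in the last line, matches the convention described for the MRS element.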

For the penalized Lorenz regression, a list with the following elements:

path: a list where the different elements correspond to the values of h.grid. Each element is a matrix where the first line displays the path of regularization parameters. The second and third lines display the evolution of the Lorenz-R^2 and explained Gini coefficient along that path. The next lines display the evolution of the scores of the methods chosen in sel.choice. The remaining lines display the evolution of the estimated parameter vector.
theta: a matrix where the different lines correspond to the methods chosen in sel.choice. Each line provides the estimated vector of parameters at the optimal value of the regularization parameter.
summary: a matrix where the different lines correspond to the methods chosen in sel.choice. Each line provides the estimated explained Gini coefficient, the Lorenz-R^2, the optimal lambda, the optimal bandwidth, the number of selected variables and the scores at the optimal value of the regularization parameter.
Gi.expl: a vector providing the estimated explained Gini coefficient at the optimal value of the regularization parameter for each method in sel.choice.
LR2: a vector providing the Lorenz-R^2 at the optimal value of the regularization parameter for each method in sel.choice.
MRS: a list where the different elements correspond to a method in sel.choice. Each element is a matrix of estimated marginal rates of substitution for non-zero coefficients at the optimal value of the regularization parameter.
Fit: A data frame containing the response (first column). The remaining columns give the estimated index at the optimal value of the regularization parameter, for each method chosen in sel.choice.
which.h: a vector providing the index of the optimal bandwidth for each method in sel.choice.
which.lambda: a vector providing the index of the optimal lambda for each method in sel.choice.
Gi.star: Only returned if Boot.inference is TRUE. A list (each element a different value of the bandwidth h) of lists (each element a different value of the penalty parameter) of vectors (each element a bootstrap iteration) gathering the bootstrap estimators of the explained Gini coefficient.
LR2.star: Only returned if Boot.inference is TRUE. Same nested structure as Gi.star, gathering the bootstrap estimators of the Lorenz-R^2.
theta.star: Only returned if Boot.inference is TRUE. A list (each element a different value of the bandwidth h) of lists (each element a different value of the penalty parameter) of matrices (rows are bootstrap iterations and columns refer to the coefficients) gathering the bootstrap estimators of theta.
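
The nested bootstrap outputs can be awkward to navigate, so the following base R sketch builds a mock object with the nesting described for Gi.star (bandwidth, then penalty parameter, then bootstrap iterations) and extracts one vector of replicates. The sizes, the random values and the chosen indices are all hypothetical, and indexing the two levels with values in the spirit of which.h and which.lambda is an assumption about how the pieces fit together.

```r
set.seed(7)
n.h <- 2; n.lambda <- 3; B <- 4   # hypothetical sizes

# Mock object mimicking the documented nesting:
# list over bandwidths -> list over penalty parameters -> vector over bootstrap iterations
Gi.star <- lapply(seq_len(n.h), function(h) {
  lapply(seq_len(n.lambda), function(l) runif(B))
})

# Hypothetical optimal indices for one selection method
which.h <- 2
which.lambda <- 3

# Bootstrap replicates of the explained Gini coefficient at the selected pair
Gi.boot <- Gi.star[[which.h]][[which.lambda]]

# Simple percentile summary of the replicates
quantile(Gi.boot, c(0.025, 0.975))
```

The same double indexing would apply to a theta.star-like object, except that the innermost element is then a matrix with one row per bootstrap iteration.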

In both cases, the list also contains technical information, namely the formula, data, weights and call.

References

Heuchenne, C. and A. Jacquemain (2022). Inference for monotone single-index conditional means: A Lorenz regression approach. Computational Statistics & Data Analysis 167(C).

Jacquemain, A., C. Heuchenne, and E. Pircalabelu (2022). A penalised bootstrap estimation procedure for the explained Gini coefficient.

Examples

data(Data.Incomes)
set.seed(123)
Data <- Data.Incomes[sample(1:nrow(Data.Incomes),50),]
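# 1. Non-penalized regression
# (popSize is not an argument of Lorenz.Reg; it is passed through ... to Lorenz.GA)
NPLR <- Lorenz.Reg(Income ~ ., data = Data, penalty = "none",
                   popSize = 30)
# 2. Penalized regression (SCAD penalty, a single bandwidth,
#    regularization parameter selected by both BIC and 5-fold CV)
PLR <- Lorenz.Reg(Income ~ ., data = Data, penalty = "SCAD",
                  h.grid = nrow(Data.Incomes)^(-1/5.5),
                  sel.choice = c("BIC","CV"), eps = 0.01, nfolds = 5)
# Comparison of the estimated coefficients and of the summary measures
NPLR$theta; PLR$theta
NPLR$summary; PLR$summary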

[Package LorenzRegression version 1.0.0 Index]