SpiceFP-package {SpiceFP} | R Documentation
A Sparse and Structured Procedure to Identify Combined Effects of Functional Predictors
Description
A set of functions implementing the iterative 'SpiceFP' approach. Functional predictors are transformed into several candidate explanatory matrices (based on contingency tables), each of which is associated with an edge matrix encoding contiguity constraints. A Generalized Fused Lasso regression is performed on each candidate in order to identify the best candidate matrix, the best class intervals and the related coefficients at each iteration. The approach stops when the maximal number of iterations is reached or when all retained coefficients are zero. Supplementary functions return the coefficients of any candidate matrix, or the mean of the coefficients over several candidates. The methods in this package are described in Girault Gnanguenon Guesse, Patrice Loisel, Bénedicte Fontez, Thierry Simonneau, Nadine Hilgert (2021) "An exploratory penalized regression to identify combined effects of functional variables -Application to agri-environmental issues" https://hal.archives-ouvertes.fr/hal-03298977.
Details
The main function of the package is the spicefp function. It directly performs the three main steps of the SpiceFP approach, using intermediate functions of the package.
1) At the first step, contingency tables are constructed by defining joint modalities using class intervals or bins. Several candidate partitions are then defined. For each statistical individual i and each candidate partition (denoted u here), the 2 (resp. 3) functional predictors are transformed into bivariate (resp. trivariate) frequency histograms (or contingency tables), stored as row vectors. Stacking these row vectors over all individuals yields a candidate explanatory matrix indexed by u (denoted here X^u). The function candidates is designed to build these candidate matrices.
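To make step 1 concrete, here is a minimal sketch in Python of how two functional predictors observed at the same time points could be turned into one row of a candidate matrix. It does not use the SpiceFP API: the helper names bin_index and histogram_row, the example values and the break points are all hypothetical, and clamping out-of-range values to the nearest interval is an assumption made for this illustration.

```python
from bisect import bisect_right

def bin_index(value, breaks):
    """Index k of the class interval [breaks[k], breaks[k+1]) containing value.

    Assumption for this sketch: the last interval is closed on the right,
    and values outside the breaks are clamped to the nearest interval.
    """
    k = bisect_right(breaks, value) - 1
    return min(max(k, 0), len(breaks) - 2)

def histogram_row(f1, f2, breaks1, breaks2):
    """Flatten the bivariate frequency histogram of (f1, f2) into a row vector.

    Cells are stored in row-major order: cell (k1, k2) -> k1 * n2 + k2.
    """
    n1, n2 = len(breaks1) - 1, len(breaks2) - 1
    counts = [0] * (n1 * n2)
    for a, b in zip(f1, f2):
        counts[bin_index(a, breaks1) * n2 + bin_index(b, breaks2)] += 1
    return counts

# Hypothetical individual: two predictors sampled at the same four time points.
f1 = [12.0, 18.5, 25.0, 31.0]
f2 = [0.0, 150.0, 420.0, 800.0]
breaks1 = [10, 20, 30, 40]   # one candidate partition: 3 classes for predictor 1
breaks2 = [0, 400, 800]      # and 2 classes for predictor 2

row = histogram_row(f1, f2, breaks1, breaks2)
# Stacking one such row per individual gives a candidate matrix for this partition.
X_u = [row]
```

Each candidate partition u corresponds to a different choice of break points, so repeating the computation with other breaks produces the other candidate matrices.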
2) At the second step, for each candidate explanatory matrix, an edge matrix is defined to
represent the contiguity constraints between modalities of the contingency table.
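As an illustration of step 2, the sketch below builds an edge matrix for a two-dimensional contingency table in Python. It is not the package's own implementation: the function names, the row-major cell ordering and the sign convention (-1 for the first cell, +1 for its neighbour) are assumptions. Each row encodes one pair of contiguous cells, so that multiplying it by a coefficient vector gives the difference between the two neighbouring coefficients.

```python
def difference_row(cell_a, cell_b, n_cells):
    """Row with -1 at cell_a and +1 at cell_b (sign convention is an assumption)."""
    row = [0] * n_cells
    row[cell_a] = -1
    row[cell_b] = 1
    return row

def edge_matrix(n1, n2):
    """One row per pair of contiguous cells in an n1 x n2 table (row-major cells)."""
    n_cells = n1 * n2
    rows = []
    for i in range(n1):
        for j in range(n2):
            cell = i * n2 + j
            if j + 1 < n2:   # neighbour in the next class of the second predictor
                rows.append(difference_row(cell, cell + 1, n_cells))
            if i + 1 < n1:   # neighbour in the next class of the first predictor
                rows.append(difference_row(cell, cell + n2, n_cells))
    return rows

# A table with 3 classes for the first predictor and 2 for the second.
D = edge_matrix(3, 2)
```

For an n1 x n2 table this yields n1 * (n2 - 1) + (n1 - 1) * n2 rows, one per pair of adjacent cells.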
3) Finally, at the last step, the best class intervals and related regression coefficients are determined by:
i) performing a Generalized Fused Lasso using each candidate explanatory matrix. The SpiceFP model is the following
y_i = X_i^u \beta^u + \varepsilon_i,
where \beta^u is the vector of coefficients to be estimated on the 2D (resp. 3D) intervals. The estimator of \beta^u is obtained as follows:
\hat{\beta}^{u,\gamma}(\lambda) = \arg\min_{\beta} \frac{1}{2} \|y - X^u \beta\|_2^2 + \lambda \|D^{u,\gamma} \beta\|_1,
where \lambda is a penalty parameter that controls the smoothness of the coefficients, and \gamma is the ratio between the regularization parameters of parsimony and fusion.
ii) choosing the best candidate matrix and selecting its variables using an information criterion, then checking the stopping conditions. Indeed, SpiceFP may be used iteratively, which allows it to identify up to K best candidate matrices and their related coefficients.
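The penalized criterion minimized in step 3 i) can be evaluated with a few lines of Python. This is only a sketch under a stated assumption: the penalty matrix D^{u,\gamma} is taken to stack the edge matrix on top of \gamma times the identity, so that \|D^{u,\gamma} \beta\|_1 splits into a fusion term plus \gamma times a parsimony term. That reading matches the description of \gamma as a ratio between the two regularization parameters, but it is not confirmed by the text above. The function name gfl_objective and the toy values are hypothetical.

```python
def gfl_objective(y, X, beta, D, lam, gamma):
    """Evaluate 0.5 * ||y - X beta||_2^2 + lam * (||D beta||_1 + gamma * ||beta||_1).

    Assumption: D is the edge matrix alone, and the gamma-scaled identity block
    of the full penalty matrix is handled through the separate parsimony term.
    """
    fitted = [sum(x * b for x, b in zip(row, beta)) for row in X]
    residual_ss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    fusion = sum(abs(sum(d * b for d, b in zip(row, beta))) for row in D)
    parsimony = sum(abs(b) for b in beta)
    return 0.5 * residual_ss + lam * (fusion + gamma * parsimony)

# Toy problem: two individuals, two contiguous cells.
y = [1.0, 2.0]
X = [[1.0, 0.0], [0.0, 1.0]]
D = [[-1.0, 1.0]]            # a single edge between the two cells
beta = [1.0, 1.5]

value = gfl_objective(y, X, beta, D, lam=2.0, gamma=0.5)
```

With \gamma = 0 only differences between contiguous coefficients are penalized; larger values of \gamma put more weight on shrinking individual coefficients to zero.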
Author(s)
Maintainer: Girault Gnanguenon Guesse girault.gnanguenon@gmail.com
Authors:
Patrice Loisel patrice.loisel@inrae.fr
Benedicte Fontez benedicte.fontez@supagro.fr
Nadine Hilgert nadine.hilgert@inrae.fr
Other contributors:
Thierry Simonneau thierry.simonneau@inrae.fr [contractor]
Isabelle Sanchez isabelle.sanchez@inrae.fr [contractor]