evaluate.candidates {SpiceFP} | R Documentation |
evaluate.candidates
Description
This function performs for each candidate matrix, a Generalized Fused Lasso (sparse fused lasso 2d or 3d) and computes various statistics and information criteria related to the constructed model.
Usage
evaluate.candidates(
candmatrices,
y,
penratios,
nknots,
appropriate.df = NULL,
ncores = parallel::detectCores() - 1,
penfun = NULL,
file_name = "parametertable",
write.external.file = TRUE
)
Arguments
candmatrices |
List. Output of the "candidates" function. The spicefp dimension is the first element. The second contains many lists of one candidate matrix and related vector with index and numbers of class intervals used per predictor. The other elements of the lists are the inputs of "candidates" function. If the user does not need the "candidates" function for the creation of candmatrices, it is possible to build a list provided that it respects the same structure as well as the names of the outputs of the "candidates" function. In this case only the first two elements of the list are essential: spicefp.dimension and candidates. The remaining elements can be NULL. |
y |
numerical vector. Contains the dependent variable. This vector will be used as response variable in the construction of models involving each candidate matrix. |
penratios |
numeric vector with values greater than or equal to 0. It represents the ratio between the regularization parameters of parsimony and fusion. When penratios=0, it corresponds to the pure fusion. The higher its value, the more parsimonious the model is. |
nknots |
integer. For one value in penratios vector, it represents the
number of models that will be constructed for each candidate matrix. It is
the argument "nlam" of |
appropriate.df |
(appropriate degree of freedom) NULL by default. Numerical vector with values greater than or equal to 1. The degree of freedom of generalized fused problem is equal the number of connected components. A connected component gives information on a group of non-zero coefficients sharing the same value and connected by a contiguity matrix. More simply, it can be interpreted as a group of coefficients that have a unique influence. When the user has a prior idea of the number of zones of influence that the desired solution could contain, it is advisable to provide appropriate.df, a vector of appropriate degrees of freedom. In this case, nknots must be NULL. |
ncores |
numbers of cores that will be used for parallel computation. By default, it is equal to detectCores()-1. |
penfun |
function with 2 arguments (dim1, dim2) when dealing with 2 dimensional spiceFP or 3 arguments (dim1, dim2, dim3) when dealing with 3 dimensional spiceFP. The argument order in the penalty function is associated with the order of numbers of class intervals used per predictor in the second element of candmatrices argument. NULL by default. When penfun=NULL, getD2dSparse of genlasso or getD3dSparse is used according to the dimension of spiceFP. |
file_name |
character. It is the name of the file in which the evaluation summary of all the candidate matrices is stored. This file is saved in your working directory. |
write.external.file |
logical. Indicates whether the result table should be written as a file (txt) in your working directory. It is recommended to use write.external.file=TRUE when evaluating a large number of candidate matrices (more than 100) in order to keep memory available. |
Details
This function mainly returns statistics on the models built based on the candidate matrices. For each candidate matrix, length(penratios) x nknots or length(penratios) x length(appropriate.df) models are constructed in order to estimate the regularization parameters and to perform a variable selection. The computed statistics provide information on the quality of the models. For obvious reasons of memory management, the coefficients related to each of these models are not stored. The statistics are stored in a file named via the argument file_name and can be consulted to get an idea of the state of progress of the program. The genlasso package is used for the implementation of the Generalized Fused Lasso.
Value
The output is a list with :
- evaluation.result
Same as file_name. The file contains a matrix with in columns : the candidate index (Candidate_id), the value of penratios used for this model (Pen_ratio), the parameter that penalizes the difference in related coefficients (PenPar_fusion), the degree of freedom of the model (Df_), the residual sum of squares (RSS_), the Akaike information criterion (AIC_), the Bayesian information criterion (BIC_), the Mallows' Cp (Cp_), the Generalized Cross Validation (GCV_), the slope of the regression lm(
y
~X \beta
) (Slope_), the ratiovar(y-X \beta)/var(y)
(Var_ratio).- response.variable, penalty.ratios, nknots, appropriate.df, penalty.function
Exactly the inputs y, penratios, nknots, appropriate.df, penfun
Examples
# Constructing 2 candidates for spiceFP data (temperature and Irradiance)
linbreaks<-function(x,n){
sort(round(seq(trunc(min(x)),
ceiling(max(x)+0.001),
length.out =unlist(n)+1),
1)
)
}
# In this example, we will evaluate 2 candidates (each having 10
# temperature classes and respectively 10 and 20 irradiance classes).
# Only one value is used for alpha (logbreaks argument)
tpr.nclass=10
irdc.nclass=c(10,20)
irdc.alpha=0.005
p2<-expand.grid(tpr.nclass, irdc.alpha, irdc.nclass)
parlist.tpr<-split(p2[,1], seq(nrow(p2)))
parlist.irdc<-split(p2[,2:3], seq(nrow(p2)))
parlist.irdc<-lapply(
parlist.irdc,function(x){
list(x[[1]],x[[2]])}
)
m.irdc <- as.matrix(Irradiance[,-c(1)])
m.tpr <- as.matrix(Temperature[,-c(1)])
test2<-candidates(fp1=m.irdc,
fp2=m.tpr,
fun1=logbreaks,
fun2=linbreaks,
parlists=list(parlist.irdc,
parlist.tpr),
xcentering = TRUE,
xscaling = FALSE,
ncores=2)
# Evaluating candidates
# For the constructed models, only one regularization parameter ratio
# penratios=c(1) is used. In a real case, we will have to evaluate
# more candidates and regularization parameters ratio.
start_time_ev <- Sys.time()
evcand<-evaluate.candidates(candmatrices = test2,
y=FerariIndex_Difference$fi_dif,
penratios=c(1),
appropriate.df=NULL,
nknots = 100,
ncores=2,
write.external.file = FALSE)
duration_ev <- Sys.time() - start_time_ev
tab_res<-evcand$evaluation.result
dim(tab_res)
tab_res[which.min(tab_res$AIC_),]
closeAllConnections()