mparheuristic {rminer}	R Documentation
Function that returns a list of search (hyper)parameters for a particular model (classification or regression) or for multiple models (automl or ensembles).
Description
Easy to use function that returns a list of search (hyper)parameters for a particular model (classification or regression) or for multiple models (automl or ensembles).
The result is meant to be used in the search argument of the fit or mining functions, e.g.:
search=list(search=mparheuristic(...),...)
Usage
mparheuristic(model, n = NA, lower = NA, upper = NA, by = NA, exponential = NA,
kernel = "rbfdot", task = "prob", inputs = NA)
Arguments
model: model type name (e.g., "kknn", "mlpe", "ksvm", "randomForest") or a character vector of model names; the special values "automl", "automl2" and "automl3" set multiple model searches. See fit for valid model names.
n: number of searches or a heuristic name (either a number or a character value such as "heuristic", "heuristic5" or "UD").
lower: lower bound for the (hyper)parameter (if NA, a default value is used).
upper: upper bound for the (hyper)parameter (if NA, a default value is used).
by: increment of the search sequence (if NA, it is computed from n).
exponential: base of an exponential scale for the search sequence (e.g., exponential=2 produces 2^seq(...) values; if NA, a linear scale is used).
kernel: optional kernel type, only used when model is "ksvm", "lssvm" or "rvm" (e.g., "rbfdot", "vanilladot", "polydot").
task: optional task argument ("prob" or "reg"), only used for uniform design ("UD") or automl searches.
inputs: optional number of inputs, only used by some searches (e.g., to bound the randomForest mtry values or to set the automl searches).
Details
This function facilitates the definition of the search argument used by the fit or mining functions.
Using simple heuristics, reasonable (hyper)parameter search values are suggested for several rminer models. For models not mapped in this function, NULL is returned, which means that no hyperparameter search is executed (often, this implies using rminer or R function default values).
The simple heuristic usage assumes lower and upper bounds for a (hyper)parameter. If n=1, then rminer or R defaults are assumed. Otherwise, a search sequence is created using seq(lower,upper,by), where by is set by the user or computed from n.
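The sequence construction described above can be sketched directly in base R (the bounds and n used here are illustrative values, not rminer defaults):

```r
# illustrative only: how a search grid follows from n, lower and upper
lower <- 1; upper <- 9; n <- 5
by <- (upper - lower) / (n - 1)  # step derived from n when by is not given
s <- seq(lower, upper, by)
print(s)  # 1 3 5 7 9
```

The same seq() call underlies the grids printed by the examples below.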
For some model="ksvm" setups, 2^seq(...) is used for the sigma and C parameters, while (1/10)^seq(...) is used for the scale parameter.
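These exponential scales can be sketched as follows (the seq bounds are hypothetical, chosen only for illustration; the actual ranges are set internally by mparheuristic):

```r
# illustrative exponential grids (bounds are hypothetical):
sigma <- 2^seq(-3, 3, 3)       # base-2 scale, as used for sigma and C
scale <- (1/10)^seq(1, 3, 1)   # base-1/10 scale, as used for scale
print(sigma)  # 0.125 1 8
print(scale)  # 0.1 0.01 0.001
```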
Please inspect the resulting object to check the final search values.
This function also makes it easy to set multiple model searches, via the "automl", "automl2", "automl3" or character vector options (see the examples below).
Value
A list with one or more (hyper)parameter values to be searched.
Note
See also http://hdl.handle.net/1822/36210 and http://www3.dsi.uminho.pt/pcortez/rminer.html
Author(s)
Paulo Cortez http://www3.dsi.uminho.pt/pcortez/
References
To check for more details about rminer and for citation purposes:
P. Cortez.
Data Mining with Neural Networks and Support Vector Machines Using the R/rminer Tool.
In P. Perner (Ed.), Advances in Data Mining - Applications and Theoretical Aspects 10th Industrial Conference on Data Mining (ICDM 2010), Lecture Notes in Artificial Intelligence 6171, pp. 572-583, Berlin, Germany, July, 2010. Springer. ISBN: 978-3-642-14399-1.
@Springer: https://link.springer.com/chapter/10.1007/978-3-642-14400-4_44
http://www3.dsi.uminho.pt/pcortez/2010-rminer.pdf
The automl options are inspired by this work:
L. Ferreira, A. Pilastri, C. Martins, P. Santos, P. Cortez.
An Automated and Distributed Machine Learning Framework for Telecommunications Risk Management. In J. van den Herik et al. (Eds.), Proceedings of 12th International Conference on Agents and Artificial Intelligence – ICAART 2020, Volume 2, pp. 99-107, Valletta, Malta, February, 2020, SCITEPRESS, ISBN 978-989-758-395-7.
@INSTICC: https://www.insticc.org/Primoris/Resources/PaperPdf.ashx?idPaper=89528
This tutorial shows additional code examples:
P. Cortez.
A tutorial on using the rminer R package for data mining tasks.
Teaching Report, Department of Information Systems, ALGORITMI Research Centre, Engineering School, University of Minho, Guimaraes, Portugal, July 2015.
http://hdl.handle.net/1822/36210
Some lower/upper bounds and heuristics were retrieved from:
M. Fernandez-Delgado, E. Cernadas, S. Barro and D. Amorim. Do we need hundreds of classifiers to solve real world classification problems?. In The Journal of Machine Learning Research, 15(1), 3133-3181, 2014.
See Also
fit and mining.
Examples
## "kknn"
s=mparheuristic("kknn",n="heuristic")
print(s)
s=mparheuristic("kknn",n=1) # same thing
print(s)
s=mparheuristic("kknn",n="heuristic5")
print(s)
s=mparheuristic("kknn",n=5) # same thing
print(s)
s=mparheuristic("kknn",lower=5,upper=15,by=2)
print(s)
# exponential scale:
s=mparheuristic("kknn",lower=1,upper=5,by=1,exponential=2)
print(s)
## "mlpe"
s=mparheuristic("mlpe")
print(s) # "NA" means set size with min(inputs/2,10) in fit
s=mparheuristic("mlpe",n="heuristic10")
print(s)
s=mparheuristic("mlpe",n=10) # same thing
print(s)
s=mparheuristic("mlpe",n=10,lower=2,upper=20)
print(s)
## "randomForest", upper should be set to the number of inputs = max mtry
s=mparheuristic("randomForest",n=10,upper=6)
print(s)
## "ksvm"
s=mparheuristic("ksvm",n=10)
print(s)
s=mparheuristic("ksvm",n=10,kernel="vanilladot")
print(s)
s=mparheuristic("ksvm",n=10,kernel="polydot")
print(s)
## lssvm
s=mparheuristic("lssvm",n=10)
print(s)
## rvm
s=mparheuristic("rvm",n=5)
print(s)
s=mparheuristic("rvm",n=5,kernel="vanilladot")
print(s)
## "rpart" and "ctree" are special cases (see help(fit,package=rminer) examples):
s=mparheuristic("rpart",n=3) # 3 cp values
print(s)
s=mparheuristic("ctree",n=3) # 3 mincriterion values
print(s)
### examples with fit
## Not run:
### classification
data(iris)
# ksvm and rbfdot:
model="ksvm";kernel="rbfdot"
s=mparheuristic(model,n="heuristic5",kernel=kernel)
print(s) # 5 sigma values
search=list(search=s,method=c("holdout",2/3,123))
# task "prob" is assumed, optimization of "AUC":
M=fit(Species~.,data=iris,model=model,search=search,fdebug=TRUE)
print(M@mpar)
# different lower and upper range:
s=mparheuristic(model,n=5,kernel=kernel,lower=-5,upper=1)
print(s) # from 2^-5 to 2^1
search=list(search=s,method=c("holdout",2/3,123))
# task "prob" is assumed, optimization of "AUC":
M=fit(Species~.,data=iris,model=model,search=search,fdebug=TRUE)
print(M@mpar)
# different exponential scale:
s=mparheuristic(model,n=5,kernel=kernel,lower=-4,upper=0,exponential=10)
print(s) # from 10^-4 to 10^0
search=list(search=s,method=c("holdout",2/3,123))
# task "prob" is assumed, optimization of "AUC":
M=fit(Species~.,data=iris,model=model,search=search,fdebug=TRUE)
print(M@mpar)
# "lssvm" Gaussian model, pure classification and ACC optimization, full iris:
model="lssvm";kernel="rbfdot"
s=mparheuristic("lssvm",n=3,kernel=kernel)
print(s)
search=list(search=s,method=c("holdout",2/3,123))
M=fit(Species~.,data=iris,model=model,search=search,fdebug=TRUE)
print(M@mpar)
# test several heuristic5 searches, full iris:
n="heuristic5";inputs=ncol(iris)-1
model=c("ctree","rpart","kknn","ksvm","lssvm","mlpe","randomForest")
for(i in 1:length(model))
{
cat("--- i:",i,"model:",model[i],"\n")
if(model[i]=="randomForest") s=mparheuristic(model[i],n=n,upper=inputs)
else s=mparheuristic(model[i],n=n)
print(s)
search=list(search=s,method=c("holdout",2/3,123))
M=fit(Species~.,data=iris,model=model[i],search=search,fdebug=TRUE)
print(M@mpar)
}
# test several Delgado 2014 searches (some cases launch warnings):
model=c("mlp","mlpe","mlp","ksvm","ksvm","ksvm",
"ksvm","lssvm","rpart","rpart","ctree",
"ctree","randomForest","kknn","kknn","multinom")
n=c("mlp_t","avNNet_t","nnet_t","svm_C","svmRadial_t","svmLinear_t",
"svmPoly_t","lsvmRadial_t","rpart_t","rpart2_t","ctree_t",
"ctree2_t","rf_t","knn_R","knn_t","multinom_t")
inputs=ncol(iris)-1
for(i in 1:length(model))
{
cat("--- i:",i,"model:",model[i],"heuristic:",n[i],"\n")
if(model[i]=="randomForest") s=mparheuristic(model[i],n=n[i],upper=inputs)
else s=mparheuristic(model[i],n=n[i])
print(s)
search=list(search=s,method=c("holdout",2/3,123))
M=fit(Species~.,data=iris,model=model[i],search=search,fdebug=TRUE)
print(M@mpar)
}
## End(Not run) #dontrun
### regression
## Not run:
data(sa_ssin)
s=mparheuristic("ksvm",n=3,kernel="polydot")
print(s)
search=list(search=s,metric="MAE",method=c("holdout",2/3,123))
M=fit(y~.,data=sa_ssin,model="ksvm",search=search,fdebug=TRUE)
print(M@mpar)
# regression task, predict iris "Petal.Width":
data(iris)
ir2=iris[,1:4]
names(ir2)[ncol(ir2)]="y" # change output name
n=3;inputs=ncol(ir2)-1 # 3 hyperparameter searches
model=c("ctree","rpart","kknn","ksvm","mlpe","randomForest","rvm")
for(i in 1:length(model))
{
cat("--- i:",i,"model:",model[i],"\n")
if(model[i]=="randomForest") s=mparheuristic(model[i],n=n,upper=inputs)
else s=mparheuristic(model[i],n=n)
print(s)
search=list(search=s,method=c("holdout",2/3,123))
M=fit(y~.,data=ir2,model=model[i],search=search,fdebug=TRUE)
print(M@mpar)
}
## End(Not run) #dontrun
### multiple model examples:
## Not run:
data(iris)
inputs=ncol(iris)-1; task="prob"
# 5 machine learning (ML) algorithms, 1 heuristic hyperparameter per algorithm:
sm=mparheuristic(model="automl",task=task,inputs=inputs)
print(sm)
# 5 ML with 10/13 hyperparameter searches:
sm=mparheuristic(model="automl2",task=task,inputs=inputs)
# note: mtry only has 4 searches due to the inputs limit:
print(sm)
# regression example:
ir2=iris[,1:4]
inputs=ncol(ir2)-1; task="reg"
sm=mparheuristic(model="automl2",task=task,inputs=inputs)
# note: ksvm contains 3 UD hyperparameters (and not 2) since task="reg":
print(sm)
# 5 ML and stacking:
inputs=ncol(iris)-1; task="prob"
sm=mparheuristic(model="automl3",task=task,inputs=inputs)
# note: $ls only has 5 elements, one for each individual ML
print(sm)
# other manual design examples: --------------------------------------
# 5 ML and three ensembles:
# the fit or mining functions will search for the best option
# between any of the 5 ML algorithms and any of the three
# ensemble approaches:
sm2=mparheuristic(model="automl3",task=task,inputs=inputs)
# note: ensembles need to be at the end of the $models field:
sm2$models=c(sm2$models,"AE","WE") # add AE and WE
sm2$smethod=c(sm2$smethod,rep("grid",2)) # add grid to AE and WE
# note: $ls only has 5 elements, one for each individual ML
print(sm2)
# 3 ML example:
models=c("cv.glmnet","mlpe","ksvm") # just 3 models
# note: in rminer the default cv.glmnet does not have "hyperparameters"
# since the cv automatically sets lambda
n=c(NA,10,"UD") # 10 searches for mlpe and 13 for ksvm
sm3=mparheuristic(model=models,n=n)
# note: $ls only has 3 elements, one for each model
print(sm3)
# usage in sm2 and sm3 for fit (see mining help for usages in mining):
method=c("holdout",2/3,123)
d=iris
names(d)[ncol(d)]="y" # change output name
s2=list(search=sm2,smethod="auto",method=method,metric="AUC",convex=0)
M2=fit(y~.,data=d,model="auto",search=s2,fdebug=TRUE)
s3=list(search=sm3,smethod="auto",method=method,metric="AUC",convex=0)
M3=fit(y~.,data=d,model="auto",search=s3,fdebug=TRUE)
# -------------------------------------------------------------------
## End(Not run)