BioM2 {BioM2} | R Documentation |
Biologically Explainable Machine Learning Framework
Description
Biologically Explainable Machine Learning Framework
Usage
BioM2(
TrainData = NULL,
TestData = NULL,
pathlistDB = NULL,
FeatureAnno = NULL,
resampling = NULL,
nfolds = 5,
classifier = "liblinear",
predMode = "probability",
PathwaySizeUp = 200,
PathwaySizeDown = 20,
MinfeatureNum_pathways = 10,
Add_UnMapped = TRUE,
Unmapped_num = 300,
Add_FeartureSelection_Method = "wilcox.test",
Inner_CV = TRUE,
inner_folds = 10,
Stage1_FeartureSelection_Method = "cor",
cutoff = 0.3,
Stage2_FeartureSelection_Method = "RemoveHighcor",
cutoff2 = 0.95,
classifier2 = NULL,
target = "predict",
p.adjust.method = "fdr",
save_pathways_matrix = FALSE,
cores = 1,
verbose = TRUE
)
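The default `p.adjust.method = "fdr"` is a method name accepted by base R's `stats::p.adjust`, so the argument is presumably forwarded to that function (an assumption; this page does not say so). A minimal illustration with made-up p-values:

```r
# Made-up p-values, for illustration only (not BioM2 output).
p_raw <- c(0.01, 0.02, 0.03, 0.04)

# In stats::p.adjust, "fdr" is an alias for the Benjamini-Hochberg ("BH") method.
p_fdr <- p.adjust(p_raw, method = "fdr")
p_bh  <- p.adjust(p_raw, method = "BH")

# Bonferroni multiplies each p-value by the number of tests (capped at 1).
p_bonf <- p.adjust(p_raw, method = "bonferroni")

print(data.frame(p_raw, p_fdr, p_bonf))
```

The full set of method names that base R accepts is available as `p.adjust.methods`.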
Arguments
TrainData |
The input training dataset. The first column is the label (the output). For binary classification, the two classes are coded as 0 and 1. |
TestData |
The input test dataset. The first column is the label (the output). For binary classification, the two classes are coded as 0 and 1. |
pathlistDB |
A list of pathways, keyed by pathway ID, with the corresponding genes given as Entrez IDs ('entrezID'). For an example, see data("GO2ALLEGS_BP"). |
FeatureAnno |
The annotation data, stored in a data.frame, used for probe mapping. It must have at least two columns named 'ID' and 'entrezID'. For an example, see data("MethylAnno"). |
resampling |
The resampling method, as provided by mlr3verse. |
nfolds |
The number of folds for k-fold cross-validation (only supported when TestData = NULL). |
classifier |
The learner to use, as provided by mlr3. |
predMode |
The prediction mode. Available options are c('probability', 'classification'). |
PathwaySizeUp |
The upper bound on the number of genes in each biological pathway. |
PathwaySizeDown |
The lower bound on the number of genes in each biological pathway. |
MinfeatureNum_pathways |
The minimum pathway size after mapping your own data to pathlistDB (KEGG or GO database). |
Add_UnMapped |
Whether to add unmapped probes for prediction. |
Unmapped_num |
The number of unmapped probes to add. |
Add_FeartureSelection_Method |
The feature selection method. |
Inner_CV |
Whether to perform k-fold cross-validation on the training set. |
inner_folds |
The number of folds for the cross-validation on the training set. |
Stage1_FeartureSelection_Method |
The feature selection method for stage 1. |
cutoff |
The threshold used for stage 1 feature selection. It can be any value between 0 and 1. |
Stage2_FeartureSelection_Method |
The feature selection method for stage 2. |
cutoff2 |
The threshold used for stage 2 feature selection. It can be any value between 0 and 1. |
classifier2 |
The learner for stage 2 prediction. If classifier2 is NULL, the stage 1 learner is used. |
target |
Whether the run is used for prediction or for exploring potential biological mechanisms. Available options are c('predict', 'pathways'). |
p.adjust.method |
The p-value adjustment method ("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none"). |
save_pathways_matrix |
Whether to output the pathway matrix file. |
cores |
The number of cores used for computation. |
verbose |
Whether to print running process information to the console |
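The input shapes described above can be made concrete with toy objects. The sketch below is only an illustration: every ID and value is made up, the names `TrainToy`, `AnnoToy`, `PathToy`, `size_down` and `size_up` are hypothetical, and the size filter and probe mapping show one plausible reading of `PathwaySizeUp`, `PathwaySizeDown` and `FeatureAnno` (inclusive bounds are an assumption), not BioM2's actual implementation.

```r
# Toy inputs in the shapes described above; all IDs and values are made up.
set.seed(1)
n <- 6

# Training data: the first column is the 0/1 label, the rest are features (probes).
TrainToy <- data.frame(
  label  = c(0, 1, 0, 1, 0, 1),
  probe1 = rnorm(n),
  probe2 = rnorm(n),
  probe3 = rnorm(n)
)

# Annotation: maps each probe 'ID' to a gene 'entrezID'.
AnnoToy <- data.frame(
  ID       = c("probe1", "probe2", "probe3"),
  entrezID = c("101", "102", "103"),
  stringsAsFactors = FALSE
)

# Pathway list (assumed shape): names are pathway IDs, elements are entrezID vectors.
PathToy <- list(
  "PW:A" = c("101", "102"),
  "PW:B" = c("101", "102", "103", "104", "105"),
  "PW:C" = c("103")
)

# Assumed size filter: keep pathways whose gene count lies in [size_down, size_up].
size_down <- 2
size_up   <- 4
sizes     <- lengths(PathToy)
PathKept  <- PathToy[sizes >= size_down & sizes <= size_up]

# Assumed mapping step: find the probes in the data that map to each kept pathway.
probes_in_data <- setdiff(colnames(TrainToy), "label")
mapped <- lapply(PathKept, function(genes) {
  intersect(AnnoToy$ID[AnnoToy$entrezID %in% genes], probes_in_data)
})
print(lengths(mapped))
```

Under this reading, `MinfeatureNum_pathways` would then be compared against `lengths(mapped)` to drop pathways with too few mapped probes.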
Value
A list containing the prediction results and an evaluation of those results.
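This page does not document the element names of the returned list, so the sketch below does not touch BioM2 output. It only shows one common way to evaluate probability predictions against 0/1 labels, using made-up vectors `truth` and `prob` (hypothetical names) and a rank-based AUC computed in base R.

```r
# Hypothetical labels and predicted probabilities; not produced by BioM2.
truth <- c(0, 0, 1, 1, 0, 1)
prob  <- c(0.10, 0.40, 0.35, 0.80, 0.20, 0.90)

# AUC from the Mann-Whitney U statistic: the share of (positive, negative)
# pairs in which the positive sample receives the higher score.
auc_rank <- function(truth, prob) {
  r     <- rank(prob)  # average ranks are used for ties
  n_pos <- sum(truth == 1)
  n_neg <- sum(truth == 0)
  (sum(r[truth == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Accuracy after thresholding the probabilities at 0.5.
pred_class <- as.integer(prob >= 0.5)
accuracy   <- mean(pred_class == truth)

auc <- auc_rank(truth, prob)
print(c(AUC = auc, ACC = accuracy))
```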
Examples
library(mlr3verse)
library(caret)
library(parallel)
library(BioM2)
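# Example data; the label is stored in the 'label' column
data <- MethylData_Test
set.seed(1)
# Use about 80% of the samples for training and the rest for testing
part <- unlist(createDataPartition(data$label, p = 0.8))
Train <- data[part, ]
Test <- data[-part, ]
pathlistDB <- GO2ALLEGS_BP
FeatureAnno <- MethylAnno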
pred <- BioM2(TrainData = Train, TestData = Test,
  pathlistDB = pathlistDB, FeatureAnno = FeatureAnno,
  classifier = 'svm', nfolds = 5,
  PathwaySizeUp = 25, PathwaySizeDown = 20, MinfeatureNum_pathways = 10,
  Add_UnMapped = TRUE, Unmapped_num = 300,
  Inner_CV = FALSE, inner_folds = 5,
  Stage1_FeartureSelection_Method = 'cor', cutoff = 0.3,
  Stage2_FeartureSelection_Method = 'None',
  target = 'predict', cores = 1
) # To explore biological mechanisms, set target = 'pathways'
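The call above uses `Stage1_FeartureSelection_Method` set to 'cor' with `cutoff` 0.3, and the default for stage 2 in the signature is "RemoveHighcor" with `cutoff2` 0.95. This page does not spell out either method. One plausible reading, sketched in base R on simulated data, is a label-correlation filter followed by removal of near-duplicate features. The helpers `select_by_cor` and `remove_high_cor` are hypothetical names and are not part of BioM2.

```r
# Simulated data: f1 tracks the label, f2 is a near-copy of f1,
# f3 is exactly uncorrelated with the label.
n     <- 100
label <- rep(c(0, 1), each = n / 2)
f1    <- label + 0.5 * sin(seq_len(n))
f2    <- f1 + 0.01 * cos(seq_len(n))
f3    <- rep(c(-1, 1), times = n / 2)
X     <- data.frame(f1 = f1, f2 = f2, f3 = f3)

# Stage 1 (assumed): keep features whose |cor(feature, label)| exceeds the cutoff.
select_by_cor <- function(X, y, cutoff = 0.3) {
  r <- vapply(X, function(col) abs(cor(col, y)), numeric(1))
  X[, r > cutoff, drop = FALSE]
}

# Stage 2 (assumed): walk through the features in order and drop any feature
# that correlates above cutoff2 with a feature that has already been kept.
remove_high_cor <- function(X, cutoff2 = 0.95) {
  keep <- character(0)
  for (nm in colnames(X)) {
    too_close <- any(vapply(
      keep,
      function(k) abs(cor(X[[nm]], X[[k]])) > cutoff2,
      logical(1)
    ))
    if (!too_close) keep <- c(keep, nm)
  }
  X[, keep, drop = FALSE]
}

stage1 <- select_by_cor(X, label, cutoff = 0.3)
stage2 <- remove_high_cor(stage1, cutoff2 = 0.95)
print(colnames(stage1))
print(colnames(stage2))
```

The greedy order-dependent removal in `remove_high_cor` is a design choice made for brevity; other strategies (for example, dropping the feature with the larger mean correlation) would give different survivors.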