R: Biologically Explainable Machine Learning Framework

BioM2 {BioM2}

R Documentation

Biologically Explainable Machine Learning Framework

Description

Biologically Explainable Machine Learning Framework

Usage

BioM2(
  TrainData = NULL,
  TestData = NULL,
  pathlistDB = NULL,
  FeatureAnno = NULL,
  resampling = NULL,
  nfolds = 5,
  classifier = "liblinear",
  predMode = "probability",
  PathwaySizeUp = 200,
  PathwaySizeDown = 20,
  MinfeatureNum_pathways = 10,
  Add_UnMapped = TRUE,
  Unmapped_num = 300,
  Add_FeartureSelection_Method = "wilcox.test",
  Inner_CV = TRUE,
  inner_folds = 10,
  Stage1_FeartureSelection_Method = "cor",
  cutoff = 0.3,
  Stage2_FeartureSelection_Method = "RemoveHighcor",
  cutoff2 = 0.95,
  classifier2 = NULL,
  target = "predict",
  p.adjust.method = "fdr",
  save_pathways_matrix = FALSE,
  cores = 1,
  verbose = TRUE
)

Arguments

`TrainData`	The input training dataset. The first column is the label (the output). For binary classification, 0 and 1 indicate class membership.
`TestData`	The input test dataset. The first column is the label (the output). For binary classification, 0 and 1 indicate class membership.
`pathlistDB`	A list of pathways with pathway IDs and their corresponding genes (given as Entrez IDs). For details, see data("GO2ALLEGS_BP").
`FeatureAnno`	The annotation data, stored in a data.frame, used for probe mapping. It must have at least two columns named 'ID' and 'entrezID'. For details, see data("MethylAnno").
`resampling`	A resampling strategy from mlr3verse.
`nfolds`	The number of folds for k-fold cross-validation (only supported when TestData = NULL).
`classifier`	The mlr3 learner used for stage 1 prediction.
`predMode`	The prediction mode. Available options are c('probability', 'classification').
`PathwaySizeUp`	The upper bound on the number of genes in each biological pathway.
`PathwaySizeDown`	The lower bound on the number of genes in each biological pathway.
`MinfeatureNum_pathways`	The minimum pathway size after mapping your own data to pathlistDB (KEGG database or GO database).
`Add_UnMapped`	Whether to add unmapped probes for prediction.
`Unmapped_num`	The number of unmapped probes to add.
`Add_FeartureSelection_Method`	The feature selection method (default "wilcox.test").
`Inner_CV`	Whether to perform k-fold cross-validation on the training set.
`inner_folds`	The number of folds for k-fold cross-validation on the training set.
`Stage1_FeartureSelection_Method`	The feature selection method used in stage 1 (default "cor").
`cutoff`	The threshold used for stage 1 feature selection. It can be any value between 0 and 1.
`Stage2_FeartureSelection_Method`	The feature selection method used in stage 2 (default "RemoveHighcor").
`cutoff2`	The threshold used for stage 2 feature selection. It can be any value between 0 and 1.
`classifier2`	The learner for stage 2 prediction (if classifier2 is NULL, the stage 1 learner is used).
`target`	Whether the run is used for prediction or for exploring potential biological mechanisms. Available options are c('predict', 'pathways').
`p.adjust.method`	The p-value adjustment method, for example "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", or "fdr" (the default).
`save_pathways_matrix`	Whether to output the pathway matrix file.
`cores`	The number of cores used for computation.
`verbose`	Whether to print progress information to the console.
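
The input shapes described above can be pictured with a small toy sketch in base R. Everything in it is made up for illustration: the probe names, pathway IDs, and gene IDs are hypothetical, and it assumes that pathlistDB is a named list whose names are pathway IDs and whose elements are character vectors of Entrez IDs. It does not call BioM2 and is not the package's internal mapping code.

```r
# Toy inputs illustrating the documented shapes (all values are hypothetical).
set.seed(1)
n <- 6

# TrainData: the first column is the label (0/1); the rest are features.
TrainData <- data.frame(
  label  = c(0, 1, 0, 1, 0, 1),
  cg0001 = runif(n),
  cg0002 = runif(n),
  cg0003 = runif(n)
)

# pathlistDB (assumed shape): a named list with one vector of Entrez IDs
# per pathway ID.
pathlistDB <- list(
  "GO:0000001" = c("1", "2"),
  "GO:0000002" = c("2", "3")
)

# FeatureAnno: maps each feature (probe) ID to an Entrez gene ID.
FeatureAnno <- data.frame(
  ID       = c("cg0001", "cg0002", "cg0003"),
  entrezID = c("1", "2", "3"),
  stringsAsFactors = FALSE
)

# Probes whose annotated gene belongs to one pathway.
genes <- pathlistDB[["GO:0000002"]]
probes_in_pathway <- FeatureAnno$ID[FeatureAnno$entrezID %in% genes]
print(probes_in_pathway)
```

Here probes_in_pathway holds "cg0002" and "cg0003", the two probes annotated to genes in the second toy pathway.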

Value

A list containing the prediction results and their evaluation.

Examples




library(mlr3verse)
library(caret)
library(parallel)
library(BioM2)

# Load the example data and split it 80/20 into training and test sets
data <- MethylData_Test
set.seed(1)
part <- unlist(createDataPartition(data$label, p = 0.8))
Train <- data[part, ]
Test <- data[-part, ]

# Pathway database and probe annotation provided with the package
pathlistDB <- GO2ALLEGS_BP
FeatureAnno <- MethylAnno

# To explore biological mechanisms instead, set target = 'pathways'
pred <- BioM2(TrainData = Train, TestData = Test,
              pathlistDB = pathlistDB, FeatureAnno = FeatureAnno,
              classifier = 'svm', nfolds = 5,
              PathwaySizeUp = 25, PathwaySizeDown = 20, MinfeatureNum_pathways = 10,
              Add_UnMapped = TRUE, Unmapped_num = 300,
              Inner_CV = FALSE, inner_folds = 5,
              Stage1_FeartureSelection_Method = 'cor', cutoff = 0.3,
              Stage2_FeartureSelection_Method = 'None',
              target = 'predict', cores = 1
)
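
The PathwaySizeUp and PathwaySizeDown bounds passed in the call above can be thought of as a size filter over the pathway list. The following base R sketch shows that idea on a hypothetical pathway list. It assumes both bounds are inclusive, which this page does not state, and it is an illustration rather than the filtering code used inside BioM2.

```r
# Hypothetical pathway list: names are pathway IDs, values are gene IDs.
pathways <- list(
  small  = as.character(1:5),
  medium = as.character(1:22),
  large  = as.character(1:300)
)

PathwaySizeUp   <- 25  # upper bound on genes per pathway
PathwaySizeDown <- 20  # lower bound on genes per pathway

# Keep pathways whose gene count lies within the bounds
# (inclusive bounds are an assumption made for this sketch).
sizes <- lengths(pathways)
kept  <- pathways[sizes >= PathwaySizeDown & sizes <= PathwaySizeUp]
print(names(kept))
```

With these toy sizes (5, 22, and 300 genes), only the pathway named "medium" falls inside the 20 to 25 range.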

[Package BioM2 version 1.0.8 Index]