trainDDLSModel {digitalDLSorteR}    R Documentation
Train Deep Neural Network model
Description
Train a Deep Neural Network model using the training data from a
DigitalDLSorter object. In addition, the trained model is evaluated with test
data and prediction results are computed to determine its performance (see
?calculateEvalMetrics). Training and evaluation can be performed using
simulated profiles stored in the DigitalDLSorter object or 'on the fly' by
simulating the pseudo-bulk profiles at the same time as training/evaluation is
performed (see Details).
Usage
trainDDLSModel(
object,
type.data.train = "bulk",
type.data.test = "bulk",
batch.size = 64,
num.epochs = 60,
num.hidden.layers = 2,
num.units = c(200, 200),
activation.fun = "relu",
dropout.rate = 0.25,
loss = "kullback_leibler_divergence",
metrics = c("accuracy", "mean_absolute_error", "categorical_accuracy"),
normalize = TRUE,
scaling = "standardize",
norm.batch.layers = TRUE,
custom.model = NULL,
shuffle = TRUE,
use.generator = FALSE,
on.the.fly = FALSE,
pseudobulk.function = "AddRawCount",
threads = 1,
view.metrics.plot = TRUE,
verbose = TRUE
)
Arguments
object: DigitalDLSorter object with the single-cell profiles and, if
on.the.fly = FALSE, the simulated pseudo-bulk profiles to be used for
training and evaluation.

type.data.train: Type of profiles to be used for training ("bulk" by
default).

type.data.test: Type of profiles to be used for evaluation ("bulk" by
default).

batch.size: Number of samples per gradient update (64 by default).

num.epochs: Number of epochs to train the model (60 by default).

num.hidden.layers: Number of hidden layers of the neural network (2 by
default). This number must be equal to the length of the num.units argument.

num.units: Vector indicating the number of neurons per hidden layer
(c(200, 200) by default). The length of this vector must be equal to the
num.hidden.layers argument.

activation.fun: Activation function to use ("relu" by default).

dropout.rate: Float between 0 and 1 indicating the fraction of the input
neurons to drop in dropout layers (0.25 by default). By default,
digitalDLSorteR implements 1 dropout layer per hidden layer.

loss: Character indicating the loss function used for model training
("kullback_leibler_divergence" by default).

metrics: Vector of metrics used to assess model performance during training
and evaluation (c("accuracy", "mean_absolute_error", "categorical_accuracy")
by default).

normalize: Whether to normalize data using logCPM (TRUE by default).

scaling: How to scale data before training: "standardize" (standardize
values; default) or "rescale" (scale values into the [0, 1] range).

norm.batch.layers: Whether to include batch normalization layers between
each hidden dense layer (TRUE by default).

custom.model: Allows a custom neural network to be used. It must be a
keras.engine.sequential.Sequential object in which the number of input
neurons is equal to the number of considered features/genes and the number
of output neurons is equal to the number of considered cell types (NULL by
default; see Details).

shuffle: Boolean indicating whether data will be shuffled (TRUE by default).

use.generator: Boolean indicating whether to use generators during training
and test (FALSE by default). Generators are automatically used when
on.the.fly = TRUE.

on.the.fly: Boolean indicating whether data will be generated 'on the fly'
during training (FALSE by default).

pseudobulk.function: Function used to build pseudo-bulk samples when
on.the.fly = TRUE ("AddRawCount" by default).

threads: Number of threads used during simulation of pseudo-bulk samples if
on.the.fly = TRUE (1 by default).

view.metrics.plot: Boolean indicating whether to show plots of loss and
metrics progression during training (TRUE by default).

verbose: Boolean indicating whether to display model progression during
training and model architecture information (TRUE by default).
Details
Keras/Tensorflow environment
All Deep Learning-related steps in the digitalDLSorteR package are performed
using the keras package, an R API for Keras in Python available on CRAN. To
set up a working Python environment with TensorFlow, we recommend using the
installTFpython function included in the package.
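As a sketch, the one-time environment setup could look like this (it assumes
digitalDLSorteR is already installed and simply calls the installTFpython
helper mentioned above):

```r
# One-time setup sketch: install the Python back end used by keras.
# installTFpython() creates a conda environment with Python and
# TensorFlow so that keras can find a working back end.
library(digitalDLSorteR)
installTFpython()
```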
Simulation of bulk RNA-Seq profiles 'on the fly'
trainDDLSModel makes it possible to avoid storing bulk RNA-Seq profiles by
using the on.the.fly argument. This functionality aims to avoid the execution
time and memory usage of the simBulkProfiles function, as the simulated
pseudo-bulk profiles are built in each batch during training/evaluation.
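The two modes can be contrasted as follows (a sketch; it assumes a DDLS
object already prepared with generateBulkCellMatrix, as in the Examples):

```r
# Option A: precompute and store pseudo-bulk profiles, then train on them
DDLS <- simBulkProfiles(DDLS)
DDLS <- trainDDLSModel(object = DDLS, on.the.fly = FALSE)

# Option B: skip simBulkProfiles; pseudo-bulk profiles are simulated in
# each batch while training/evaluation runs, so they are never stored
DDLS <- trainDDLSModel(
  object = DDLS,
  on.the.fly = TRUE,
  pseudobulk.function = "AddRawCount",
  threads = 2
)
```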
Neural network architecture
By default, trainDDLSModel implements the architecture selected in Torroja
and Sánchez-Cabo, 2019. However, as the default architecture may not produce
good results depending on the dataset, its parameters can be changed through
the corresponding arguments: number of hidden layers, number of neurons per
hidden layer, dropout rate, activation function and loss function. For fully
customized models, a pre-built model can be provided in the custom.model
argument (a keras.engine.sequential.Sequential object) in which the number of
input neurons must equal the number of considered features/genes and the
number of output neurons must equal the number of considered cell types.
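For instance, a custom model meeting these requirements could be sketched
with keras as follows (n.genes and n.cell.types are placeholder values; they
must match the features and cell types stored in the DigitalDLSorter object):

```r
library(keras)

n.genes <- 500      # placeholder: number of features/genes in the object
n.cell.types <- 5   # placeholder: number of cell types in the object

custom <- keras_model_sequential() %>%
  layer_dense(units = 250, activation = "relu",
              input_shape = n.genes) %>%       # input neurons = genes
  layer_batch_normalization() %>%
  layer_dropout(rate = 0.25) %>%
  layer_dense(units = 150, activation = "relu") %>%
  layer_dropout(rate = 0.25) %>%
  layer_dense(units = n.cell.types,
              activation = "softmax")          # output neurons = cell types

DDLS <- trainDDLSModel(object = DDLS, custom.model = custom)
```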
Value
A DigitalDLSorter object with the trained.model slot containing a
DigitalDLSorterDNN object. For more information about the structure of this
class, see ?DigitalDLSorterDNN.
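After training, the fitted network and its evaluation results can be
retrieved from that slot; a sketch using the trained.model getter and the
plotTrainingHistory function listed under See Also:

```r
dnn <- trained.model(DDLS)   # DigitalDLSorterDNN object stored in the slot
dnn                          # show method summarizes the stored model
plotTrainingHistory(DDLS)    # loss/metric curves across training epochs
```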
References
Torroja, C. and Sánchez-Cabo, F. (2019). digitalDLSorter: A Deep Learning algorithm to quantify immune cell populations based on scRNA-Seq data. Frontiers in Genetics 10, 978. doi: 10.3389/fgene.2019.00978
See Also
plotTrainingHistory, deconvDigitalDLSorter, deconvDDLSObj
Examples
## Not run:
set.seed(123) # reproducibility
# toy single-cell dataset: 15 genes x 10 cells with 2 cell types
sce <- SingleCellExperiment::SingleCellExperiment(
  assays = list(
    counts = matrix(
      rpois(15 * 10, lambda = 5), nrow = 15, ncol = 10,
      dimnames = list(paste0("Gene", seq(15)), paste0("RHC", seq(10)))
    )
  ),
  colData = data.frame(
    Cell_ID = paste0("RHC", seq(10)),
    Cell_Type = sample(x = paste0("CellType", seq(2)), size = 10,
                       replace = TRUE)
  ),
  rowData = data.frame(
    Gene_ID = paste0("Gene", seq(15))
  )
)
# load the single-cell data into a DigitalDLSorter object
DDLS <- createDDLSobject(
  sc.data = sce,
  sc.cell.ID.column = "Cell_ID",
  sc.gene.ID.column = "Gene_ID",
  sc.filt.genes.cluster = FALSE,
  sc.log.FC = FALSE
)
# ranges of cell type proportions used to simulate pseudo-bulk samples
probMatrixValid <- data.frame(
  Cell_Type = paste0("CellType", seq(2)),
  from = c(1, 30),
  to = c(15, 70)
)
DDLS <- generateBulkCellMatrix(
  object = DDLS,
  cell.ID.column = "Cell_ID",
  cell.type.column = "Cell_Type",
  prob.design = probMatrixValid,
  num.bulk.samples = 30,
  verbose = TRUE
)
# training of DDLS model with pseudo-bulk profiles simulated 'on the fly'
tensorflow::tf$compat$v1$disable_eager_execution()
DDLS <- trainDDLSModel(
  object = DDLS,
  on.the.fly = TRUE,
  batch.size = 12,
  num.epochs = 5
)

## End(Not run)