
snap {snap}

R Documentation

snap

Description

A simple wrapper for designing vanilla deep neural networks with a 'Tensorflow'/'Keras' backend for regression, classification and multi-label tasks, with some tweaks and tricks (skip shortcuts, embedding, feature selection and anomaly detection).

Usage

snap(
  data,
  target,
  task = NULL,
  positive = NULL,
  skip_shortcut = FALSE,
  embedding = "none",
  embedding_size = 10,
  folds = 3,
  reps = 1,
  holdout = 0.3,
  layers = 1,
  activations = "relu",
  regularization_L1 = 0,
  regularization_L2 = 0,
  nodes = 32,
  dropout = 0,
  span = 0.2,
  min_delta = 0,
  batch_size = 32,
  epochs = 50,
  imp_thresh = 0,
  anom_thresh = 1,
  output_activation = NULL,
  optimizer = "Adam",
  loss = NULL,
  metrics = NULL,
  winsor = FALSE,
  q_min = 0.01,
  q_max = 0.99,
  normalization = TRUE,
  seed = 42,
  verbose = 0
)

Arguments

`data`	A data frame including all the features and targets.
`target`	String. Name of the target feature when task is "regr" or "classif"; string vector with the names of the target features when task is "multilabel".
`task`	String. Available options are: "regr", "classif", "multilabel". If NULL, the task is inferred from the data type of the target feature(s). Default: NULL.
`positive`	String. Positive class label (only for classification task). Default: NULL.
`skip_shortcut`	Logical. If TRUE, adds a skip shortcut, which can improve network performance when there are many layers. Default: FALSE.
`embedding`	String. Available options are: "none", "global" (when identical values for different features hold different meanings), "sequence" (when identical values for different features hold the same meaning). Default: "none".
`embedding_size`	Integer. Output dimension for the embedding layer. Default: 10.
`folds`	Positive integer. Number of folds for repeated cross-validation. Default: 3.
`reps`	Positive integer. Number of repetitions for repeated cross-validation. Default: 1.
`holdout`	Positive numeric. Proportion of cases reserved for holdout validation (for example, 0.3 for 30% of the cases). Default: 0.3.
`layers`	Positive integer. Number of layers for the neural net. Default: 1.
`activations`	String. String vector with the activation functions for each layer (for example, a neural net with 3 layers may have activations = c("relu", "gelu", "tanh")). Besides standard Tensorflow/Keras activations, you can also choose: "swish", "mish", "gelu", "bent". Default: "relu".
`regularization_L1`	Positive numeric. Value for L1 regularization of the loss function. Default: 0.
`regularization_L2`	Positive numeric. Value for L2 regularization of the loss function. Default: 0.
`nodes`	Positive integer. Integer vector with the nodes for each layer (for example, a neural net with 3 layers may have nodes = c(32, 64, 16)). Default: 32.
`dropout`	Positive numeric. Dropout rate for each layer (for example, a neural net with 3 layers may have dropout = c(0, 0.5, 0.3)). Default: 0.
`span`	Positive numeric. Patience for early stopping, expressed as a proportion of the number of epochs. Default: 0.2.
`min_delta`	Positive numeric. Minimum change in the monitored metric that counts as an improvement for early stopping. Default: 0.
`batch_size`	Positive integer. Maximum batch size for training. Default: 32.
`epochs`	Positive integer. Maximum number of training epochs (full forward and backward passes over the training data). Default: 50.
`imp_thresh`	Positive numeric. Importance threshold (as a percentile) above which features are included in the model, based on the ReliefFbestK estimator from the CORElearn package. Default: 0 (all features included).
`anom_thresh`	Positive numeric. Anomaly threshold (as a percentile) above which instances are excluded from the model, based on the local outlier factor (lof) from the dbscan package. Default: 1 (all instances included).
`output_activation`	String. Default: NULL. If not specified otherwise, it will be "Linear" for regression task, "Softmax" for classification task, "Sigmoid" for multilabel task.
`optimizer`	String. Any standard Tensorflow/Keras optimizer. Default: "Adam".
`loss`	Default: NULL. If not specified otherwise, it will be "mean_absolute_error" for regression task, "categorical_crossentropy" for classification task, "binary_crossentropy" for multilabel task.
`metrics`	Default: NULL. If not specified otherwise, it will be "mean_absolute_error" for regression task, "categorical_crossentropy" for classification task, "binary_crossentropy" for multilabel task.
`winsor`	Logical. If TRUE, performs Winsorization in regression tasks. Default: FALSE.
`q_min`	Positive numeric. Minimum quantile threshold for Winsorization. Default: 0.01.
`q_max`	Positive numeric. Maximum quantile threshold for Winsorization. Default: 0.99.
`normalization`	Logical. If TRUE, performs batch normalization after each layer. Default: TRUE.
`seed`	Positive integer. Seed value to control random processes. Default: 42.
`verbose`	Positive integer. Verbosity level of the Keras output. Default: 0.
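The `holdout`, `folds` and `reps` arguments describe a holdout split followed by repeated k-fold cross-validation on the remaining cases. The base R sketch below shows one common way to build such splits; `make_splits()` is a hypothetical helper, and snap's own sampling (stratification, rounding) may differ.

```r
# Hypothetical helper (not part of snap): splits row indices into a holdout
# (testing) set and, for the remaining rows, `reps` independent assignments
# of the rows to `folds` folds.
make_splits <- function(n, holdout = 0.3, folds = 3, reps = 1, seed = 42) {
  set.seed(seed)  # note: this resets the global RNG state
  test_idx <- sort(sample.int(n, size = round(holdout * n)))
  train_idx <- setdiff(seq_len(n), test_idx)
  # Each repetition reshuffles the training rows and deals them into folds
  fold_ids <- lapply(seq_len(reps), function(r) {
    sample(rep_len(seq_len(folds), length(train_idx)))
  })
  list(test = test_idx, train = train_idx, fold_ids = fold_ids)
}

splits <- make_splits(100, holdout = 0.3, folds = 3, reps = 2)
lengths(splits[c("test", "train")])  # 30 cases held out, 70 used for CV
table(splits$fold_ids[[1]])          # fold sizes in the first repetition
```

Each repetition yields `folds` train/validation trials, so `reps = 2` with `folds = 3` gives six trials in total.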
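The `span`, `min_delta` and `epochs` arguments map onto Keras-style early stopping. The base R sketch below assumes that the patience is `span * epochs` (rounded) and that a validation loss is minimized; it simulates the stopping rule on a made-up loss curve. `patience_from_span()` and `early_stop_epoch()` are hypothetical helpers, not functions of the package.

```r
# Hypothetical helper: assumes patience = round(span * epochs), as the
# description of `span` suggests; snap's exact rounding may differ.
patience_from_span <- function(span, epochs) {
  max(1L, as.integer(round(span * epochs)))
}

# Simulates a Keras-style early stopping rule on a vector of validation losses
# (lower is better). An epoch counts as an improvement only if the loss falls
# more than `min_delta` below the best value seen so far; training stops once
# `patience` consecutive epochs pass without improvement.
early_stop_epoch <- function(val_loss, patience, min_delta = 0) {
  best <- Inf
  wait <- 0L
  for (epoch in seq_along(val_loss)) {
    if (val_loss[epoch] < best - min_delta) {
      best <- val_loss[epoch]  # new best value: reset the counter
      wait <- 0L
    } else {
      wait <- wait + 1L
      if (wait >= patience) return(epoch)  # epoch at which training stops
    }
  }
  length(val_loss)  # rule never triggered: all epochs are run
}

patience_from_span(span = 0.2, epochs = 50)  # 10
val_loss <- c(1.0, 0.8, 0.7, 0.69, 0.695, 0.70, 0.71, 0.72)
early_stop_epoch(val_loss, patience = 3, min_delta = 0.02)  # 6
```

With `min_delta = 0`, the small drop to 0.69 at epoch 4 still counts as an improvement, so the same curve stops one epoch later (epoch 7).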
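Both `imp_thresh` and `anom_thresh` are percentile cut-offs. A minimal base R sketch of that filtering step is shown below, assuming the importance scores (for example from ReliefFbestK) and the anomaly scores (for example local outlier factors) have already been computed. `select_features()` and `select_instances()` are hypothetical helpers, the scores are made up, and snap may use a different quantile definition or boundary inequality.

```r
# Hypothetical helper: keeps the features whose importance score is at or above
# the `imp_thresh` quantile of all scores (imp_thresh = 0 keeps every feature).
select_features <- function(importance, imp_thresh = 0) {
  cutoff <- quantile(importance, probs = imp_thresh, names = FALSE)
  names(importance)[importance >= cutoff]
}

# Hypothetical helper: keeps the instances whose anomaly score is at or below
# the `anom_thresh` quantile (anom_thresh = 1 keeps every instance).
select_instances <- function(anomaly_score, anom_thresh = 1) {
  cutoff <- quantile(anomaly_score, probs = anom_thresh, names = FALSE)
  which(anomaly_score <= cutoff)
}

# Made-up scores, for illustration only
importance <- c(x1 = 0.9, x2 = 0.1, x3 = 0.5, x4 = 0.3)
select_features(importance, imp_thresh = 0.5)  # "x1" "x3"

lof_scores <- c(1.0, 1.1, 0.95, 3.2, 1.05)
select_instances(lof_scores, anom_thresh = 0.8)  # 1 2 3 5
```

The defaults (`imp_thresh = 0`, `anom_thresh = 1`) set the cut-offs at the minimum importance and the maximum anomaly score respectively, which is why they keep all features and all instances.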
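The `winsor`, `q_min` and `q_max` arguments refer to Winsorization: extreme values are clipped to chosen quantiles instead of being removed. The base R sketch below applies it to a single numeric vector; `winsorize()` is a hypothetical helper, and which columns snap actually winsorizes is not stated here.

```r
# Hypothetical helper: values below the q_min quantile are raised to it and
# values above the q_max quantile are lowered to it (R's default type-7
# quantiles; snap's own choice may differ).
winsorize <- function(x, q_min = 0.01, q_max = 0.99) {
  bounds <- quantile(x, probs = c(q_min, q_max), names = FALSE, na.rm = TRUE)
  pmin(pmax(x, bounds[1]), bounds[2])
}

x <- c(1:9, 100)  # one extreme value
winsorize(x, q_min = 0.1, q_max = 0.9)
# the first value becomes 1.9 and the outlier 100 is capped at 18.1
```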

Value

This function returns a list including:

task: kind of task solved
configuration: main hyper-parameters describing the neural net (layers, activations, regularization_L1, regularization_L2, nodes, dropout)
model: Keras standard model description
pred_fun: function for predicting new values from data with the same structure (same features) as the training data
plot: Keras standard history plot
testing_frame: testing set with the related predictions
trials: statistics for each trial during the repeated cross-validation (train set and validation set):
- task "classif": balanced accuracy (bac), precision (prc), sensitivity (sen), critical success index (csi), F-score (fsc), Kappa (kpp), Kendall (kdl)
- task "regr": root mean square error (rmse), mean absolute error (mae), median absolute error (mdae), relative root square error (rrse), relative absolute error (rae), Pearson (prsn)
- task "multilabel": macro bac, macro prc, macro sen, macro csi, macro fsc, micro kpp, micro kdl
metrics: summary statistics as above for training, validation (both averaged over trials) and testing
selected_feat: labels of features included within the model
selected_inst: index of instances included within the model
time_log
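The regression statistics listed for the "regr" task follow standard definitions. The base R sketch below computes them for a vector of actual and predicted values; `regr_metrics()` is a hypothetical helper using textbook formulas, and snap's implementation may differ in details such as missing-value handling.

```r
# Hypothetical helper: textbook regression error metrics, named with the
# abbreviations used for the "regr" task.
regr_metrics <- function(actual, predicted) {
  err <- actual - predicted        # prediction errors
  dev <- actual - mean(actual)     # deviations of a mean-only baseline
  c(
    rmse = sqrt(mean(err^2)),
    mae  = mean(abs(err)),
    mdae = median(abs(err)),
    rrse = sqrt(sum(err^2) / sum(dev^2)),  # relative to the mean baseline
    rae  = sum(abs(err)) / sum(abs(dev)),  # relative to the mean baseline
    prsn = cor(actual, predicted, method = "pearson")
  )
}

regr_metrics(actual = c(1, 2, 3, 4), predicted = c(1, 2, 3, 5))
# rmse 0.5, mae 0.25, mdae 0, rrse sqrt(0.2), rae 0.25
```

Values of rrse and rae below 1 mean the model beats a baseline that always predicts the mean of the actual values.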
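For the "classif" task, the statistics (bac, prc, sen, csi, fsc, kpp, kdl) can all be derived from a binary confusion matrix once the `positive` class is fixed. The base R sketch below uses textbook definitions; `classif_metrics()` is a hypothetical helper, the labels are made up, and snap's own implementation may differ.

```r
# Hypothetical helper: binary classification metrics from actual and predicted
# labels, with the positive class given explicitly.
classif_metrics <- function(actual, predicted, positive) {
  a <- actual == positive
  p <- predicted == positive
  tp <- sum(a & p); tn <- sum(!a & !p)
  fp <- sum(!a & p); fn <- sum(a & !p)
  n <- length(a)
  sen <- tp / (tp + fn)  # sensitivity (recall)
  spe <- tn / (tn + fp)  # specificity
  prc <- tp / (tp + fp)  # precision
  po <- (tp + tn) / n    # observed agreement
  pe <- ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n^2  # chance agreement
  c(
    bac = (sen + spe) / 2,
    prc = prc,
    sen = sen,
    csi = tp / (tp + fp + fn),
    fsc = 2 * prc * sen / (prc + sen),
    kpp = (po - pe) / (1 - pe),  # Cohen's kappa
    kdl = cor(as.numeric(a), as.numeric(p), method = "kendall")
  )
}

actual    <- c("yes", "yes", "yes", "no", "no", "no", "no", "yes")
predicted <- c("yes", "yes", "no", "no", "no", "yes", "no", "yes")
classif_metrics(actual, predicted, positive = "yes")
# tp = 3, tn = 3, fp = 1, fn = 1: bac 0.75, csi 0.6, fsc 0.75, kpp 0.5, kdl 0.5
```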

Author(s)

Giancarlo Vercellino giancarlo.vercellino@gmail.com

Examples

## Not run: 
snap(friedman3, target="y")

snap(threenorm, target="classes", imp_thresh = 0.3, anom_thresh = 0.95)

snap(threenorm, "classes", layers = 2, activations = c("gelu", "swish"), nodes = c(32, 64))

## End(Not run)

[Package snap version 1.1.0]