R: Feature Extraction by autoencoder

seq2feature_seq2seq {ProcData}

R Documentation

Feature Extraction by autoencoder

Description

seq2feature_seq2seq extract features from response processes by autoencoder.

Usage

seq2feature_seq2seq(seqs, ae_type = "action", K, rnn_type = "lstm",
  n_epoch = 50, method = "last", step_size = 1e-04,
  optimizer_name = "adam", cumulative = FALSE, log = TRUE,
  weights = c(1, 0.5), samples_train, samples_valid,
  samples_test = NULL, pca = TRUE, verbose = TRUE,
  return_theta = TRUE)

Arguments

`seqs`	an object of class `"proc"`.
`ae_type`	a string specifies the type of autoencoder. The autoencoder can be an action sequence autoencoder ("action"), a time sequence autoencoder ("time"), or an action-time sequence autoencoder ("both").
`K`	the number of features to be extracted.
`rnn_type`	the type of recurrent unit to be used for modeling response processes. `"lstm"` for the long-short term memory unit. `"gru"` for the gated recurrent unit.
`n_epoch`	the number of training epochs for the autoencoder.
`method`	the method for computing features from the output of an recurrent neural network in the encoder. Available options are `"last"` and `"avg"`.
`step_size`	the learning rate of optimizer.
`optimizer_name`	a character string specifying the optimizer to be used for training. Availabel options are `"sgd"`, `"rmsprop"`, `"adadelta"`, and `"adam"`.
`cumulative`	logical. If TRUE, the sequence of cumulative time up to each event is used as input to the neural network. If FALSE, the sequence of inter-arrival time (gap time between an event and the previous event) will be used as input to the neural network. Default is FALSE.
`log`	logical. If TRUE, for the timestamp sequences, input of the neural net is the base-10 log of the original sequence of times plus 1 (i.e., log10(t+1)). If FALSE, the original sequence of times is used.
`weights`	a vector of 2 elements for the weight of the loss of action sequences (categorical_crossentropy) and time sequences (mean squared error), respectively. The total loss is calculated as the weighted sum of the two losses.
`samples_train`, `samples_valid`, `samples_test`	vectors of indices specifying the training, validation and test sets for training autoencoder.
`pca`	logical. If TRUE, the principal components of features are returned. Default is TRUE.
`verbose`	logical. If TRUE, training progress is printed.
`return_theta`	logical. If TRUE, extracted features are returned.

Details

This function wraps aseq2feature_seq2seq, tseq2feature_seq2seq, and atseq2feature_seq2seq.

Value

seq2feature_seq2seq returns a list containing

`theta`	a matrix containing `K` features or principal features. Each column is a feature.
`train_loss`	a vector of length `n_epoch` recording the trace of training losses.
`valid_loss`	a vector of length `n_epoch` recording the trace of validation losses.
`test_loss`	a vector of length `n_epoch` recording the trace of test losses. Exists only if `samples_test` is not `NULL`.

References

Tang, X., Wang, Z., Liu, J., and Ying, Z. (2020) An exploratory analysis of the latent structure of process data via action sequence autoencoders. British Journal of Mathematical and Statistical Psychology. 74(1), 1-33.

Examples

 
if (!system("python -c 'import tensorflow as tf'", ignore.stdout = TRUE, ignore.stderr= TRUE)) {
  n <- 50
  data(cc_data)
  samples <- sample(1:length(cc_data$seqs$time_seqs), n)
  seqs <- sub_seqs(cc_data$seqs, samples)

  # action sequence autoencoder
  K_res <- chooseK_seq2seq(seqs=seqs, ae_type="action", K_cand=c(5, 10), 
                           n_epoch=5, n_fold=2, valid_prop=0.2)
  seq2seq_res <- seq2feature_seq2seq(seqs=seqs, ae_type="action", K=K_res$K, 
                         n_epoch=5, samples_train=1:40, samples_valid=41:50)
  theta <- seq2seq_res$theta

  # time sequence autoencoder
  K_res <- chooseK_seq2seq(seqs=seqs, ae_type="time", K_cand=c(5, 10), 
                           n_epoch=5, n_fold=2, valid_prop=0.2)
  seq2seq_res <- seq2feature_seq2seq(seqs=seqs, ae_type="time", K=K_res$K, 
                         n_epoch=5, samples_train=1:40, samples_valid=41:50)
  theta <- seq2seq_res$theta

  # action and time sequence autoencoder
  K_res <- chooseK_seq2seq(seqs=seqs, ae_type="both", K_cand=c(5, 10), 
                           n_epoch=5, n_fold=2, valid_prop=0.2)
  seq2seq_res <- seq2feature_seq2seq(seqs=seqs, ae_type="both", K=K_res$K, 
                         n_epoch=5, samples_train=1:40, samples_valid=41:50)
  theta <- seq2seq_res$theta
  plot(seq2seq_res$train_loss, col="blue", type="l")
  lines(seq2seq_res$valid_loss, col="red")
}

[Package ProcData version 0.3.2 Index]