seq2feature_mds_large {ProcData}    R Documentation

Feature Extraction by MDS for Large Dataset

Description

seq2feature_mds_large extracts MDS features from a large number of response processes. It implements the algorithm proposed in Paradis (2018), with minor variations, to perform MDS. The algorithm first selects a relatively small subset of response processes and performs classical MDS on it. The coordinates of each remaining response process are then obtained by using BFGS to minimize a loss function that relates the target response process to those in the subset.
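The two-stage procedure described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the ProcData implementation: the toy data, the Euclidean dissimilarity, the squared-error loss, and the helper place_point are all hypothetical stand-ins, and the loss actually used by the package may differ.

```r
# Minimal base-R sketch of subset-based MDS (not the ProcData implementation).
set.seed(1)

# Hypothetical toy data: 60 points in 3-D stand in for response processes,
# and Euclidean distance stands in for the dissimilarity measure.
x <- matrix(rnorm(60 * 3), ncol = 3)
d <- as.matrix(dist(x))
n <- nrow(d)
K <- 2             # number of features to extract
subset_size <- 20  # size of the subset used for classical MDS

# Step 1: classical MDS on a randomly chosen subset.
idx <- sample(n, subset_size)
theta <- matrix(NA_real_, nrow = n, ncol = K)
theta[idx, ] <- cmdscale(d[idx, idx], k = K)

# Step 2: place each remaining point by minimizing an (assumed) squared-error
# loss between its observed dissimilarities to the subset and the Euclidean
# distances implied by its candidate coordinates, using BFGS.
place_point <- function(d_to_subset, anchors) {
  loss <- function(p) {
    fitted <- sqrt(rowSums(sweep(anchors, 2, p)^2))
    sum((d_to_subset - fitted)^2)
  }
  optim(colMeans(anchors), loss, method = "BFGS")$par
}
for (i in setdiff(seq_len(n), idx)) {
  theta[i, ] <- place_point(d[i, idx], theta[idx, , drop = FALSE])
}

dim(theta)  # one row per point, K columns
```

Only the subset_size by subset_size block and the n by subset_size block of the dissimilarity matrix are needed in this scheme. The sketch computes the full matrix only to keep the toy example short.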

Usage

seq2feature_mds_large(seqs, K, dist_type = "oss_action", subset_size,
  subset_method = "random", n_cand = 10, pca = TRUE, L_set = 1:3)

Arguments

seqs

an object of class "proc"

K

the number of features to be extracted.

dist_type

a character string specifying the dissimilarity measure for two response processes. See 'Details'.

subset_size

the size of the subset on which classical MDS is performed.

subset_method

a character string specifying the method for choosing the subset. It must be one of "random", "sample_avgmax", "sample_minmax", "full_avgmax", and "full_minmax".

n_cand

the size of the candidate set when selecting the subset. It is only used when subset_method is "sample_avgmax" or "sample_minmax".

pca

logical. If TRUE (default), the principal components of the extracted features are returned.

L_set

the lengths of n-grams considered.
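As a rough illustration of what the pca argument implies, the sketch below rotates a hypothetical feature matrix onto its principal components with prcomp from base R. How ProcData performs this step internally is an assumption here. The matrices theta and mix are made-up toy data.

```r
# Hypothetical sketch: principal components of an extracted feature matrix.
set.seed(1)

# Toy feature matrix: 100 response processes, K = 3 correlated raw features.
mix <- matrix(c(2, 0.5, 0,
                0, 1,   0.3,
                0, 0,   0.2), nrow = 3, byrow = TRUE)
theta <- matrix(rnorm(100 * 3), ncol = 3) %*% mix

# Center the features and rotate them onto their principal axes.
pcs <- prcomp(theta, center = TRUE, scale. = FALSE)$x

# The rotated columns are uncorrelated and ordered by decreasing variance,
# while the distances between rows are unchanged by the rotation.
max_offdiag_cor <- max(abs(cor(pcs)[upper.tri(diag(3))]))
max_dist_change <- max(abs(dist(theta) - dist(pcs)))
```

Because the transformation is only a centering and a rotation, it leaves the fitted configuration's inter-point distances intact and changes only the orientation of the axes.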

Value

seq2feature_mds_large returns an n × K matrix of extracted features, where n is the number of response processes in seqs.

References

Paradis, E. (2018). Multidimensional Scaling with Very Large Datasets. Journal of Computational and Graphical Statistics, 27, 935–939.

See Also

Other feature extraction methods: aseq2feature_seq2seq, atseq2feature_seq2seq, seq2feature_mds, seq2feature_ngram, seq2feature_seq2seq, tseq2feature_seq2seq


[Package ProcData version 0.3.2 Index]