| seq2feature_mds {ProcData} | R Documentation | 
Feature extraction via multidimensional scaling
Description
seq2feature_mds extracts K features from response processes by
multidimensional scaling.
Usage
seq2feature_mds(seqs = NULL, K = 2, method = "auto",
  dist_type = "oss_action", pca = TRUE, subset_size = 100,
  subset_method = "random", n_cand = 10, return_dist = FALSE,
  L_set = 1:3)
Arguments
| seqs | a  | 
| K | the number of features to be extracted. | 
| method | a character string specifying the algorithm used for performing MDS. See 'Details'. | 
| dist_type | a character string specifying the dissimilarity measure for two response processes. See 'Details'. | 
| pca | logical. If  | 
| subset_size,n_cand | two parameters used in the large data algorithm. See 'Details'
and  | 
| subset_method | a character string specifying the method for choosing the subset 
in the large data algorithm. See 'Details' and  | 
| return_dist | logical. If  | 
| L_set | the lengths of n-grams considered. | 
Details
Classical MDS has a computational complexity of order n^3, where 
n is the number of response processes, so it is computationally expensive 
when a large number of response processes is considered. 
In addition, storing an n × n dissimilarity matrix requires a large amount
of memory when n is large. In seq2feature_mds, the algorithm proposed
in Paradis (2018) is implemented to obtain MDS for large datasets. method 
specifies the algorithm to be used for obtaining MDS features. If method = "small",
classical MDS is used by calling cmdscale. If method = "large",
the algorithm for large datasets will be used. If method = "auto" (default), 
seq2feature_mds selects the algorithm automatically based on the sample size.
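As an illustration of the method = "small" path, the sketch below runs classical MDS with cmdscale on a toy dissimilarity object. The dissimilarities are Euclidean distances between made-up points, which is an assumption made only for this example; seq2feature_mds computes dissimilarities between response processes instead. The last lines give a rough sense of the memory needed to store a dense n × n matrix.

```r
# Toy data: five made-up points in the plane (not response processes).
x <- matrix(c(0, 0,
              1, 0,
              0, 1,
              3, 3,
              4, 1), ncol = 2, byrow = TRUE)

# Pairwise dissimilarities; dist() returns Euclidean distances by default.
d <- dist(x)

# Classical MDS: keep k = 2 coordinates per object.
theta_toy <- cmdscale(d, k = 2)
dim(theta_toy)  # one row per object, one column per feature

# For Euclidean input the embedding reproduces the dissimilarities
# (up to rotation and reflection of the coordinates).
all.equal(as.vector(dist(theta_toy)), as.vector(d))

# Rough size in GiB of a dense n x n matrix of doubles (8 bytes each).
n_big <- 1e5
n_big^2 * 8 / 2^30
```

The quadratic growth in storage is why a subset-based algorithm is worth considering once n becomes large.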
dist_type specifies the dissimilarity to be used for measuring the discrepancy
between two response processes. If dist_type = "oss_action", the order-based 
sequence similarity (oss) proposed in Gomez-Alonso and Valls (2008) is used 
for action sequences. If dist_type = "oss_both", both action sequences and
timestamp sequences are used to compute a time-weighted oss. 
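To give a feel for what an order-based dissimilarity between action sequences looks like, here is a simplified sketch. The helper oss_sketch is hypothetical and is not the package's implementation: it penalizes position differences of actions shared by both sequences and counts occurrences of actions found in only one of them. The exact definition of oss is given in Gomez-Alonso and Valls (2008) and may differ in its details.

```r
# Hypothetical helper, not part of ProcData: a simplified order-based
# dissimilarity between two action sequences s1 and s2.
oss_sketch <- function(s1, s2) {
  f <- 0  # position differences of actions present in both sequences
  g <- 0  # occurrences of actions present in only one sequence
  for (a in union(s1, s2)) {
    p1 <- which(s1 == a)
    p2 <- which(s2 == a)
    m <- min(length(p1), length(p2))
    if (m > 0) {
      # Compare the positions of the first m occurrences of action a.
      f <- f + sum(abs(p1[seq_len(m)] - p2[seq_len(m)]))
    } else {
      g <- g + length(p1) + length(p2)
    }
  }
  # Scale so that identical sequences give 0 and disjoint sequences give 1.
  (f / max(length(s1), length(s2)) + g) / (length(s1) + length(s2))
}

# Identical sequences.
oss_sketch(c("start", "a", "b", "end"), c("start", "a", "b", "end"))
# Sequences that differ in one action.
oss_sketch(c("start", "a", "end"), c("start", "b", "end"))
```

Applying such a function to every pair of sequences yields the kind of dissimilarity matrix that MDS takes as input.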
The number of features to be extracted K can be selected by cross-validation 
using chooseK_mds.
Value
seq2feature_mds returns a list containing 
| theta | a numeric matrix giving the  | 
| dist_mat | the dissimilarity matrix. This element exists only if 
 | 
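A short sketch of inspecting the returned list. It assumes that return_dist = TRUE adds dist_mat to the result, that theta has one row per response process and K columns, and that dist_mat is n × n. These shapes are assumptions inferred from the argument names, not statements taken from this page.

```r
library(ProcData)

set.seed(12345)
seqs <- seq_gen(50)  # 50 simulated response processes

# Request the dissimilarity matrix alongside the features.
res <- seq2feature_mds(seqs, K = 3, return_dist = TRUE)

names(res)         # assumed to include "theta" and "dist_mat"
dim(res$theta)     # assumed: 50 rows, 3 columns
dim(res$dist_mat)  # assumed: 50 rows, 50 columns
```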
References
Gomez-Alonso, C. and Valls, A. (2008). A similarity measure for sequences of categorical data based on the ordering of common elements. In V. Torra & Y. Narukawa (Eds.), Modeling Decisions for Artificial Intelligence (pp. 134-145). Springer Berlin Heidelberg.
Paradis, E. (2018). Multidimensional scaling with very large datasets. Journal of Computational and Graphical Statistics, 27(4), 935-939.
Tang, X., Wang, Z., He, Q., Liu, J., and Ying, Z. (2020). Latent feature extraction for process data via multidimensional scaling. Psychometrika, 85, 378-397.
See Also
chooseK_mds for choosing K.
Other feature extraction methods: aseq2feature_seq2seq,
atseq2feature_seq2seq,
seq2feature_mds_large,
seq2feature_ngram,
seq2feature_seq2seq,
tseq2feature_seq2seq
Examples
library(ProcData)

# Generate 50 response processes
n <- 50
set.seed(12345)
seqs <- seq_gen(n)
# Extract K = 5 MDS features
theta <- seq2feature_mds(seqs, 5)$theta