seq2feature_mds {ProcData} | R Documentation |
Feature extraction via multidimensional scaling
Description
seq2feature_mds extracts K features from response processes by multidimensional scaling.
Usage
seq2feature_mds(seqs = NULL, K = 2, method = "auto",
dist_type = "oss_action", pca = TRUE, subset_size = 100,
subset_method = "random", n_cand = 10, return_dist = FALSE,
L_set = 1:3)
Arguments
seqs |
a |
K |
the number of features to be extracted. |
method |
a character string specifying the algorithm used for performing MDS. See 'Details'. |
dist_type |
a character string specifying the dissimilarity measure for two response processes. See 'Details'. |
pca |
logical. If |
subset_size , n_cand |
two parameters used in the large data algorithm. See 'Details'
and |
subset_method |
a character string specifying the method for choosing the subset
in the large data algorithm. See 'Details' and |
return_dist |
logical. If |
L_set |
length of ngrams considered |
Details
Classical MDS has a computational complexity of order n^3, where n is the number of response processes, so it is computationally expensive when a large number of response processes is considered. In addition, storing an n \times n dissimilarity matrix requires a large amount of memory when n is large. For large datasets, seq2feature_mds implements the algorithm proposed in Paradis (2018).
method specifies the algorithm to be used for obtaining MDS features. If method = "small", classical MDS is used by calling cmdscale. If method = "large", the algorithm for large datasets is used. If method = "auto" (default), seq2feature_mds selects the algorithm automatically based on the sample size.
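To make the cost of classical MDS concrete, here is a minimal base R sketch. It is not the package's implementation: the toy sequences are hypothetical, and utils::adist (edit distance) is used only as a stand-in dissimilarity, not the oss measure described under dist_type. The helper dist_mem_gb is likewise a hypothetical name introduced for illustration.

```r
# Toy action sequences: each letter stands for one action (hypothetical data).
toy_seqs <- c("ABCD", "ABD", "ACBD", "DCBA", "AABB", "BCD")

# Stand-in dissimilarity: generalized Levenshtein distance from utils::adist().
# This is NOT the oss measure used by seq2feature_mds.
d <- utils::adist(toy_seqs)

# Classical MDS: embed the n sequences in (at most) K dimensions.
K <- 2
theta_toy <- cmdscale(d, k = K)

# Memory needed for a dense n x n matrix of doubles (8 bytes each), in GB.
dist_mem_gb <- function(n) n^2 * 8 / 1e9
dist_mem_gb(1e5)  # 80 GB for 100,000 response processes
```

The last line shows why a full dissimilarity matrix becomes impractical long before the n^3 computation does, which is the motivation for the large data algorithm.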
dist_type specifies the dissimilarity to be used for measuring the discrepancy between two response processes. If dist_type = "oss_action", the order-based sequence similarity (oss) proposed in Gomez-Alonso and Valls (2008) is used for action sequences. If dist_type = "oss_both", both action sequences and timestamp sequences are used to compute a time-weighted oss.
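As an illustration of how these arguments combine, the following sketch requests classical MDS on the action-only dissimilarity and keeps the dissimilarity matrix. It uses only arguments listed under 'Usage' and the simulated data from 'Examples'. The requireNamespace guard is an addition so that the snippet is skipped when ProcData is not installed.

```r
if (requireNamespace("ProcData", quietly = TRUE)) {
  library(ProcData)
  set.seed(12345)
  seqs <- seq_gen(50)  # simulated action sequences, as in 'Examples'

  # Classical MDS on the action-only oss dissimilarity; also return the matrix.
  res <- seq2feature_mds(seqs, K = 2, method = "small",
                         dist_type = "oss_action", return_dist = TRUE)

  dim(res$theta)     # extracted features
  dim(res$dist_mat)  # dissimilarity matrix, present because return_dist = TRUE
}
```

Using dist_type = "oss_both" in the same call would additionally require response processes that carry timestamp sequences.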
The number of features to be extracted, K, can be selected by cross-validation using chooseK_mds.
Value
seq2feature_mds returns a list containing
theta |
a numeric matrix giving the |
dist_mat |
the dissimilarity matrix. This element exists only if
|
References
Gomez-Alonso, C. and Valls, A. (2008). A similarity measure for sequences of categorical data based on the ordering of common elements. In V. Torra & Y. Narukawa (Eds.), Modeling Decisions for Artificial Intelligence (pp. 134-145). Springer Berlin Heidelberg.
Paradis, E. (2018). Multidimensional scaling with very large datasets. Journal of Computational and Graphical Statistics, 27(4), 935-939.
Tang, X., Wang, Z., He, Q., Liu, J., and Ying, Z. (2020). Latent feature extraction for process data via multidimensional scaling. Psychometrika, 85, 378-397.
See Also
chooseK_mds for choosing K.
Other feature extraction methods: aseq2feature_seq2seq, atseq2feature_seq2seq, seq2feature_mds_large, seq2feature_ngram, seq2feature_seq2seq, tseq2feature_seq2seq
Examples
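library(ProcData)
n <- 50                                   # number of response processes
set.seed(12345)                           # make the simulation reproducible
seqs <- seq_gen(n)                        # simulate n action sequences
theta <- seq2feature_mds(seqs, 5)$theta   # extract K = 5 MDS features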