ProteinSequences {ctsfeatures}    R Documentation
ProteinSequences
Description
Categorical time series (CTS) of protein sequences from different species.
Usage
data(ProteinSequences)
Format
A tsibble with four columns:
Value
The categorical values of the time series in the dataset.
Series
Integer values identifying the time series to which each observation belongs (the dataset contains 40 time series).
Time
Integer values indicating the temporal indexes of the observations.
Class
Integer values indicating the class of each time series.
Details
The column Value is the concatenation of 40 time series taking four categorical values (amino acids). The column Class is formed by integers from 1 to 4, indicating that there are 4 different classes in the dataset. Each class is associated with a different family of viruses. For more information, see López-Oriona et al. (2023).
References
López-Oriona Á, Vilar JA, D’Urso P (2023). “Hard and soft clustering of categorical time series based on two novel distances with an application to biological sequences.” Information Sciences, 624, 467–492.
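Examples
The following is a minimal sketch of how the dataset might be inspected, assuming it loads as the tsibble described under Format, with columns named Value, Series, Time and Class. The helper names n_series, series_length and series_class are introduced here for illustration only and are not part of the package.

```r
library(ctsfeatures)

# Load the dataset (assumed to be a tsibble with the four columns listed under Format)
data("ProteinSequences", package = "ctsfeatures")

# Column names; expected from the Format section: Value, Series, Time, Class
names(ProteinSequences)

# Number of distinct series in the long-format table
n_series <- length(unique(ProteinSequences$Series))

# Frequency of each categorical value across all series
table(ProteinSequences$Value)

# Number of observations in each series
series_length <- tapply(ProteinSequences$Time, ProteinSequences$Series, length)

# One class label per series (the label is assumed constant within a series)
series_class <- tapply(ProteinSequences$Class, ProteinSequences$Series,
                       function(x) x[1])

# Number of series in each class
table(series_class)
```

Because the series are stacked in long format, per-series summaries are obtained by grouping on Series; tapply from base R is used here so that no packages beyond ctsfeatures need to be attached.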