
calculate_features {ctsfeatures}

R Documentation

Computes several features associated with a categorical time series

Description

calculate_features computes several features associated with a categorical time series (CTS), or with the relationship between a categorical and a real-valued time series.

Usage

calculate_features(series, n_series = NULL, lag = 1, type = NULL)

Arguments

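`series`	An object of class `tsibble` (see R package `tsibble`) whose column named `Value` contains the values of the CTS. This column must be of class `factor`, and its levels must correspond to the range of the CTS.
`n_series`	A real-valued time series, playing the role of \overline{Z}_t in Details. It is used by the mixed features (type=total_mixed_correlation_1 and type=total_mixed_correlation_2).
`lag`	The lag l used by the lag-dependent features (default is 1).
`type`	A string indicating the feature to compute (see Details for the admissible values).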

Details

Assume we have a CTS of length T with range \mathcal{V}=\{1, 2, \ldots, r\}, \overline{X}_t=\{\overline{X}_1,\ldots, \overline{X}_T\}, with \widehat{p}_i being the natural estimate of the marginal probability of the ith category, and \widehat{p}_{ij}(l) being the natural estimate of the joint probability for categories i and j at lag l, i,j=1, \ldots, r. Assume also that we have a real-valued time series of length T, \overline{Z}_t=\{\overline{Z}_1,\ldots, \overline{Z}_T\}. The function computes the following quantities depending on the argument type:
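The natural estimates mentioned above are relative frequencies. As a minimal base-R sketch, assuming the CTS is coded as integers 1, ..., r and that \widehat{p}_{ij}(l) is the relative frequency of the pair (\overline{X}_t = i, \overline{X}_{t-l} = j) over t = l+1, ..., T, they could be computed as follows. The helpers `marginal_probs` and `joint_probs` are hypothetical and not part of ctsfeatures, whose internal conventions may differ.

```r
# Hypothetical helpers (not part of ctsfeatures): natural estimates of the
# marginal probabilities p_i and of the lag-l joint probabilities p_ij(l).
# Assumes the CTS is coded as integers 1, ..., r.

# p_i: relative frequency of category i over the whole series
marginal_probs <- function(x, r) {
  tabulate(x, nbins = r) / length(x)
}

# p_ij(l): relative frequency of the pair (X_t = i, X_{t-l} = j),
# computed over t = l + 1, ..., T (assumed convention)
joint_probs <- function(x, r, l = 1) {
  n <- length(x)
  current <- factor(x[(l + 1):n], levels = 1:r)  # X_t
  lagged <- factor(x[1:(n - l)], levels = 1:r)   # X_{t-l}
  counts <- table(current, lagged)                # r x r contingency table
  matrix(counts, nrow = r) / (n - l)
}

x <- c(1, 2, 2, 3, 1, 1, 2, 3, 3, 1)
p_hat <- marginal_probs(x, r = 3)          # 0.4 0.3 0.3
p_joint <- joint_probs(x, r = 3, l = 1)    # 3 x 3 matrix summing to 1
```

Here `p_hat` plays the role of (\widehat{p}_1, ..., \widehat{p}_r) and `p_joint[i, j]` that of \widehat{p}_{ij}(l) in the formulas that follow.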

If type=gini_index, the function computes the estimated Gini index, \widehat{g}=\frac{r}{r-1}(1-\sum_{i=1}^{r}\widehat{p}_i^2).
If type=entropy, the function computes the estimated entropy, \widehat{e}=\frac{-1}{\ln(r)}\sum_{i=1}^{r}\widehat{p}_i\ln \widehat{p}_i.
If type=chebycheff_dispersion, the function computes the estimated Chebycheff dispersion, \widehat{c}=\frac{r}{r-1}(1-\max_i\widehat{p}_i).
If type=gk_tau, the function computes the estimated Goodman and Kruskal's tau, \widehat{\tau}(l)=\frac{\sum_{i,j=1}^{r}\frac{\widehat{p}_{ij}(l)^2}{\widehat{p}_j}-\sum_{i=1}^r\widehat{p}_i^2}{1-\sum_{i=1}^r\widehat{p}_i^2}.
If type=gk_lambda, the function computes the estimated Goodman and Kruskal's lambda, \widehat{\lambda}(l)=\frac{\sum_{j=1}^{r}\max_i\widehat{p}_{ij}(l)-\max_i\widehat{p}_i}{1-\max_i\widehat{p}_i}.
If type=uncertainty_coefficient, the function computes the estimated uncertainty coefficient, \widehat{u}(l)=-\frac{\sum_{i, j=1}^{r}\widehat{p}_{ij}(l)\ln\big(\frac{\widehat{p}_{ij}(l)}{\widehat{p}_i\widehat{p}_j}\big)}{\sum_{i=1}^{r}\widehat{p}_i\ln \widehat{p}_i}.
If type=pearson_measure, the function computes the estimated Pearson measure, \widehat{X}_T^2(l)=T\sum_{i,j=1}^{r}\frac{(\widehat{p}_{ij}(l)-\widehat{p}_i\widehat{p}_j)^2}{\widehat{p}_i\widehat{p}_j}.
If type=phi2_measure, the function computes the estimated Phi2 measure, \widehat{\Phi}^2(l)=\frac{\widehat{X}_T^2(l)}{T}.
If type=sakoda_measure, the function computes the estimated Sakoda measure, \widehat{p}^*(l)=\sqrt{\frac{r\widehat{\Phi}^2(l)}{(r-1)(1+\widehat{\Phi}^2(l))}}.
If type=cramers_vi, the function computes the estimated Cramér's v, \widehat{v}(l)=\sqrt{\frac{1}{r-1}\sum_{i,j=1}^r\frac{(\widehat{p}_{ij}(l)-\widehat{p}_i\widehat{p}_j)^2}{\widehat{p}_i\widehat{p}_j}}.
If type=cohens_kappa, the function computes the estimated Cohen's kappa, \widehat{\kappa}(l)=\frac{\sum_{j=1}^{r}(\widehat{p}_{jj}(l)-\widehat{p}_j^2)}{1-\sum_{i=1}^r\widehat{p}_i^2}.
If type=total_correlation, the function computes the estimated quantity \widehat{\Psi}(l)=\frac{1}{r^2}\sum_{i,j=1}^{r}\widehat{\psi}_{ij}(l)^2, where \widehat{\psi}_{ij}(l) is the estimated correlation \widehat{Corr}(Y_{t, i}, Y_{t-l, j}), i,j=1,\ldots,r, and \overline{\boldsymbol Y}_t=\{\overline{\boldsymbol Y}_1, \ldots, \overline{\boldsymbol Y}_T\}, with \overline{\boldsymbol Y}_k=(\overline{Y}_{k,1}, \ldots, \overline{Y}_{k,r})^\top, is the binarized time series of \overline{X}_t, that is, \overline{Y}_{k,i}=1 if \overline{X}_k=i and \overline{Y}_{k,i}=0 otherwise.
If type=spectral_envelope, the function computes the estimated spectral envelope.
If type=total_mixed_correlation_1, the function computes the estimated total mixed l-correlation given by

\widehat{\Psi}_1(l)=\frac{1}{r}\sum_{i=1}^{r}\widehat{\psi}_{i}(l)^2,

where \widehat{\psi}_{i}(l)=\widehat{Corr}(Y_{t,i}, Z_{t-l}), i=1,\ldots,r, and \overline{\boldsymbol Y}_t=\{\overline{\boldsymbol Y}_1, \ldots, \overline{\boldsymbol Y}_T\}, with \overline{\boldsymbol Y}_k=(\overline{Y}_{k,1}, \ldots, \overline{Y}_{k,r})^\top, is the binarized time series of \overline{X}_t.
If type=total_mixed_correlation_2, the function computes the estimated total mixed q-correlation given by

\widehat{\Psi}_2(l)=\frac{1}{r}\sum_{i=1}^{r}\int_{0}^{1}\widehat{\psi}^\rho_{i}(l)^2d\rho,

where \widehat{\psi}_{i}^\rho(l)=\widehat{Corr}\big(Y_{t,i}, I(Z_{t-l}\leq q_{Z_t}(\rho)) \big), \overline{\boldsymbol Y}_t=\{\overline{\boldsymbol Y}_1, \ldots, \overline{\boldsymbol Y}_T\}, with \overline{\boldsymbol Y}_k=(\overline{Y}_{k,1}, \ldots, \overline{Y}_{k,r})^\top, is the binarized time series of \overline{X}_t, \rho \in (0, 1) is a probability level, I(\cdot) is the indicator function and q_{Z_t} is the quantile function of the real-valued process.
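The three marginal features (type=gini_index, entropy and chebycheff_dispersion) depend only on the estimated marginal probabilities. The following is a minimal base-R sketch written directly from the formulas above, assuming `p` is the vector (\widehat{p}_1, ..., \widehat{p}_r) and using the convention 0 \ln 0 = 0 for categories that never occur. The function names are illustrative; this is not the package's own code.

```r
# Illustrative transcription of the marginal dispersion formulas
# (not the ctsfeatures source). `p` = estimated marginal probabilities.
gini_index <- function(p) {
  r <- length(p)
  r / (r - 1) * (1 - sum(p^2))
}

entropy <- function(p) {
  r <- length(p)
  nz <- p[p > 0]                   # assumed convention: 0 * log(0) = 0
  -sum(nz * log(nz)) / log(r)
}

chebycheff_dispersion <- function(p) {
  r <- length(p)
  r / (r - 1) * (1 - max(p))
}

x <- c(1, 2, 2, 3, 1, 1, 2, 3, 3, 1)
p_hat <- tabulate(x, nbins = 3) / length(x)   # relative frequencies
c(gini = gini_index(p_hat), entropy = entropy(p_hat),
  chebycheff = chebycheff_dispersion(p_hat))
```

All three measures are normalized: they equal 1 for a uniform marginal distribution and 0 when a single category has probability 1.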
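The lag-dependent association features are functions of the matrix of estimated joint probabilities and of the marginal probabilities. The sketch below transcribes several of the formulas above into base R, assuming `P[i, j]` holds \widehat{p}_{ij}(l), `p` holds \widehat{p}_i, and every \widehat{p}_i > 0. Function names are hypothetical and are not ctsfeatures internals; `cramers_v` corresponds to type=cramers_vi, and `pearson_measure` takes the series length T as an extra argument `n`.

```r
# Hypothetical transcriptions of the lag-l association measures
# (not the ctsfeatures source). P[i, j] = p_ij(l), p = marginal probabilities.
gk_tau <- function(P, p) {
  s <- sum(p^2)
  (sum(sweep(P^2, 2, p, "/")) - s) / (1 - s)   # column j divided by p_j
}

gk_lambda <- function(P, p) {
  (sum(apply(P, 2, max)) - max(p)) / (1 - max(p))  # sum_j max_i P[i, j]
}

uncertainty_coefficient <- function(P, p) {
  E <- outer(p, p)                 # E[i, j] = p_i * p_j
  nz <- P > 0                      # assumed convention: 0 * log(0) = 0
  -sum(P[nz] * log(P[nz] / E[nz])) / sum(p * log(p))
}

phi2_measure <- function(P, p) {
  E <- outer(p, p)
  sum((P - E)^2 / E)
}

# n = length T of the series
pearson_measure <- function(P, p, n) n * phi2_measure(P, p)

sakoda_measure <- function(P, p) {
  r <- length(p)
  phi2 <- phi2_measure(P, p)
  sqrt(r * phi2 / ((r - 1) * (1 + phi2)))
}

cramers_v <- function(P, p) sqrt(phi2_measure(P, p) / (length(p) - 1))

cohens_kappa <- function(P, p) {
  s <- sum(p^2)
  (sum(diag(P)) - s) / (1 - s)
}

# Perfect serial dependence (X_t = X_{t-l}) puts all mass on the diagonal.
p <- c(0.5, 0.3, 0.2)
P_dep <- diag(p)
sapply(list(gk_tau, gk_lambda, uncertainty_coefficient, sakoda_measure,
            cramers_v, cohens_kappa), function(f) f(P_dep, p))
```

As a sanity check, every normalized measure returns 1 for the perfectly dependent case `P = diag(p)` and 0 for the independent case `P = outer(p, p)`.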
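The correlation-based features rely on the binarized series \overline{\boldsymbol Y}_t. A minimal sketch of type=total_correlation and type=total_mixed_correlation_1 with base R `cor()` follows. It assumes the CTS is coded 1, ..., r, that the correlations are ordinary Pearson correlations computed over t = l+1, ..., T, and that every category occurs in both the current and the lagged part of the series (otherwise `cor()` returns NA for a constant column). The function names are hypothetical, and the package may use a different estimator.

```r
# Hypothetical sketch (not the ctsfeatures internals).
# Binarized series: row k is (I(X_k = 1), ..., I(X_k = r)).
binarize <- function(x, r) {
  outer(x, 1:r, "==") * 1          # T x r matrix of 0/1 indicators
}

total_correlation <- function(x, r, l = 1) {
  Y <- binarize(x, r)
  n <- length(x)
  # psi[i, j] = Corr(Y_{t,i}, Y_{t-l,j})
  psi <- cor(Y[(l + 1):n, , drop = FALSE], Y[1:(n - l), , drop = FALSE])
  mean(psi^2)                      # (1 / r^2) * sum of squared psi_ij(l)
}

total_mixed_correlation_1 <- function(x, z, r, l = 1) {
  Y <- binarize(x, r)
  n <- length(x)
  # psi[i, 1] = Corr(Y_{t,i}, Z_{t-l})
  psi <- cor(Y[(l + 1):n, , drop = FALSE], z[1:(n - l)])
  mean(psi^2)                      # (1 / r) * sum of squared psi_i(l)
}

x <- rep(1:3, 4)                   # period-3 series, so X_t = X_{t-3}
total_correlation(x, r = 3, l = 3)                 # 0.5
total_mixed_correlation_1(x, z = x, r = 3, l = 3)  # 0.5
```

The total mixed q-correlation could be approximated in the same spirit by averaging over a grid of probability levels \rho, but that choice of grid is an additional assumption and is not sketched here.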

Value

The corresponding feature.

Author(s)

Ángel López-Oriona, José A. Vilar

References

Weiß CH, Göb R (2008). “Measuring serial dependence in categorical time series.” AStA Advances in Statistical Analysis, 92, 71–89.

Examples

library(ctsfeatures)
# Computing the uncertainty coefficient
# for the first series in dataset GeneticSequences
sequence_1 <- GeneticSequences[which(GeneticSequences$Series == 1), ]
uc <- calculate_features(series = sequence_1, type = 'uncertainty_coefficient')
# Computing the spectral envelope
# for the first series in dataset GeneticSequences
se <- calculate_features(series = sequence_1, type = 'spectral_envelope')

[Package ctsfeatures version 1.2.2 Index]