## Compute the optimal encoding for each state

### Description

Compute the optimal encoding for categorical functional data using an extension of multiple correspondence analysis to stochastic processes.

### Usage

```r
compute_optimal_encoding(
  data,
  basisobj,
  computeCI = TRUE,
  nBootstrap = 50,
  propBootstrap = 1,
  nCores = max(1, ceiling(detectCores()/2)),
  verbose = TRUE,
  ...
)
```

### Arguments

• data data.frame containing three columns: id (identifier of the trajectory), time (time at which a change occurs) and state (associated state). All individuals must begin at the same time T0 and end at the same time Tmax (use cut_data).

• basisobj basis created using the fda package (cf. create.basis).

• computeCI if TRUE, perform a bootstrap to estimate the variance of the encoding's coefficients.

• nBootstrap number of bootstrap samples.

• propBootstrap size of bootstrap samples relative to the number of individuals: propBootstrap * number of individuals.

• nCores number of cores used for parallelization. Defaults to half of the available cores.

• verbose if TRUE, print some information.

• ... parameters for the integrate function (see Details).
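The layout expected for data can be illustrated with a small hand-built data.frame. This is a minimal sketch in base R: the values are made up, and the assumption that each trajectory carries one row at T0 and one closing row at Tmax reflects the constraint stated above rather than a documented guarantee. The two checks only verify the common start and end times.

```r
# Toy data set with made-up values (illustration only).
# Assumed layout: one row per state change, plus a closing row at Tmax.
toy <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3, 3, 3),
  time  = c(0, 1.2, 5, 0, 5, 0, 3.4, 5),
  state = c("A", "B", "B", "B", "B", "A", "C", "C")
)

# Check the two constraints: a common start time T0 and a common end time Tmax.
start_times <- tapply(toy$time, toy$id, min)
end_times   <- tapply(toy$time, toy$id, max)

same_start <- length(unique(start_times)) == 1
same_end   <- length(unique(end_times)) == 1

same_start
same_end
```

If the end times differ across individuals, cut_data is the function the description above points to for truncating every trajectory at a common Tmax.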

### Details

See the vignette for the mathematical background: RShowDoc("cfda", package = "cfda")

Extra parameters (...) for the integrate function can be:

• subdivisions the maximum number of subintervals.

• rel.tol relative accuracy requested.

• abs.tol absolute accuracy requested.
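The effect of these three parameters can be tried directly on base R's integrate function. The sketch below is not taken from the package: the integrand and the wrapper `integrate_with_dots` are hypothetical, chosen only to show how arguments given through `...` reach integrate.

```r
# Integrand used purely for illustration (standard normal density).
f <- function(t) dnorm(t)

# Call with the default subdivisions and tolerances.
res_default <- integrate(f, lower = -1.96, upper = 1.96)

# Same integral with more subintervals and tighter tolerances.
res_tight <- integrate(f, lower = -1.96, upper = 1.96,
                       subdivisions = 500L, rel.tol = 1e-10, abs.tol = 1e-12)

# Hypothetical wrapper: anything passed through `...` is forwarded to integrate().
integrate_with_dots <- function(f, lower, upper, ...) {
  integrate(f, lower = lower, upper = upper, ...)$value
}
val <- integrate_with_dots(f, -1.96, 1.96, rel.tol = 1e-10)

res_default$abs.error
res_tight$abs.error
```

In the same spirit, a call such as compute_optimal_encoding(data, basisobj, rel.tol = 1e-8) would pass rel.tol on to integrate.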

### Value

A list containing:

• eigenvalues eigenvalues

• alpha optimal encoding coefficients associated with each eigenvector

• pc principal components

• F matrix containing the F_{(x,i)(y,j)}

• V matrix containing the V_{(x,i)}

• G covariance matrix of V

• basisobj basisobj input parameter

• pt output of estimate_pt function

• bootstrap Only if computeCI = TRUE. Output of every bootstrap run

• varAlpha Only if computeCI = TRUE. Variance of alpha parameters

• runTime Total elapsed time

### Author(s)

Cristian Preda, Quentin Grimonprez

### References

• Deville J.C. (1982) Analyse de données chronologiques qualitatives : comment analyser des calendriers ?, Annales de l'INSEE, No 45, p. 45-104.

• Deville J.C. and Saporta G. (1980) Analyse harmonique qualitative, DIDAY et al. (editors), Data Analysis and Informatics, North Holland, p. 375-389.

• Saporta G. (1981) Méthodes exploratoires d'analyse de données temporelles, Cahiers du B.U.R.O, Université Pierre et Marie Curie, 37-38, Paris.

### Examples

```r
# Simulate the Jukes-Cantor model of nucleotide replacement
K <- 4
Tmax <- 5
PJK <- matrix(1/3, nrow = K, ncol = K) - diag(rep(1/3, K))
lambda_PJK <- c(1, 1, 1, 1)
d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = Tmax,
                        labels = c("A", "C", "G", "T"))
d_JK2 <- cut_data(d_JK, Tmax)

# create basis object
m <- 5
b <- create.bspline.basis(c(0, Tmax), nbasis = m, norder = 4)

# compute encoding
encoding <- compute_optimal_encoding(d_JK2, b, computeCI = FALSE, nCores = 1)
summary(encoding)

# plot the optimal encoding
plot(encoding)

# plot the two first components
plotComponent(encoding, comp = c(1, 2))

# extract the optimal encoding
get_encoding(encoding, harm = 1)
```
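The example above disables the bootstrap. A variant with computeCI = TRUE might look like the following sketch, which uses only the arguments and return fields documented on this page. The sample size (n = 50), the seed and nBootstrap = 20 are arbitrary choices made for illustration, and fda is loaded explicitly on the assumption that cfda may not attach it.

```r
library(cfda)
library(fda)  # loaded explicitly in case cfda does not attach it

set.seed(42)  # arbitrary seed, for reproducibility of the simulation

# Simulate trajectories as in the example above, with more individuals.
K <- 4
Tmax <- 5
PJK <- matrix(1/3, nrow = K, ncol = K) - diag(rep(1/3, K))
d_JK <- generate_Markov(n = 50, K = K, P = PJK, lambda = rep(1, K), Tmax = Tmax,
                        labels = c("A", "C", "G", "T"))
d_JK2 <- cut_data(d_JK, Tmax)
b <- create.bspline.basis(c(0, Tmax), nbasis = 5, norder = 4)

# Compute the encoding with a bootstrap on the full sample size.
encoding_ci <- compute_optimal_encoding(
  d_JK2, b,
  computeCI = TRUE, nBootstrap = 20, propBootstrap = 1,
  nCores = 1, verbose = FALSE
)

# The bootstrap and varAlpha elements are present only when computeCI = TRUE.
names(encoding_ci)
str(encoding_ci$varAlpha, max.level = 1)
```

Setting propBootstrap below 1 would draw smaller bootstrap samples, and raising nCores would spread the work over several cores.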

