MFPCA {MFPCA}R Documentation

Multivariate functional principal component analysis for functions on different (dimensional) domains

Description

This function calculates a multivariate functional principal component analysis (MFPCA) based on i.i.d. observations x_1, \ldots, x_N of a multivariate functional data-generating process X = (X^{(1)}, \ldots X^{(p)}) with elements X^{(j)} \in L^2(\mathcal{T}_j) defined on a domain \mathcal{T}_j \subset IR^{d_j}. In particular, the elements can be defined on different (dimensional) domains. The results contain the mean function, the estimated multivariate functional principal components \hat \psi_1, \ldots, \hat \psi_M (having the same structure as x_i), the associated eigenvalues \hat \nu_1 \geq \ldots \geq \hat \nu_M > 0 and the individual scores \hat \rho_{im} = \widehat{<x_i, \psi_m>}. Moreover, estimated trajectories for each observation based on the truncated Karhunen-Loeve representation

\hat x_i = \sum_{m = 1}^M \hat \rho_{im} \hat \psi_m

are given if desired (fit = TRUE). The implementation of the observations x_i = (x_i^{(1)}, \ldots , x_i^{(p)}),~ i = 1 , \ldots, N, the mean function and multivariate functional principal components \hat \psi_1, \ldots, \hat \psi_M uses the multiFunData class, which is defined in the package funData.

Usage

MFPCA(
  mFData,
  M,
  uniExpansions,
  weights = rep(1, length(mFData)),
  fit = FALSE,
  approx.eigen = FALSE,
  bootstrap = FALSE,
  nBootstrap = NULL,
  bootstrapAlpha = 0.05,
  bootstrapStrat = NULL,
  verbose = options()$verbose
)

Arguments

mFData

A multiFunData object containing the N observations.

M

The number of multivariate functional principal components to calculate.

uniExpansions

A list characterizing the (univariate) expansion that is calculated for each element. See Details.

weights

An optional vector of weights, defaults to 1 for each element. See Details.

fit

Logical. If TRUE, a truncated multivariate Karhunen-Loeve representation for the data is calculated based on the estimated scores and eigenfunctions.

approx.eigen

Logical. If TRUE, the eigenanalysis problem for the estimated covariance matrix is solved approximately using the irlba package, which is much faster. If the number M of eigenvalues to calculate is high with respect to the number of observations in mFData or the number of estimated univariate eigenfunctions, the approximation may be inappropriate. In this case, approx.eigen is set to FALSE and the function throws a warning. Defaults to FALSE.

bootstrap

Logical. If TRUE, pointwise bootstrap confidence bands are calculated for the multivariate functional principal components. Defaults to FALSE. See Details.

nBootstrap

The number of bootstrap iterations to use. Defaults to NULL, which leads to an error, if bootstrap = TRUE.

bootstrapAlpha

A vector of numerics (or a single number) giving the significance level for bootstrap intervals. Defaults to 0.05.

bootstrapStrat

A stratification variable for bootstrap. Must be a factor of length nObs(mFData) or NULL (default). If NULL, no stratification is made in the bootstrap resampling, i.e. the curves are sampled with replacement. If bootstrapStrat is not NULL, the curves are resampled with replacement within the groups defined by bootstrapStrat, hence keeping the group proportions fixed.

verbose

Logical. If TRUE, the function reports extra-information about the progress (incl. timestamps). Defaults to options()$verbose.

Details

Weighted MFPCA

If the elements vary considerably in domain, range or variation, a weight vector w_1 , \ldots, w_p can be supplied and the MFPCA is based on the weighted scalar product

<<f,g>>_w = \sum_{j = 1}^p w_j \int_{\mathcal{T}_j} f^{(j)}(t) g^{(j)}(t) \mathrm{d} t

and the corresponding weighted covariance operator \Gamma_w.

Bootstrap

If bootstrap = TRUE, pointwise bootstrap confidence bands are generated for the multivariate eigenvalues \hat \nu_1, \ldots, \hat \nu_M as well as for multivariate functional principal components \hat \psi_1, \ldots, \hat \psi_M. The parameter nBootstrap gives the number of bootstrap iterations. In each iteration, the observations are resampled on the level of (multivariate) functions and the whole MFPCA is recalculated. In particular, if the univariate basis depends on the data (FPCA approaches), basis functions and scores are both re-estimated. If the basis functions are fixed (e.g. splines), the scores from the original estimate are used to speed up the calculations. The confidence bands for the eigenfunctions are calculated separately for each element as pointwise percentile bootstrap confidence intervals. Analogously, the confidence bands for the eigenvalues are also percentile bootstrap confidence bands. The significance level(s) can be defined by the bootstrapAlpha parameter, which defaults to 5%. As a result, the MFPCA function returns a list CI of the same length as bootstrapAlpha, containing the lower and upper bounds of the confidence bands for the principal components as multiFunData objects of the same structure as mFData. The confidence bands for the eigenvalues are returned in a list CIvalues, containing the upper and lower bounds for each significance level.

Univariate Expansions

The multivariate functional principal component analysis relies on a univariate basis expansion for each element X^{(j)}. The univariate basis representation is calculated using the univDecomp function, that passes the univariate functional observations and optional parameters to the specific function. The univariate decompositions are specified via the uniExpansions argument in the MFPCA function. It is a list of the same length as the mFData object, i.e. having one entry for each element of the multivariate functional data. For each element, uniExpansion must specify at least the type of basis functions to use. Additionally, one may add further parameters. The following basis representations are supported:

See univDecomp and univExpansion for details.

Value

An object of class MFPCAfit containing the following components:

values

A vector of estimated eigenvalues \hat \nu_1 , \ldots , \hat \nu_M.

functions

A multiFunData object containing the estimated multivariate functional principal components \hat \psi_1, \ldots, \hat \psi_M.

scores

A matrix of dimension N x M containing the estimated scores \hat \rho_{im}.

vectors

A matrix representing the eigenvectors associated with the combined univariate score vectors. This might be helpful for calculating predictions.

normFactors

The normalizing factors used for calculating the multivariate eigenfunctions and scores. This might be helpful when calculation predictions.

meanFunction

A multivariate functional data object, corresponding to the mean function. The MFPCA is applied to the de-meaned functions in mFData.

fit

A multiFunData object containing estimated trajectories for each observation based on the truncated Karhunen-Loeve representation and the estimated scores and eigenfunctions.

CI

A list of the same length as bootstrapAlpha, containing the pointwise lower and upper bootstrap confidence bands for each eigenfunction and each significance level in form of multiFunData objects (only if bootstrap = TRUE).

CIvalues

A list of the same length as bootstrapAlpha, containing the lower and upper bootstrap confidence bands for each eigenvalue and each significance level (only if bootstrap = TRUE).

References

C. Happ, S. Greven (2018): Multivariate Functional Principal Component Analysis for Data Observed on Different (Dimensional) Domains. Journal of the American Statistical Association, 113(522): 649-659. DOI: doi:10.1080/01621459.2016.1273115

C. Happ-Kurz (2020): Object-Oriented Software for Functional Data. Journal of Statistical Software, 93(5): 1-38. DOI: doi:10.18637/jss.v093.i05

See Also

See Happ-Kurz (2020. doi:10.18637/jss.v093.i05) for a general introduction to the funData package and it's interplay with MFPCA. This file also includes a case study on how to use MFPCA. Useful functions: multiFunData, PACE, univDecomp, univExpansion, summary, plot, scoreplot

Examples

oldPar <- par(no.readonly = TRUE)

set.seed(1)

### simulate data (one-dimensional domains)
sim <-  simMultiFunData(type = "split", argvals = list(seq(0,1,0.01), seq(-0.5,0.5,0.02)),
                        M = 5, eFunType = "Poly", eValType = "linear", N = 100)

# MFPCA based on univariate FPCA
uFPCA <- MFPCA(sim$simData, M = 5, uniExpansions = list(list(type = "uFPCA"),
                                                                  list(type = "uFPCA")))
summary(uFPCA)
plot(uFPCA) # plot the eigenfunctions as perturbations of the mean
scoreplot(uFPCA) # plot the scores

# MFPCA based on univariate spline expansions
splines <- MFPCA(sim$simData, M = 5, uniExpansions = list(list(type = "splines1D", k = 10),
                                                          list(type = "splines1D", k = 10)),
                 fit = TRUE) # calculate reconstruction, too
summary(splines)
plot(splines) # plot the eigenfunctions as perturbations of the mean
scoreplot(splines) # plot the scores

### Compare estimates to true eigenfunctions
# flip to make results more clear
uFPCA$functions <- flipFuns(sim$trueFuns, uFPCA$functions)
splines$functions <- flipFuns(sim$trueFuns, splines$functions)

par(mfrow = c(1,2))
plot(sim$trueFuns[[1]], main = "Eigenfunctions\n1st Element", lwd = 2)
plot(uFPCA$functions[[1]], lty = 2, add = TRUE)
plot(splines$functions[[1]], lty = 3, add = TRUE)

plot(sim$trueFuns[[2]], main = "Eigenfunctions\n2nd Element", lwd = 2)
plot(uFPCA$functions[[2]], lty = 2, add = TRUE)
plot(splines$functions[[2]], lty = 3, add = TRUE)
legend("bottomleft", c("True", "uFPCA", "splines"), lty = 1:3, lwd = c(2,1,1))

# Test reconstruction for the first 10 observations
plot(sim$simData[[1]], obs = 1:10, main = "Reconstruction\n1st Element", lwd = 2)
plot(splines$fit[[1]], obs = 1:10, lty = 2, col = 1, add = TRUE)

plot(sim$simData[[2]], obs = 1:10, main = "Reconstruction\n2nd Element", lwd = 2)
plot(splines$fit[[2]], obs = 1:10, lty = 2, col = 1, add = TRUE)
legend("bottomleft", c("True", "Reconstruction"), lty = c(1,2), lwd = c(2,1))

# MFPCA with Bootstrap-CI for the first 2 eigenfunctions
### ATTENTION: Takes long

splinesBoot <- MFPCA(sim$simData, M = 2, uniExpansions = list(list(type = "splines1D", k = 10),
                                                          list(type = "splines1D", k = 10)),
                 bootstrap = TRUE, nBootstrap = 100, bootstrapAlpha = c(0.05, 0.1), verbose = TRUE)
summary(splinesBoot)
                                 
plot(splinesBoot$functions[[1]], ylim = c(-2,1.5))
plot(splinesBoot$CI$alpha_0.05$lower[[1]], lty = 2, add = TRUE)
plot(splinesBoot$CI$alpha_0.05$upper[[1]], lty = 2, add = TRUE)
plot(splinesBoot$CI$alpha_0.1$lower[[1]], lty = 3, add = TRUE)
plot(splinesBoot$CI$alpha_0.1$upper[[1]], lty = 3, add = TRUE)
abline(h = 0, col = "gray")
 
plot(splinesBoot$functions[[2]], ylim = c(-1,2.5))
plot(splinesBoot$CI$alpha_0.05$lower[[2]], lty = 2, add = TRUE)
plot(splinesBoot$CI$alpha_0.05$upper[[2]], lty = 2, add = TRUE)
plot(splinesBoot$CI$alpha_0.1$lower[[2]], lty = 3, add = TRUE)
plot(splinesBoot$CI$alpha_0.1$upper[[2]], lty = 3, add = TRUE)
abline(h = 0, col = "gray")
legend("topleft", c("Estimate", "95% CI", "90% CI"), lty = 1:3, lwd = c(2,1,1))

# Plot 95% confidence bands for eigenvalues
plot(1:2, splinesBoot$values, pch = 20, ylim = c(0, 1.5), 
     main = "Estimated eigenvalues with 95% CI",
     xlab = "Eigenvalue no.", ylab = "")
arrows(1:2, splinesBoot$CIvalues$alpha_0.05$lower,
       1:2, splinesBoot$CIvalues$alpha_0.05$upper,
       length = 0.05, angle = 90, code = 3)
points(1:2, sim$trueVals[1:2], pch = 20, col = 4)
legend("topright", c("Estimate", "True value"), pch = 20, col = c(1,4))


### simulate data (two- and one-dimensional domains)
### ATTENTION: Takes long

set.seed(2)
sim <-  simMultiFunData(type = "weighted",
                 argvals = list(list(seq(0,1,0.01), seq(-1,1,0.02)), list(seq(-0.5,0.5,0.01))),
                 M = list(c(4,5), 20), eFunType = list(c("Fourier", "Fourier"), "Poly"),
                 eValType = "exponential", N = 150)

# MFPCA based on univariate spline expansions (for images) and univariate FPCA (for functions)
pca <- MFPCA(sim$simData, M = 10,
             uniExpansions = list(list(type = "splines2D", k = c(10,12)),
                             list(type = "uFPCA")))
summary(pca)
plot(pca) # plot the eigenfunctions as perturbations of the mean
scoreplot(pca) # plot the scores

### Compare to true eigenfunctions
# flip to make results more clear
pca$functions <- flipFuns(sim$trueFuns[1:10], pca$functions)

par(mfrow = c(5,2), mar = rep(2,4))
for(m in 2:6) # for m = 1, image.plot (used in plot(funData)) produces an error...
{
  plot(sim$trueFuns[[1]], main = paste("True, m = ", m), obs = m)
  plot(pca$functions[[1]], main = paste("Estimate, m = ", m), obs = m)
}

par(mfrow = c(1,1))
plot(sim$trueFuns[[2]], main = "Eigenfunctions (2nd element)", lwd = 2, obs=  1:5)
plot(pca$functions[[2]], lty = 2, add = TRUE, obs=  1:5)
legend("bottomleft", c("True", "MFPCA"), lty = 1:2, lwd = c(2,1))

par(oldPar)

[Package MFPCA version 1.3-10 Index]