FPCA {fdapace} | R Documentation |
Functional Principal Component Analysis
Description
FPCA for dense or sparse functional data.
Usage
FPCA(Ly, Lt, optns = list())
Arguments
Ly |
A list of n vectors containing the observed values for each individual. Missing values specified by |
Lt |
A list of n vectors containing the observation time points for each individual, corresponding to Ly. Each vector should be sorted in ascending order. |
optns |
A list of options control parameters specified by |
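As a minimal sketch of the input format, the following base R code builds 'Ly' and 'Lt' from long-format data. The data frame 'long' and its column names are hypothetical and used only for illustration.

```r
# Hypothetical long-format data: one row per measurement.
long <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  time = c(0.4, 0.1, 0.9, 0.7, 0.2, 0.5, 0.0, 1.0, 0.3),
  y    = c(1.2, 0.8, 1.9, 0.3, 0.1, 2.0, 1.1, 2.7, 1.6)
)

# Sort by subject and time so each vector of time points is ascending.
long <- long[order(long$id, long$time), ]

# Split into one vector per subject; unname() drops the id labels.
Lt <- unname(split(long$time, long$id))
Ly <- unname(split(long$y, long$id))
```

The two lists have one entry per subject, and entry i of 'Ly' has the same length as entry i of 'Lt'.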
Details
If the input is sparse data, check that the design plot is dense and that the 2D domain is well covered by support points, using plot or CreateDesignPlot. Some study designs, such as snippet data (where each subject is observed only on a sub-interval of the period of study), will have an ill-covered design plot, in which case the nonparametric covariance estimate will be unreliable.
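The idea of design coverage can be sketched numerically in base R. The code below is a rough stand-in for inspecting the design plot, not what CreateDesignPlot does internally; the toy 'Lt' values and the grid size 'nBins' are assumptions made for illustration.

```r
# Assumed input: Lt as described under Arguments (toy values here).
Lt <- list(c(0.0, 0.2, 0.5), c(0.1, 0.6, 1.0), c(0.3, 0.4, 0.9))

# Pool all within-subject pairs of time points (s, t); these are the
# support points available for estimating the covariance surface.
pairs <- do.call(rbind, lapply(Lt, function(tt) expand.grid(s = tt, t = tt)))

# Bin the 2D domain into a coarse nBins x nBins grid (illustrative choice)
# and count the support points falling into each cell.
nBins  <- 4
rng    <- range(unlist(Lt))
breaks <- seq(rng[1], rng[2], length.out = nBins + 1)
cellS  <- cut(pairs$s, breaks, include.lowest = TRUE)
cellT  <- cut(pairs$t, breaks, include.lowest = TRUE)
counts <- table(cellS, cellT)

# Fraction of cells without any support point; values near 0 suggest good
# coverage, large values suggest an ill-covered design.
emptyFrac <- mean(counts == 0)
```

For snippet data, the cells far from the diagonal would typically stay empty, since no subject contributes pairs of time points that are far apart.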
WARNING! Slow computation times may occur if the dataType argument is incorrect. If FPCA is taking a while, please double check that a dense design is not mistakenly coded as 'Sparse'. Applying FPCA to a mixture of very dense and sparse curves may result in computational issues.
Available control options are
- userBwCov
The bandwidth value for the smoothed covariance function; positive numeric - default: determine automatically based on 'methodBwCov'
- methodBwCov
The bandwidth choice method for the smoothed covariance function; 'GMeanAndGCV' (the geometric mean of the GCV bandwidth and the minimum bandwidth),'CV','GCV' - default: 10% of the support
- userBwMu
The bandwidth value for the smoothed mean function (using 'CV' or 'GCV'); positive numeric - default: determine automatically based on 'methodBwMu'
- methodBwMu
The bandwidth choice method for the mean function; 'GMeanAndGCV' (the geometric mean of the GCV bandwidth and the minimum bandwidth),'CV','GCV' - default: 5% of the support
- dataType
The type of design we have (usually distinguishing between sparse or dense functional data); 'Sparse', 'Dense', 'DenseWithMV', 'p>>n' - default: determine automatically based on 'IsRegular'
- diagnosticsPlot
Deprecated. Same as the option 'plot'
- plot
Plot FPCA results (design plot, mean, scree plot and first K (<=3) eigenfunctions); logical - default: FALSE
- error
Assume measurement error in the dataset; logical - default: TRUE
- fitEigenValues
Whether also to obtain a regression fit of the eigenvalues - default: FALSE
- FVEthreshold
Fraction-of-Variance-Explained threshold used during the SVD of the fitted covariance function; numeric (0,1] - default: 0.99
- FVEfittedCov
Fraction-of-Variance explained by the components that are used to construct fittedCov; numeric (0,1] - default: NULL (all components available will be used)
- kernel
Smoothing kernel choice, common for mu and covariance; "rect", "gauss", "epan", "gausvar", "quar" - default: "gauss"; dense data are assumed noise-less so no smoothing is performed.
- kFoldMuCov
The number of folds to be used for mean and covariance smoothing. Default: 10
- lean
If TRUE the 'inputData' field in the output list is empty. Default: FALSE
- maxK
The maximum number of principal components to consider - default: min(20, N-2,nRegGrid-2), N:# of curves, nRegGrid:# of support points in each direction of covariance surface
- methodXi
The method to estimate the PC scores; 'CE' (Conditional Expectation), 'IN' (Numerical Integration) - default: 'CE' for sparse data and dense data with missing values, 'IN' for dense data. If time points are irregular but spacing is small enough, 'IN' method is utilized by default.
- methodMuCovEst
The method to estimate the mean and covariance in the case of dense functional data; 'cross-sectional', 'smooth' - default: 'cross-sectional'
- nRegGrid
The number of support points in each direction of covariance surface; numeric - default: 51
- numBins
The number of bins to bin the data into; positive integer > 10, default: NULL
- methodSelectK
The method of choosing the number of principal components K; 'FVE', 'AIC', 'BIC', or a positive integer as specified number of components - default: 'FVE'
- shrink
Whether to use shrinkage method to estimate the scores in the dense case (see Yao et al 2003) - default FALSE
- outPercent
A 2-element vector in [0,1] indicating the percentages of the time range to be considered as left and right boundary regions of the time window of observation - default (0,1) which corresponds to no boundary
- methodRho
The method of regularization (add to diagonal of covariance surface) in estimating principal component scores; 'trunc': rho is truncation of sigma2, 'ridge': rho is a ridge parameter, 'vanilla': vanilla approach - default "vanilla".
- rotationCut
The 2-element vector in [0,1] indicating the percent of data truncated during sigma^2 estimation - default: (0.25, 0.75)
- useBinnedData
Should the data be binned? 'FORCE' (Enforce the # of bins), 'AUTO' (Select the # of bins automatically), 'OFF' (Do not bin) - default: 'AUTO'
- useBinnedCov
Whether to use the binned raw covariance for smoothing; logical - default:TRUE
- usergrid
Whether to use the observation grid for fitting; if FALSE, an equidistant grid is used; logical - default: FALSE
- userCov
The user-defined smoothed covariance function; list of two elements: numerical vector 't' and matrix 'cov', 't' must cover the support defined by 'Ly' - default: NULL
- userMu
The user-defined smoothed mean function; list of two numerical vectors 't' and 'mu' of equal size, 't' must cover the support defined by 'Ly' - default: NULL
- userSigma2
The user-defined measurement error variance. A positive scalar. If specified then the vanilla approach is used (methodRho is set to 'vanilla', unless specified otherwise). Default to 'NULL'
- userRho
The user-defined measurement truncation threshold used for the calculation of functional principal components scores. A positive scalar. Default to 'NULL'
- useBW1SE
Pick the largest bandwidth such that CV-error is within one Standard Error from the minimum CV-error, relevant only if methodBwMu ='CV' and/or methodBwCov ='CV'; logical - default: FALSE
- imputeScores
Whether to impute the FPC scores or not; default: 'TRUE'
- verbose
Display diagnostic messages; logical - default: FALSE
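To illustrate how an FVE-based choice of K works, here is a minimal base R sketch. The eigenvalues are made up, and the selection inside FPCA may differ in details (for example, it is also bounded by 'maxK'); only the option names come from the list above.

```r
# Hypothetical eigenvalues of a fitted covariance, in decreasing order.
lambda <- c(4.0, 2.5, 1.0, 0.3, 0.15, 0.05)

# Cumulative fraction of variance explained by the first k components.
cumFVE <- cumsum(lambda) / sum(lambda)

# Smallest K whose cumulative FVE reaches the threshold.
FVEthreshold <- 0.95
K <- which(cumFVE >= FVEthreshold)[1]

# The corresponding options would be passed as the 'optns' argument of FPCA.
optns <- list(methodSelectK = 'FVE', FVEthreshold = FVEthreshold)
```

With these values the first three components explain 93.75% of the variance, so a threshold of 0.95 selects K = 4.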
Value
A list containing the following fields:
sigma2 |
Variance for measurement error. |
lambda |
A vector of length K containing eigenvalues. |
phi |
An nWorkGrid by K matrix containing eigenfunctions, supported on workGrid. |
xiEst |
An n by K matrix containing the FPC score estimates. |
xiVar |
A list of length n, each entry containing the variance estimates for the FPC estimates. |
obsGrid |
The (sorted) grid points where all observation points are pooled. |
mu |
A vector of length nWorkGrid containing the mean function estimate. |
workGrid |
A vector of length nWorkGrid. The internal regular grid on which the eigen analysis is carried out. |
smoothedCov |
A nWorkGrid by nWorkGrid matrix of the smoothed covariance surface. |
fittedCov |
A nWorkGrid by nWorkGrid matrix of the fitted covariance surface, which is guaranteed to be non-negative definite. |
fittedCorr |
A nWorkGrid by nWorkGrid matrix of the fitted correlation surface computed from fittedCov. |
optns |
A list of actually used options. |
timings |
A vector with execution times for the basic parts of the FPCA call. |
bwMu |
The selected (or user specified) bandwidth for smoothing the mean function. |
bwCov |
The selected (or user specified) bandwidth for smoothing the covariance function. |
rho |
A regularizing scalar for the measurement error variance estimate. |
cumFVE |
A vector with the fraction of the cumulative total variance explained with each additional FPC. |
FVE |
A fraction indicating the total variance explained by chosen FPCs with corresponding 'FVEthreshold'. |
selectK |
Number K of selected components. |
criterionValue |
A scalar specifying the criterion value obtained by the selected number of components with specific methodSelectK: FVE, AIC, BIC values or NULL for fixed K. |
inputData |
A list containing the original 'Ly' and 'Lt' lists used as inputs to FPCA. NULL if 'lean' was specified to be TRUE. |
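The returned fields can be combined as in the usual truncated Karhunen-Loeve expansion. The sketch below uses a made-up stand-in for an FPCA result (only the field names come from the list above) and shows the underlying formulas, not the package's own routines; the documented 'fittedCov' may be computed differently, for instance when 'FVEfittedCov' is set.

```r
# Stand-in for an FPCA result with K = 2 components on a 5-point workGrid.
# All values are made up; only the field names follow the Value section.
workGrid <- seq(0, 1, length.out = 5)
res <- list(
  workGrid = workGrid,
  mu       = 1 + workGrid,
  lambda   = c(2, 0.5),
  phi      = cbind(sqrt(2) * sin(pi * workGrid), sqrt(2) * cos(pi * workGrid)),
  xiEst    = rbind(c(1.0, -0.5), c(-0.3, 0.2), c(0.4, 0.8))
)

# Truncated reconstruction on workGrid: row i is mu + sum_k xiEst[i, k] * phi[, k].
fitted <- sweep(res$xiEst %*% t(res$phi), 2, res$mu, "+")

# Covariance implied by the K retained components: phi diag(lambda) phi^T.
covK <- res$phi %*% diag(res$lambda, nrow = length(res$lambda)) %*% t(res$phi)
```

Here 'fitted' is an n by nWorkGrid matrix of reconstructed curves and 'covK' is an nWorkGrid by nWorkGrid non-negative definite matrix.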
References
Yao, F., Müller, H.G., Clifford, A.J., Dueker, S.R., Follett, J., Lin, Y., Buchholz, B., Vogel, J.S. (2003). "Shrinkage estimation for functional principal component scores, with application to the population kinetics of plasma folate." Biometrics 59, 676-685. (Shrinkage estimates for dense data)
Yao, Fang, Müller, Hans-Georg and Wang, Jane-Ling (2005). "Functional data analysis for sparse longitudinal data." Journal of the American Statistical Association 100, no. 470, 577-590. (Sparse data FPCA)
Liu, Bitao and Müller, Hans-Georg (2009). "Estimating derivatives for samples of sparsely observed functions, with application to online auction dynamics." Journal of the American Statistical Association 104, no. 486, 704-717. (Sparse data FPCA)
Castro, P. E., Lawton, W.H. and Sylvestre, E.A. (1986). "Principal modes of variation for processes with continuous sample curves." Technometrics 28, no. 4, 329-337. (modes of variation for dense data FPCA)
Examples
library(fdapace)  # provides FPCA, Wiener, Sparsify and CreateCovPlot
set.seed(1)
n <- 20
pts <- seq(0, 1, by=0.05)
# Simulate n Wiener process paths on the grid pts
sampWiener <- Wiener(n, pts)
# Keep 10 randomly chosen observations per curve
sampWiener <- Sparsify(sampWiener, pts, 10)
res <- FPCA(sampWiener$Ly, sampWiener$Lt,
            list(dataType='Sparse', error=FALSE, kernel='epan', verbose=TRUE))
plot(res) # The design plot covers [0, 1] * [0, 1] well.
CreateCovPlot(res, 'Fitted')
CreateCovPlot(res, corr = TRUE)