gfpca_twoStep {registr} | R Documentation |
Generalized functional principal component analysis
Description
Function for applying FPCA to different exponential family distributions.
Used in the FPCA step for registering functional data,
called by register_fpca when fpca_type = "two-step".
The method implements the 'two-step approach' of Gertheiss et al. (2017)
and is based on the approach of Hall et al. (2008) to estimate functional
principal components.
The number of functional principal components (FPCs) can either be specified
directly (argument npc) or chosen based on the explained share of
variance (argument npc_criterion). In the latter case, the overall
variance in the data Y is approximated by the variance represented by the
smoothed covariance surface estimated with cov_hall.
Note that the eigenvalue decomposition of this covariance surface
sometimes leads to a long tail of subordinate FPCs with small eigenvalues.
Such subordinate dimensions often seem to represent phase rather than
amplitude variation, and can be cut off by specifying the second element of
argument npc_criterion.
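The selection logic described above can be illustrated with a minimal base R sketch. The helper choose_npc and its arguments are hypothetical and not part of registr. The sketch assumes that the second element of npc_criterion acts as a minimum share of variance that each retained FPC must explain; the exact rule used by the package may differ.

```r
# Hypothetical helper (not part of registr): choose the number of FPCs
# from a decreasing vector of eigenvalues.
#   share     - targeted explained share of variance (between 0 and 1)
#   min_share - assumed cut-off: minimum share each retained FPC must explain
choose_npc <- function(evalues, share = 0.9, min_share = 0) {
  evalues   <- evalues[evalues > 0]          # ignore non-positive eigenvalues
  explained <- evalues / sum(evalues)        # share of variance per FPC
  npc_share  <- which(cumsum(explained) >= share)[1]
  npc_cutoff <- sum(explained >= min_share)  # drops the long tail of small FPCs
  max(1, min(npc_share, npc_cutoff))
}

# Illustrative eigenvalues with a long tail (made-up numbers)
evalues <- c(5, 3, 1, 0.5, 0.3, 0.2, -0.1)

npc_a <- choose_npc(evalues, share = 0.85)                    # share only
npc_b <- choose_npc(evalues, share = 0.85, min_share = 0.15)  # share plus cut-off
```

With the cut-off, fewer FPCs are kept even if the targeted share is then not reached, which is the trade-off the second element is meant to control.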
This function is an adaptation of Jan Gertheiss's implementation for
Gertheiss et al. (2017), with a focus on higher (RAM) efficiency in
large-data settings.
Usage
gfpca_twoStep(
Y,
family = "gaussian",
npc = NULL,
npc_criterion = NULL,
Kt = 8,
t_min = NULL,
t_max = NULL,
row_obj = NULL,
index_significantDigits = 4L,
estimation_accuracy = "high",
start_params = NULL,
periodic = FALSE,
verbose = 1,
...
)
Arguments
Y |
Data frame. Should have columns id, value, and index. |
family |
One of |
npc , npc_criterion |
The number of functional principal components (FPCs)
has to be specified either directly as |
Kt |
Number of B-spline basis functions used to estimate mean functions and functional principal components. Default is 8. |
t_min |
Minimum value to be evaluated on the time domain. |
t_max |
Maximum value to be evaluated on the time domain. |
row_obj |
If NULL, the function cleans the data and calculates row indices.
Keep this NULL if you are using standalone |
index_significantDigits |
Positive integer |
estimation_accuracy |
One of |
start_params |
Optional start values for gamm4. Not used if
|
periodic |
Only contained for full consistency with |
verbose |
Can be set to integers between 0 and 4 to control the level of detail of the printed diagnostic messages. Higher numbers lead to more detailed messages. Defaults to 1. |
... |
Additional arguments passed to |
Details
For family = "poisson", the values in Y are rounded before
performing the GFPCA to ensure integer data. This keeps computation times
reasonable: they tend to explode when the underlying high-dimensional mixed
model is estimated on continuous (non-integer) Poisson data with the gamm4
package.
If negative eigenvalues are present, the respective eigenfunctions are dropped and not considered further.
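The two preprocessing steps described in this section can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's internal code: the toy data, the matrix cov_mat (standing in for the smoothed covariance surface), and all variable names are made up.

```r
# Toy long-format data with non-integer "counts" (illustrative values only)
Y <- data.frame(
  id    = c(1, 1, 2, 2),
  index = c(0, 1, 0, 1),
  value = c(0.4, 2.6, 1.2, 3.2)
)
family <- "poisson"
if (family == "poisson") {
  Y$value <- round(Y$value)  # force integer data before model fitting
}

# Toy symmetric matrix standing in for a smoothed covariance surface;
# it is deliberately not positive semi-definite.
cov_mat <- matrix(c(2.0, 1.5, 0.0,
                    1.5, 1.0, 1.0,
                    0.0, 1.0, 0.5), nrow = 3, byrow = TRUE)

eig  <- eigen(cov_mat, symmetric = TRUE)
keep <- eig$values > 0                          # flag positive eigenvalues

evalues    <- eig$values[keep]                  # retained eigenvalues
efunctions <- eig$vectors[, keep, drop = FALSE] # matching eigenvectors
```

Using drop = FALSE keeps efunctions a matrix even if only a single eigenvector remains.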
Value
An object of class fpca containing:
fpca_type |
Information that FPCA was performed with the 'two-step' approach, in contrast to registr::fpca_gauss or registr::bfpca. |
t_vec |
Time vector over which the mean |
knots |
Cutpoints for B-spline basis used to rebuild |
efunctions |
|
evalues |
Estimated variance of the FPC scores. |
evalues_sum |
Sum of all (nonnegative) eigenvalues of the smoothed
covariance surface estimated with |
npc |
Number of FPCs. |
scores |
|
alpha |
Estimated population-level mean. |
mu |
Estimated population-level mean. Same value as |
subject_coefs |
Always |
Yhat |
FPC approximation of subject-specific means, before applying the response function. |
Y |
The observed data. |
family |
|
gamm4_theta |
Estimated parameters of the mixed model. |
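As a brief sketch of how the returned elements listed above might be inspected, assuming the registr package and its growth_incomplete data set are installed: the ratio computed at the end is only a rough indication of the explained share of variance, since evalues and evalues_sum are estimated in different steps.

```r
library(registr)

data(growth_incomplete)
fpca_obj <- gfpca_twoStep(Y = growth_incomplete, npc = 2, family = "gaussian")

class(fpca_obj)      # the returned object has class "fpca"
fpca_obj$fpca_type   # information that the 'two-step' approach was used
fpca_obj$npc         # number of FPCs
fpca_obj$evalues     # estimated variance of the FPC scores

# Rough share of variance per FPC, relative to the sum of all nonnegative
# eigenvalues of the smoothed covariance surface (an approximation only)
approx_share <- fpca_obj$evalues / fpca_obj$evalues_sum
```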
Author(s)
Alexander Bauer alexander.bauer@stat.uni-muenchen.de, based on work of Jan Gertheiss
References
Gertheiss, J., Goldsmith, J., & Staicu, A. M. (2017). A note on modeling sparse exponential-family functional response curves. Computational Statistics & Data Analysis, 105, 46–52.
Hall, P., Müller, H. G., & Yao, F. (2008). Modelling sparse generalized longitudinal observations with latent Gaussian processes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(4), 703–723.
Examples
library(registr)
data(growth_incomplete)
# estimate 2 FPCs
fpca_obj <- gfpca_twoStep(Y = growth_incomplete, npc = 2, family = "gaussian")
plot(fpca_obj)
# choose npc adaptively, to explain 90% of the overall variation
fpca_obj2 <- gfpca_twoStep(Y = growth_incomplete, npc_criterion = 0.9, family = "gaussian")
# plot only the first two FPCs
plot(fpca_obj2, plot_FPCs = 1:2)