fpca.lfda {refund} | R Documentation |
Longitudinal Functional Data Analysis using FPCA
Description
Implements longitudinal functional data analysis (Park and Staicu, 2015). It decomposes longitudinally-observed functional observations in two steps. It first applies FPCA on a properly defined marginal covariance function and obtain estimated scores (mFPCA step). Then it further models the underlying process dynamics by applying another FPCA on a covariance of the estimated scores obtained in the mFPCA step. The function also allows to use a random effects model to study the underlying process dynamics instead of a KL expansion model in the second step. Scores in mFPCA step are estimated using numerical integration. Scores in sFPCA step are estimated under a mixed model framework.
Usage
fpca.lfda(
Y,
subject.index,
visit.index,
obsT = NULL,
funcArg = NULL,
numTEvalPoints = 41,
newdata = NULL,
fbps.knots = c(5, 10),
fbps.p = 3,
fbps.m = 2,
mFPCA.pve = 0.95,
mFPCA.knots = 35,
mFPCA.p = 3,
mFPCA.m = 2,
mFPCA.npc = NULL,
LongiModel.method = c("fpca.sc", "lme"),
sFPCA.pve = 0.95,
sFPCA.nbasis = 10,
sFPCA.npc = NULL,
gam.method = "REML",
gam.kT = 10
)
Arguments
Y |
a matrix of which each row corresponds to one curve observed on a regular and dense grid (dimension of N by m; N = total number of observed functions; m = number of grid points) |
subject.index |
subject id; vector of length N with each element corresponding a row of Y |
visit.index |
index for visits (repeated measures); vector of length N with each element corresponding a row of Y |
obsT |
actual time of visits at which a function is observed; vector of length N with each element corresponding a row of Y |
funcArg |
numeric; function argument |
numTEvalPoints |
total number of evaluation time points for visits; used for pre-binning in sFPCA step; defaults to 41 |
newdata |
an optional data frame providing predictors (i for subject id / Ltime for visit time) with which prediction is desired; defaults to NULL |
fbps.knots |
list of two vectors of knots or number of equidistanct knots for all dimensions
for a fast bivariate P-spline smoothing (fbps) method used to estimate a bivariate, smooth mean function; defaults to c(5,10); see |
fbps.p |
integer;degrees of B-spline functions to use for a fbps method; defaults to 3; see |
fbps.m |
integer;order of differencing penalty to use for a fbps method; defaults to 2; see |
mFPCA.pve |
proportion of variance explained for a mFPCA step; used to choose the number of principal components (PCs); defaults to 0.95; see |
mFPCA.knots |
number of knots to use or the vectors of knots in a mFPCA step; used for obtain a smooth estimate of a covariance function; defaults to 35; see |
mFPCA.p |
integer; the degree of B-spline functions to use in a mFPCA step; defaults to 3; see |
mFPCA.m |
integer;order of differencing penalty to use in a mFPCA step; defaults to 2; see |
mFPCA.npc |
pre-specified value for the number of principal components; if given, it overrides |
LongiModel.method |
model and estimation method for estimating covariance of estimated scores from a mFPCA step; either KL expansion model or random effects model; defaults to fpca.sc |
sFPCA.pve |
proportion of variance explained for sFPCA step; used to choose the number of principal components; defaults to 0.95; see |
sFPCA.nbasis |
number of B-spline basis functions used in sFPCA step for estimation of the mean function and bivariate smoothing of the covariance surface; defaults to 10; see |
sFPCA.npc |
pre-specified value for the number of principal components; if given, it overrides |
gam.method |
smoothing parameter estimation method when |
gam.kT |
dimension of basis functions to use; see |
Value
A list with components
obsData |
observed data (input) |
i |
subject id |
funcArg |
function argument |
visitTime |
visit times |
fitted.values |
fitted values (in-sample); of the same dimension as Y |
fitted.values.all |
a list of which each component consists of a subject's fitted values at all pairs of evaluation points (s and T) |
predicted.values |
predicted values for variables provided in newdata |
bivariateSmoothMeanFunc |
estimated bivariate smooth mean function |
mFPCA.efunctions |
estimated eigenfunction in a mFPCA step |
mFPCA.evalues |
estimated eigenvalues in a mFPCA step |
mFPCA.npc |
number of principal components selected with pre-specified pve in a mFPCA step |
mFPCA.scree.eval |
estimated eigenvalues obtained with pre-specified pve = 0.9999; for scree plot |
sFPCA.xiHat.bySubj |
a list of which each component consists of a subject's predicted score functions evaluated at equidistanced grid in direction of visit time, T |
sFPCA.npc |
a vector of numbers of principal components selected in a sFPCA step with pre-specified pve; length of mFPCA.npc |
mFPCA.covar |
estimated marginal covariance |
sFPCA.longDynCov.k |
a list of estimated covariance of score function; length of mFPCA.npc |
Details
A random effects model is recommended when a set of visit times for all subjects and visits is not dense in its range.
Author(s)
So Young Park spark13@ncsu.edu, Ana-Maria Staicu
References
Park, S.Y. and Staicu, A.M. (2015). Longitudinal functional data analysis. Stat 4 212-226.
Examples
## Not run:
########################################
### Illustration with real data ###
########################################
data(DTI)
MS <- subset(DTI, case ==1) # subset data with multiple sclerosis (MS) case
index.na <- which(is.na(MS$cca))
Y <- MS$cca; Y[index.na] <- fpca.sc(Y)$Yhat[index.na]; sum(is.na(Y))
id <- MS$ID
visit.index <- MS$visit
visit.time <- MS$visit.time/max(MS$visit.time)
lfpca.dti <- fpca.lfda(Y = Y, subject.index = id,
visit.index = visit.index, obsT = visit.time,
LongiModel.method = 'lme',
mFPCA.pve = 0.95)
TT <- seq(0,1,length.out=41); ss = seq(0,1,length.out=93)
# estimated mean function
persp(x = ss, y = TT, z = t(lfpca.dti$bivariateSmoothMeanFunc),
xlab="s", ylab="visit times", zlab="estimated mean fn", col='light blue')
# first three estimated marginal eigenfunctions
matplot(ss, lfpca.dti$mFPCA.efunctions[,1:3], type='l', xlab='s', ylab='estimated eigen fn')
# predicted scores function corresponding to first two marginal PCs
matplot(TT, do.call(cbind, lapply(lfpca.dti$sFPCA.xiHat.bySubj, function(a) a[,1])),
xlab="visit time (T)", ylab="xi_hat(T)", main = "k = 1", type='l')
matplot(TT, do.call(cbind, lapply(lfpca.dti$sFPCA.xiHat.bySubj, function(a) a[,2])),
xlab="visit time (T)", ylab="xi_hat(T)", main = "k = 2", type='l')
# prediction of cca of first two subjects at T = 0, 0.5 and 1 (black, red, green)
matplot(ss, t(lfpca.dti$fitted.values.all[[1]][c(1,21,41),]),
type='l', lty = 1, ylab="", xlab="s", main = "Subject = 1")
matplot(ss, t(lfpca.dti$fitted.values.all[[2]][c(1,21,41),]),
type='l', lty = 1, ylab="", xlab="s", main = "Subject = 2")
########################################
### Illustration with simulated data ###
########################################
###########################################################################################
# data generation
###########################################################################################
set.seed(1)
n <- 100 # number of subjects
ss <- seq(0,1,length.out=101)
TT <- seq(0, 1, length.out=41)
mi <- runif(n, min=6, max=15)
ij <- sapply(mi, function(a) sort(sample(1:41, size=a, replace=FALSE)))
# error variances
sigma <- 0.1
sigma_wn <- 0.2
lambdaTrue <- c(1,0.5) # True eigenvalues
eta1True <- c(0.5, 0.5^2, 0.5^3) # True eigenvalues
eta2True <- c(0.5^2, 0.5^3) # True eigenvalues
phi <- sqrt(2)*cbind(sin(2*pi*ss),cos(2*pi*ss))
psi1 <- cbind(rep(1,length(TT)), sqrt(3)*(2*TT-1), sqrt(5)*(6*TT^2-6*TT+1))
psi2 <- sqrt(2)*cbind(sin(2*pi*TT),cos(2*pi*TT))
zeta1 <- sapply(eta1True, function(a) rnorm(n = n, mean = 0, sd = a))
zeta2 <- sapply(eta2True, function(a) rnorm(n = n, mean = 0, sd = a))
xi1 <- unlist(lapply(1:n, function(a) (zeta1 %*% t(psi1))[a,ij[[a]]] ))
xi2 <- unlist(lapply(1:n, function(a) (zeta2 %*% t(psi2))[a,ij[[a]]] ))
xi <- cbind(xi1, xi2)
Tij <- unlist(lapply(1:n, function(i) TT[ij[[i]]] ))
i <- unlist(lapply(1:n, function(i) rep(i, length(ij[[i]]))))
j <- unlist(lapply(1:n, function(i) 1:length(ij[[i]])))
X <- xi %*% t(phi)
meanFn <- function(s,t){ 0.5*t + 1.5*s + 1.3*s*t}
mu <- matrix(meanFn(s = rep(ss, each=length(Tij)), t=rep(Tij, length(ss)) ) , nrow=nrow(X))
Y <- mu + X +
matrix(rnorm(nrow(X)*ncol(phi), 0, sigma), nrow=nrow(X)) %*% t(phi) + #correlated error
matrix(rnorm(length(X), 0, sigma_wn), nrow=nrow(X)) # white noise
matplot(ss, t(Y[which(i==2),]), type='l', ylab="", xlab="functional argument",
main="observations from subject i = 2")
# END: data generation
###########################################################################################
# Illustration I : when covariance of scores from a mFPCA step is estimated using fpca.sc
###########################################################################################
est <- fpca.lfda(Y = Y,
subject.index = i, visit.index = j, obsT = Tij,
funcArg = ss, numTEvalPoints = length(TT),
newdata = data.frame(i = c(1:3), Ltime = c(Tij[1], 0.2, 0.5)),
fbps.knots = 35, fbps.p = 3, fbps.m = 2,
LongiModel.method='fpca.sc',
mFPCA.pve = 0.95, mFPCA.knots = 35, mFPCA.p = 3, mFPCA.m = 2,
sFPCA.pve = 0.95, sFPCA.nbasis = 10, sFPCA.npc = NULL,
gam.method = 'REML', gam.kT = 10)
# mean function (true vs. estimated)
par(mfrow=c(1,2))
persp(x=TT, y = ss, z= t(sapply(TT, function(a) meanFn(s=ss, t = a))),
xlab="visit times", ylab="s", zlab="true mean fn")
persp(x = TT, y = ss, est$bivariateSmoothMeanFunc,
xlab="visit times", ylab="s", zlab="estimated mean fn", col='light blue')
################ mFPCA step ################
par(mfrow=c(1,2))
# marginal covariance fn (true vs. estimated)
image(phi%*%diag(lambdaTrue)%*%t(phi))
image(est$mFPCA.covar)
# eigenfunctions (true vs. estimated)
matplot(ss, phi, type='l')
matlines(ss, cbind(est$mFPCA.efunctions[,1], est$mFPCA.efunctions[,2]), type='l', lwd=2)
# scree plot
plot(cumsum(est$mFPCA.scree.eval)/sum(est$mFPCA.scree.eval), type='l',
ylab = "Percentage of variance explained")
points(cumsum(est$mFPCA.scree.eval)/sum(est$mFPCA.scree.eval), pch=16)
################ sFPCA step ################
par(mfrow=c(1,2))
print(est$mFPCA.npc) # k = 2
# covariance of score functions for k = 1 (true vs. estimated)
image(psi1%*%diag(eta1True)%*%t(psi1), main='TRUE')
image(est$sFPCA.longDynCov.k[[1]], main='ESTIMATED')
# covariance of score functions for k = 2 (true vs. estimated)
image(psi2%*%diag(eta2True)%*%t(psi2))
image(est$sFPCA.longDynCov.k[[2]])
# estimated scores functions
matplot(TT, do.call(cbind,lapply(est$sFPCA.xiHat.bySubj, function(a) a[,1])),
xlab="visit time", main="k=1", type='l', ylab="", col=rainbow(100, alpha = 1),
lwd=1, lty=1)
matplot(TT, do.call(cbind,lapply(est$sFPCA.xiHat.bySubj, function(a) a[,2])),
xlab="visit time", main="k=2",type='l', ylab="", col=rainbow(100, alpha = 1),
lwd=1, lty=1)
################ In-sample and Out-of-sample Prediction ################
par(mfrow=c(1,2))
# fitted
matplot(ss, t(Y[which(i==1),]), type='l', ylab="", xlab="functional argument")
matlines(ss, t(est$fitted.values[which(i==1),]), type='l', lwd=2)
# sanity check : expect fitted and predicted (obtained using info from newdata)
# values to be the same
plot(ss, est$fitted.values[1,], type='p', xlab="", ylab="", pch = 1, cex=1)
lines(ss, est$predicted.values[1,], type='l', lwd=2, col='blue')
all.equal(est$predicted.values[1,], est$fitted.values[1,])
###########################################################################################
# Illustration II : when covariance of scores from a mFPCA step is estimated using lmer
###########################################################################################
est.lme <- fpca.lfda(Y = Y,
subject.index = i, visit.index = j, obsT = Tij,
funcArg = ss, numTEvalPoints = length(TT),
newdata = data.frame(i = c(1:3), Ltime = c(Tij[1], 0.2, 0.5)),
fbps.knots = 35, fbps.p = 3, fbps.m = 2,
LongiModel.method='lme',
mFPCA.pve = 0.95, mFPCA.knots = 35, mFPCA.p = 3, mFPCA.m = 2,
gam.method = 'REML', gam.kT = 10)
par(mfrow=c(2,2))
# fpca.sc vs. lme (assumes linearity)
matplot(TT, do.call(cbind,lapply(est$sFPCA.xiHat.bySubj, function(a) a[,1])),
xlab="visit time", main="k=1", type='l', ylab="", col=rainbow(100, alpha = 1),
lwd=1, lty=1)
matplot(TT, do.call(cbind,lapply(est$sFPCA.xiHat.bySubj, function(a) a[,2])),
xlab="visit time", main="k=2",type='l', ylab="", col=rainbow(100, alpha = 1),
lwd=1, lty=1)
matplot(TT, do.call(cbind,lapply(est.lme$sFPCA.xiHat.bySubj, function(a) a[,1])),
xlab="visit time", main="k=1", type='l', ylab="", col=rainbow(100, alpha = 1),
lwd=1, lty=1)
matplot(TT, do.call(cbind,lapply(est.lme$sFPCA.xiHat.bySubj, function(a) a[,2])),
xlab="visit time", main="k=2", type='l', ylab="", col=rainbow(100, alpha = 1),
lwd=1, lty=1)
## End(Not run)