R: Longitudinal Functional Data Analysis using FPCA

fpca.lfda {refund}

R Documentation

Longitudinal Functional Data Analysis using FPCA

Description

Implements longitudinal functional data analysis (Park and Staicu, 2015). It decomposes longitudinally-observed functional observations in two steps. It first applies FPCA on a properly defined marginal covariance function and obtain estimated scores (mFPCA step). Then it further models the underlying process dynamics by applying another FPCA on a covariance of the estimated scores obtained in the mFPCA step. The function also allows to use a random effects model to study the underlying process dynamics instead of a KL expansion model in the second step. Scores in mFPCA step are estimated using numerical integration. Scores in sFPCA step are estimated under a mixed model framework.

Usage

fpca.lfda(
  Y,
  subject.index,
  visit.index,
  obsT = NULL,
  funcArg = NULL,
  numTEvalPoints = 41,
  newdata = NULL,
  fbps.knots = c(5, 10),
  fbps.p = 3,
  fbps.m = 2,
  mFPCA.pve = 0.95,
  mFPCA.knots = 35,
  mFPCA.p = 3,
  mFPCA.m = 2,
  mFPCA.npc = NULL,
  LongiModel.method = c("fpca.sc", "lme"),
  sFPCA.pve = 0.95,
  sFPCA.nbasis = 10,
  sFPCA.npc = NULL,
  gam.method = "REML",
  gam.kT = 10
)

Arguments

`Y`	a matrix of which each row corresponds to one curve observed on a regular and dense grid (dimension of N by m; N = total number of observed functions; m = number of grid points)
`subject.index`	subject id; vector of length N with each element corresponding a row of Y
`visit.index`	index for visits (repeated measures); vector of length N with each element corresponding a row of Y
`obsT`	actual time of visits at which a function is observed; vector of length N with each element corresponding a row of Y
`funcArg`	numeric; function argument
`numTEvalPoints`	total number of evaluation time points for visits; used for pre-binning in sFPCA step; defaults to 41
`newdata`	an optional data frame providing predictors (i for subject id / Ltime for visit time) with which prediction is desired; defaults to NULL
`fbps.knots`	list of two vectors of knots or number of equidistanct knots for all dimensions for a fast bivariate P-spline smoothing (fbps) method used to estimate a bivariate, smooth mean function; defaults to c(5,10); see `fbps`
`fbps.p`	integer;degrees of B-spline functions to use for a fbps method; defaults to 3; see `fbps`
`fbps.m`	integer;order of differencing penalty to use for a fbps method; defaults to 2; see `fbps`
`mFPCA.pve`	proportion of variance explained for a mFPCA step; used to choose the number of principal components (PCs); defaults to 0.95; see `fpca.face`
`mFPCA.knots`	number of knots to use or the vectors of knots in a mFPCA step; used for obtain a smooth estimate of a covariance function; defaults to 35; see `fpca.face`
`mFPCA.p`	integer; the degree of B-spline functions to use in a mFPCA step; defaults to 3; see `fpca.face`
`mFPCA.m`	integer;order of differencing penalty to use in a mFPCA step; defaults to 2; see `fpca.face`
`mFPCA.npc`	pre-specified value for the number of principal components; if given, it overrides `pve`; defaults to NULL; see `fpca.face`
`LongiModel.method`	model and estimation method for estimating covariance of estimated scores from a mFPCA step; either KL expansion model or random effects model; defaults to fpca.sc
`sFPCA.pve`	proportion of variance explained for sFPCA step; used to choose the number of principal components; defaults to 0.95; see `fpca.sc`
`sFPCA.nbasis`	number of B-spline basis functions used in sFPCA step for estimation of the mean function and bivariate smoothing of the covariance surface; defaults to 10; see `fpca.sc`
`sFPCA.npc`	pre-specified value for the number of principal components; if given, it overrides `pve`; defaults to NULL; see `fpca.sc`
`gam.method`	smoothing parameter estimation method when `gam` is used for predicting score functions at unobserved visit time, T; defaults to `REML`; see `gam`
`gam.kT`	dimension of basis functions to use; see `gam`

Value

A list with components

`obsData`	observed data (input)
`i`	subject id
`funcArg`	function argument
`visitTime`	visit times
`fitted.values`	fitted values (in-sample); of the same dimension as Y
`fitted.values.all`	a list of which each component consists of a subject's fitted values at all pairs of evaluation points (s and T)
`predicted.values`	predicted values for variables provided in newdata
`bivariateSmoothMeanFunc`	estimated bivariate smooth mean function
`mFPCA.efunctions`	estimated eigenfunction in a mFPCA step
`mFPCA.evalues`	estimated eigenvalues in a mFPCA step
`mFPCA.npc`	number of principal components selected with pre-specified pve in a mFPCA step
`mFPCA.scree.eval`	estimated eigenvalues obtained with pre-specified pve = 0.9999; for scree plot
`sFPCA.xiHat.bySubj`	a list of which each component consists of a subject's predicted score functions evaluated at equidistanced grid in direction of visit time, T
`sFPCA.npc`	a vector of numbers of principal components selected in a sFPCA step with pre-specified pve; length of mFPCA.npc
`mFPCA.covar`	estimated marginal covariance
`sFPCA.longDynCov.k`	a list of estimated covariance of score function; length of mFPCA.npc

Details

A random effects model is recommended when a set of visit times for all subjects and visits is not dense in its range.

Author(s)

So Young Park spark13@ncsu.edu, Ana-Maria Staicu

References

Park, S.Y. and Staicu, A.M. (2015). Longitudinal functional data analysis. Stat 4 212-226.

Examples

  ## Not run:  
  ########################################
  ###   Illustration with real data    ###
  ########################################   

  data(DTI)
  MS <- subset(DTI, case ==1)  # subset data with multiple sclerosis (MS) case

  index.na <- which(is.na(MS$cca))  
  Y <- MS$cca; Y[index.na] <- fpca.sc(Y)$Yhat[index.na]; sum(is.na(Y))
  id <- MS$ID 
  visit.index <- MS$visit 
  visit.time <- MS$visit.time/max(MS$visit.time)

  lfpca.dti <- fpca.lfda(Y = Y, subject.index = id,  
                         visit.index = visit.index, obsT = visit.time, 
                         LongiModel.method = 'lme',
                         mFPCA.pve = 0.95)
                         
  TT <- seq(0,1,length.out=41); ss = seq(0,1,length.out=93)
  
  # estimated mean function
  persp(x = ss, y = TT, z = t(lfpca.dti$bivariateSmoothMeanFunc),
        xlab="s", ylab="visit times", zlab="estimated mean fn", col='light blue')
        
  # first three estimated marginal eigenfunctions
  matplot(ss, lfpca.dti$mFPCA.efunctions[,1:3], type='l', xlab='s', ylab='estimated eigen fn')
  
  # predicted scores function corresponding to first two marginal PCs
  matplot(TT, do.call(cbind, lapply(lfpca.dti$sFPCA.xiHat.bySubj, function(a) a[,1])),
          xlab="visit time (T)", ylab="xi_hat(T)", main = "k = 1", type='l')
  matplot(TT, do.call(cbind, lapply(lfpca.dti$sFPCA.xiHat.bySubj, function(a) a[,2])),
          xlab="visit time (T)", ylab="xi_hat(T)", main = "k = 2", type='l')

  # prediction of cca of first two subjects at T = 0, 0.5 and 1 (black, red, green)
  matplot(ss, t(lfpca.dti$fitted.values.all[[1]][c(1,21,41),]), 
         type='l', lty = 1, ylab="", xlab="s", main = "Subject = 1")    
  matplot(ss, t(lfpca.dti$fitted.values.all[[2]][c(1,21,41),]), 
         type='l', lty = 1, ylab="", xlab="s", main = "Subject = 2")    
     
  ########################################
  ### Illustration with simulated data ###
  ########################################   

  ###########################################################################################
  # data generation
  ###########################################################################################
  set.seed(1)
  n <- 100 # number of subjects
  ss <- seq(0,1,length.out=101) 
  TT <- seq(0, 1, length.out=41)
  mi <- runif(n, min=6, max=15)
  ij <- sapply(mi, function(a) sort(sample(1:41, size=a, replace=FALSE)))
  
  # error variances
  sigma <- 0.1 
  sigma_wn <- 0.2

  lambdaTrue <- c(1,0.5) # True eigenvalues
  eta1True <- c(0.5, 0.5^2, 0.5^3) # True eigenvalues
  eta2True <- c(0.5^2, 0.5^3) # True eigenvalues
  
  phi <- sqrt(2)*cbind(sin(2*pi*ss),cos(2*pi*ss))
  psi1 <- cbind(rep(1,length(TT)), sqrt(3)*(2*TT-1), sqrt(5)*(6*TT^2-6*TT+1))
  psi2 <- sqrt(2)*cbind(sin(2*pi*TT),cos(2*pi*TT))
  
  zeta1 <- sapply(eta1True, function(a) rnorm(n = n, mean = 0, sd = a))
  zeta2 <- sapply(eta2True, function(a) rnorm(n = n, mean = 0, sd = a))
  
  xi1 <- unlist(lapply(1:n, function(a) (zeta1 %*% t(psi1))[a,ij[[a]]] ))
  xi2 <- unlist(lapply(1:n, function(a) (zeta2 %*% t(psi2))[a,ij[[a]]] ))
  xi <- cbind(xi1, xi2)
  
  Tij <- unlist(lapply(1:n, function(i) TT[ij[[i]]] ))
  i <- unlist(lapply(1:n, function(i) rep(i, length(ij[[i]]))))
  j <- unlist(lapply(1:n, function(i) 1:length(ij[[i]])))
  
  X <- xi %*% t(phi)
  meanFn <- function(s,t){ 0.5*t + 1.5*s + 1.3*s*t}
  mu <- matrix(meanFn(s = rep(ss, each=length(Tij)), t=rep(Tij, length(ss)) ) , nrow=nrow(X))

  Y <- mu +  X + 
     matrix(rnorm(nrow(X)*ncol(phi), 0, sigma), nrow=nrow(X)) %*% t(phi) + #correlated error
     matrix(rnorm(length(X), 0, sigma_wn), nrow=nrow(X)) # white noise

  matplot(ss, t(Y[which(i==2),]), type='l', ylab="", xlab="functional argument", 
         main="observations from subject i = 2")
  # END: data generation
  
  ###########################################################################################
  # Illustration I : when covariance of scores from a mFPCA step is estimated using fpca.sc
  ###########################################################################################
  est <- fpca.lfda(Y = Y, 
                   subject.index = i, visit.index = j, obsT = Tij, 
                   funcArg = ss, numTEvalPoints = length(TT), 
                   newdata = data.frame(i = c(1:3), Ltime = c(Tij[1], 0.2, 0.5)), 
                   fbps.knots = 35, fbps.p = 3, fbps.m = 2,
                   LongiModel.method='fpca.sc',
                   mFPCA.pve = 0.95, mFPCA.knots = 35, mFPCA.p = 3, mFPCA.m = 2, 
                   sFPCA.pve = 0.95, sFPCA.nbasis = 10, sFPCA.npc = NULL,
                   gam.method = 'REML', gam.kT = 10)
  
  
  # mean function (true vs. estimated)
  par(mfrow=c(1,2))
  persp(x=TT, y = ss, z= t(sapply(TT, function(a) meanFn(s=ss, t = a))),
          xlab="visit times", ylab="s", zlab="true mean fn")
  persp(x = TT, y = ss, est$bivariateSmoothMeanFunc,
   xlab="visit times", ylab="s", zlab="estimated mean fn", col='light blue')
  
  ################   mFPCA step   ################
  par(mfrow=c(1,2))
  
  # marginal covariance fn (true vs. estimated)
  image(phi%*%diag(lambdaTrue)%*%t(phi))
  image(est$mFPCA.covar) 
  
  # eigenfunctions (true vs. estimated)
  matplot(ss, phi, type='l') 
  matlines(ss, cbind(est$mFPCA.efunctions[,1], est$mFPCA.efunctions[,2]), type='l', lwd=2)
  
  # scree plot
  plot(cumsum(est$mFPCA.scree.eval)/sum(est$mFPCA.scree.eval), type='l', 
       ylab = "Percentage of variance explained")
  points(cumsum(est$mFPCA.scree.eval)/sum(est$mFPCA.scree.eval), pch=16)
  
  ################   sFPCA step   ################
  par(mfrow=c(1,2))
  print(est$mFPCA.npc)  # k = 2
  
  # covariance of score functions for k = 1 (true vs. estimated)
  image(psi1%*%diag(eta1True)%*%t(psi1), main='TRUE')
  image(est$sFPCA.longDynCov.k[[1]], main='ESTIMATED')
  
  # covariance of score functions for k = 2 (true vs. estimated)
  image(psi2%*%diag(eta2True)%*%t(psi2))
  image(est$sFPCA.longDynCov.k[[2]])
  
  # estimated scores functions
  matplot(TT, do.call(cbind,lapply(est$sFPCA.xiHat.bySubj, function(a) a[,1])), 
          xlab="visit time", main="k=1", type='l', ylab="", col=rainbow(100, alpha = 1), 
          lwd=1, lty=1)
  matplot(TT, do.call(cbind,lapply(est$sFPCA.xiHat.bySubj, function(a) a[,2])), 
          xlab="visit time", main="k=2",type='l', ylab="", col=rainbow(100, alpha = 1), 
          lwd=1, lty=1)
  
  ################   In-sample and Out-of-sample Prediction   ################
  par(mfrow=c(1,2))
  # fitted
  matplot(ss, t(Y[which(i==1),]), type='l', ylab="", xlab="functional argument")
  matlines(ss, t(est$fitted.values[which(i==1),]), type='l', lwd=2)
  
 # sanity check : expect fitted and predicted (obtained using info from newdata) 
 #                values to be the same
 
  plot(ss, est$fitted.values[1,], type='p', xlab="", ylab="", pch = 1, cex=1)
  lines(ss, est$predicted.values[1,], type='l', lwd=2, col='blue')
  all.equal(est$predicted.values[1,], est$fitted.values[1,])
  
  ###########################################################################################
  # Illustration II : when covariance of scores from a mFPCA step is estimated using lmer
  ###########################################################################################
  est.lme <- fpca.lfda(Y = Y, 
                       subject.index = i, visit.index = j, obsT = Tij,
                       funcArg = ss, numTEvalPoints = length(TT), 
                       newdata = data.frame(i = c(1:3), Ltime = c(Tij[1], 0.2, 0.5)), 
                       fbps.knots = 35, fbps.p = 3, fbps.m = 2,
                       LongiModel.method='lme',
                       mFPCA.pve = 0.95, mFPCA.knots = 35, mFPCA.p = 3, mFPCA.m = 2, 
                       gam.method = 'REML', gam.kT = 10)
  
  par(mfrow=c(2,2))
  
  # fpca.sc vs. lme (assumes linearity)
  matplot(TT, do.call(cbind,lapply(est$sFPCA.xiHat.bySubj, function(a) a[,1])), 
          xlab="visit time", main="k=1", type='l', ylab="", col=rainbow(100, alpha = 1), 
          lwd=1, lty=1)
  matplot(TT, do.call(cbind,lapply(est$sFPCA.xiHat.bySubj, function(a) a[,2])), 
          xlab="visit time", main="k=2",type='l', ylab="", col=rainbow(100, alpha = 1), 
          lwd=1, lty=1)
          
  matplot(TT, do.call(cbind,lapply(est.lme$sFPCA.xiHat.bySubj, function(a) a[,1])), 
          xlab="visit time", main="k=1", type='l', ylab="", col=rainbow(100, alpha = 1), 
          lwd=1, lty=1)
  matplot(TT, do.call(cbind,lapply(est.lme$sFPCA.xiHat.bySubj, function(a) a[,2])), 
          xlab="visit time", main="k=2", type='l', ylab="", col=rainbow(100, alpha = 1),
          lwd=1, lty=1)
  
## End(Not run)

[Package refund version 0.1-35 Index]