curve_ppmx {ppmSuite}R Documentation

Gaussian PPMx Model for Functional Realizations

Description

curve_ppmx is the main function used to fit Functional Gaussian PPMx model.

Usage

curve_ppmx(y, z, subject,
            Xcon=NULL,Xcat=NULL,
            Xconp=NULL,Xcatp=NULL,
            PPM, M,
            q=3, rw_order=1, balanced=1,
            nknots,npredobs,
            Aparm, modelPriors,
            similarity_function=1,
            consim, calibrate,
            simParms,
            mh=c(1,1),
            draws=1100,burn=100,thin=1)

Arguments

y

numeric vector or a matrix with two columns that contains measured functional response in long format

z

numeric vector contains time points at which functional response is measured in long format

subject

vector of the same length as z that identies the subject to which each measurement in y corresponds.

Xcon

a data-frame with number of rows being equal to the number of subjects and whose columns consist of continuous covariates. These covariates are included in the PPMx model and therefore influence clusters and thus are only used if the PPM argument is FALSE. This argument is set to NULL by default

Xcat

a data-frame with nsubject number of rows and whose columns consist of categorical covariates. These covariates are included in the PPMx model and therefore influence clusters and thus are only used if the PPM argument is FALSE. The categories must be labeled using integers starting at zero. This argument is set to NULL by default

Xconp

a data-frame with the number of rows corresponding to the number of out-of-sample predictions that are desired and columns consist of continuous covariates that are contained in Xcon.

Xcatp

a data-frame with the number of rows corresponding to the number of out-of-sample predictions that are desired and columns consist of categorical covariates that are contained in Xcat.

PPM

Logical argument that indicates if the PPM or PPMx partition model should be employed. If PPM = FALSE, then at least one of Xcon and Xcat must be supplied.

M

Scale parameter connected to the dispersion parameter of a Dirichlet process. Default is 1.

q

Degree of B-spline employed to fit curves

rw_order

Order of the random walk. This specifies the type of penalty matrix employed in the penalized B-splines.

balanced

scalar with 1 - indicating the design was balanced in the sense that all subjects measurements occur at the same time and 0 - indicating design was not balanced.

nknots

scalar indicating the number of evenly spaced knots to be used.

npredobs

number of time predictions to make for each subjects curve.

Aparm

Upper bound parameter for lambda wich regulates the similarity of curves with in a cluster. Larger values result in clusters with curves that can be more dissimilar.

modelPriors

Vector of prior parameter values for priors assigned to parameters of the Gaussian Functional data model.

  • A - upper bound on sigma*_j.

  • s2mu - prior variance for mu the mean vector of theta which are cluster specific spline coefficients,

  • mb0 - prior mean for beta0_i (subject specific intercept)

  • s2b0 - prior variance for beta0_i (subject specific intercept)

  • as2b0 - prior shape associated with IG prior on variance of beta0_i (subject specific intercept)

  • bs2b0 - prior scale associated with IG prior on variacne of beta0_i (subject specific intercept)

  • at - prior shape associated with IG prior on tau (smoothing parameter for theta)

  • bt - prior scale associated with IG prior on tau (smoothing parameter for theta)

similarity_function

Type of similarity function that is employed for the PPMx prior on partitions. Options are 1-4 with

  • 1 - Auxilliary similarity

  • 2 - Double dipper similarity

  • 3 - Cluster variance or entropy for categorical covariates

  • 4 - Mean Gower dissimilarity (Gower dissimilarity is not available if missing values are present in X)

consim

If similarity_function is set to either 1 or 2, then consim specifies the type of marginal likelihood used as the similarity function. Options are 1 or 2 with

  • 1 - N-N(m0, s20, v) (v variance of “likelihood”, m0 and s20 “prior” parameters),

  • 2 - N-NIG(m0, k0, nu0, s20) (m0 and k0 center and inverse scalar of a Gaussian, and nu0 and s20 are the number of prior observations and prior variance guess of a Inverse-Chi-Square distribution.)

calibrate

Indicates if the similarity should be calibrated. Options are 0-2 with

  • 0 - no calibration

  • 1 - standardize similarity value for each covariate

  • 2 - coarsening is applied so that each similarity is raised to the 1/p power

simParms

Vector of parameter values employed in the similarity function of the PPMx. Entries of the vector correspond to

  • m0 - center continuous similarity with default 0,

  • s20 - spread of continuous similarity with default 1 if consim=1. For consim=2 guess of x's variance,

  • v2 - spread of 'likelihood' for conitnuous similarity (smaller values place more weight on partitions with clusters that contain homogeneous covariate values)

  • k0 - inverse scale for v (only used for N-NIG similarity model)

  • nu0 - prior number of x "observations" (only used for N-NIG similarity model)

  • a0 - dirichlet weight for categorical similarity with default of 0.1 (smaller values place more weight on partitions with individuals that are in the same category.)

  • alpha - weight associated with cluster-variance and Gower disimilarity

mh

two dimensional vector containing values for tunning parameter associated with MH update for sigma2 and sigma20

draws

number of MCMC iterates to be collected. default is 1100

burn

number of MCMC iterates discared as burn-in. default is 100

thin

number by which the MCMC chain is thinne. default is 1. Thin must be selected so that it is a multilple of (draws - thin)

Details

This function fits a hierarhical functional data model where B-spline coefficients are clustered using either a PPM or a PPMx prior on partitions.

Value

The function returns a list containing arrays filled with MCMC iterates corresponding to model parameters and model fit metrics. In order to provide more detail, in what follows let

"T" - be the number of MCMC iterates collected,

"N" - be the number of subjects/units,

"P" - be the number of knots + degree of spline.

The output list contains the following

Examples




# Example with balanced data.
# generate data for two clusters with 10 subjects each.

nobs <- 100
nsubject <- 2*10

set.seed(101)
xx <- seq(0,2*pi, length=nobs)
y <- cbind(replicate(n=10, sin(xx) + rnorm(nobs,0,0.5)),
           replicate(n=10, cos(xx) + rnorm(nobs,0,0.5)))

dat <- data.frame(y=c(y),
                  z=rep(1:nobs, times=nsubject),
                  Name=rep(1:nsubject, each=nobs))

subject_obs_vec <- dat$Name

nknots <- 15



# Small number of iterates for illustrative purposes only
niter <- 5000
nburn <- 2000
nthin <- 3
nout <- (niter-nburn)/nthin

z <- dat$z

## the order here is c(mu0, s20, v, k0, nu0, a0, alpha)
## If simularity is N-NIG then k0 and nu0 are used but v is not
## If simularity is N-N then v is used but no k0 and nu0
simparms <- c(0.0, 1.0, 0.1, 1.0, 1.0, 0.1, 1)

fits <- list()

# fit vgrf only
y <- dat$y

modelPriors <- c(0.5,      # Asig
                 1000^2,   # s2_mu
                 0,        # mb0
                 1000^2,   # s2b0
                 1,        # as2b0
                 1,        # bs2b0
                 1,        # at
                 1.0/0.05) # bt


fit <- curve_ppmx(y=cbind(y), z=z,
             subject=subject_obs_vec,
             Xcon = NULL, Xcat = NULL,
             Xconp=NULL, Xcatp=NULL,
             PPM=TRUE, M=1,
             q=3, rw_order=1, balanced=1,
             nknots=nknots,  npredobs=1,
             Aparm=100,
             modelPriors=modelPriors,
             similarity_function=1,
             consim=1, calibrate=0,
             simParms=simparms,
             mh=c(0.1, 1e-4),
             draws=niter,
             burn=nburn,
             thin=nthin)


Hmat <- fit$Hmat

# For a point estimate of partition, take first MCMC interate
# This is done only for illustrative purposes.  Recommend using
# the salso R package.

p.est <- fit$Si[1,]

nc <- length(unique(p.est))


oldpar <- par(no.readonly = TRUE)


# Plot individual subject fits.

tmp <- c(1,6,11,16)
par(mfrow=c(2,2))
for(j in tmp){
  bmn <- apply(fit$beta[j,,],1,mean)
  b0mn <- mean(fit$beta0[,j])

  ytmp <- y[dat$Name==j]

  b0vec <- rep(b0mn, nobs)

  plot(1:nobs,c(ytmp),
		  type='n',ylab="Response",
		  xlab="Time")

  points(1:nobs,ytmp)
  lines(1:nobs,  b0vec+Hmat%*%bmn, col=p.est[j],lwd=2)

}

# plot all curves in one plot
par(mfrow=c(1,1))

plot(dat$z, dat$y, type="n",ylab="",xlab="Time")

for(j in 1:nsubject){

  bmn <- apply(fit$beta[j,,],1,mean)
  b0mn <- mean(fit$beta0[,j])

  b0vec <- rep(b0mn, nobs)

  lines((1:nobs),  b0vec+Hmat%*%bmn, col=p.est[j],lwd=0.5)

}


par(oldpar)







[Package ppmSuite version 0.3.4 Index]