spCopulaDDP {spBayesSurv}R Documentation

Marginal Bayesian Nonparametric Survival Model via Spatial Copula

Description

This function fits a marginal Bayesian Nonparametric model (Zhou, Hanson and Knapp, 2015) for point-referenced right censored time-to-event data. Note that the function arguments are slightly different with those presented in the original paper; see Zhou, Hanson and Zhang (2018) for new examples.

Usage

spCopulaDDP(formula, data, na.action, prediction=NULL,
            mcmc=list(nburn=3000, nsave=2000, nskip=0, ndisplay=500), 
            prior=NULL, state=NULL, scale.designX=TRUE,
            Coordinates, DIST=NULL, Knots=NULL)

Arguments

formula

a formula expression with the response returned by the Surv function in the survival package. It currently only supports right-censoring.

data

a data frame in which to interpret the variables named in the formula argument.

na.action

a missing-data filter function, applied to the model.frame.

prediction

a list giving the information used to obtain conditional inferences. The list includes the following elements: spred and xpred giving the n by 2 new locations and corresponding npred by p covariates matrix, respectively, used for prediction. If prediction=NULL, xpred will be set to be the design matrix used in formula, and spred will be set to be in Coordinates.

mcmc

a list giving the MCMC parameters. The list must include the following elements: nburn an integer giving the number of burn-in scans, nskip an integer giving the thinning interval, nsave an integer giving the total number of scans to be saved, ndisplay an integer giving the number of saved scans to be displayed on screen (the function reports on the screen when every ndisplay iterations have been carried out).

prior

a list giving the prior information. See Zhou, Hanson and Zhang (2018) for more detailed hyperprior specifications.

state

a list giving the current value of the parameters. This list is used if the current analysis is the continuation of a previous analysis.

scale.designX

flag to indicate wheter the design matrix X will be centered by column means and scaled by column standard deviations, where TRUE indicates yes. The default is TRUE for improving numerical stability. Even when it is scaled, the reported regression coefficients are in original scales. Note if we want to specify informative priors for regression coefficients, these priors should correspond to scaled predictors when scale.designX=TRUE.

Coordinates

an n by 2 coordinates matrix, where n is the sample size, 2 is the dimension of coordiantes. Note all cocordinates should be distinct.

DIST

This is a function argument, used to calculate the distance. The default is Euclidean distance (fields::rdist). This function should have two arguments (X1,X2), where X1 and X2 are matrices with coordinates as the rows. The returned value of this function should be the pairwise distance matrix. If nrow(X1)=m and nrow(X2)=n then the function should return an m by n matrix of all distances between these two sets of points.

Knots

an nknots by 2 matrix, where nknots is the number of selected knots for FSA, and 2 is the dimension of each location. If Knots is not specified, the space-filling algorithm will be used to find the knots.

Details

This function fits a marginal Bayesian Nonparametric model (Zhou, Hanson and Knapp, 2015) for point-referenced right censored time-to-event data. Note that the function arguments are slightly different with those presented in the original paper; see Zhou, Hanson and Zhang (2018) for new examples.

Value

The spCopulaDDP object is a list containing at least the following components:

n

the number of row observations used in fitting the model

p

the number of columns in the model matrix

Surv

the Surv object used

X.scaled

the n by p scaled design matrix

X

the n by p orginal design matrix

beta

the p+1 by N by nsave array of posterior samples for the coefficients

sigma2

the N by nsave matrix of posterior samples for sigma2 involved in the DDP.

w

the N by nsave matrix of posterior samples for weights involved in the DDP.

theta

the 2 by nsave matrix of posterior samples for partial sill and range involved in the Gaussian copula.

Tpred

the npred by nsave predicted survival times for covariates specified in the argument prediction.

Zpred

the npred by nsave predicted z values for covariates specified in the argument prediction.

ratey

the n-vector of acceptance rates for sampling censored survival times.

ratebeta

the N-vector of acceptance rates for sampling beta coefficients.

ratesigma

the N-vector of acceptance rates for sampling sigma2.

ratetheta

the acceptance rate for sampling theta.

Author(s)

Haiming Zhou and Timothy Hanson

References

Zhou, H., Hanson, T., and Zhang, J. (2020). spBayesSurv: Fitting Bayesian Spatial Survival Models Using R. Journal of Statistical Software, 92(9): 1-33.

Zhou, H., Hanson, T., and Knapp, R. (2015). Marginal Bayesian nonparametric model for time to disease arrival of threatened amphibian populations. Biometrics, 71(4): 1101-1110.

See Also

anovaDDP, GetCurves

Examples

###############################################################
# A simulated data: mixture of two normals with spatial dependence
###############################################################
rm(list=ls())
library(survival)
library(spBayesSurv)
library(coda)
## True parameters 
betaT = cbind(c(3.5, 0.5), c(2.5, -1)); 
wT = c(0.4, 0.6); 
sig2T = c(1^2, 0.5^2);
theta1 = 0.98; theta2 = 0.1;
n=30; npred=3; ntot = n+npred;
## The Survival function for log survival times:
fiofy = function(y, xi, w=wT){
  nw = length(w);
  ny = length(y);
  res = matrix(0, ny, nw);
  Xi = c(1,xi);
  for (k in 1:nw){
    res[,k] = w[k]*dnorm(y, sum(Xi*betaT[,k]), sqrt(sig2T[k]) )
  }
  apply(res, 1, sum)
}
fioft = function(t, xi, w=wT) fiofy(log(t), xi, w)/t;
Fiofy = function(y, xi, w=wT){
  nw = length(w);
  ny = length(y);
  res = matrix(0, ny, nw);
  Xi = c(1,xi);
  for (k in 1:nw){
    res[,k] = w[k]*pnorm(y, sum(Xi*betaT[,k]), sqrt(sig2T[k]) )
  }
  apply(res, 1, sum)
}
Fioft = function(t, xi, w=wT) Fiofy(log(t), xi, w);
## The inverse for Fioft
Finv = function(u, x) uniroot(function (y) Fiofy(y,x)-u, lower=-250, 
                              upper=250, extendInt ="yes", tol=1e-6)$root

## generate coordinates: 
## npred is the # of locations for prediction
ldist = 100; wdist = 40;
s1 = runif(ntot, 0, wdist); s2 = runif(ntot, 0, ldist);
s = cbind(s1,s2); #plot(s[,1], s[,2]);
## Covariance matrix
corT = matrix(1, ntot, ntot);
for (i in 1:(ntot-1)){
  for (j in (i+1):ntot){
    dij = sqrt(sum( (s[i,]-s[j,])^2 ));
    corT[i,j] = theta1*exp(-theta2*dij);
    corT[j,i] = theta1*exp(-theta2*dij);
  }
}

## generate x 
x1 = runif(ntot,-1.5,1.5); X = cbind(x1);
## generate transformed log of survival times
z = MASS::mvrnorm(1, rep(0, ntot), corT);
## generate survival times
u = pnorm(z);
tT = rep(0, ntot);
for (i in 1:ntot){
  tT[i] = exp(Finv(u[i], X[i,]));
}

### ----------- right-censored -------------###
t_obs=tT 
Centime = runif(ntot, 200, 500);
delta = (tT<=Centime) +0 ; 
length(which(delta==0))/ntot; # censoring rate
rcen = which(delta==0);
t_obs[rcen] = Centime[rcen]; ## observed time 
## make a data frame
dtot = data.frame(tobs=t_obs, x1=x1, delta=delta, tT=tT,
                  s1=s1, s2=s2); 
## Hold out npred for prediction purpose
predindex = sample(1:ntot, npred);
dpred = dtot[predindex,];
d = dtot[-predindex,];
# Prediction settings 
prediction = list(xpred=cbind(dpred$x1), 
                  spred=cbind(dpred$s1, dpred$s2));

###############################################################
# Independent DDP: Bayesian Nonparametric Survival Model
###############################################################
# MCMC parameters
nburn=100; nsave=100; nskip=0;
# Note larger nburn, nsave and nskip should be used in practice.
mcmc=list(nburn=nburn, nsave=nsave, nskip=nskip, ndisplay=1000);
prior = list(N=10, a0=2, b0=2, nknots=n, nblock=round(n/2));
# here nknots=n, so FSA is not used.
# If nknots<n, FSA will be used with nblock=round(n/2).
# As nknots is getting larger, the FSA is more accurate but slower
# As nblock is getting smaller, the FSA is more accurate but slower. 
# In most applications, setting nblock=n works fine, which is the
# setting by not specifying nblock. 
# If nknots is not specified or nknots=n, the exact covariance is used. 
# Fit the Cox PH model
res1 = spCopulaDDP(formula = Surv(tobs, delta)~x1, data=d, 
                     prior=prior, mcmc=mcmc, prediction=prediction,
                     Coordinates=cbind(d$s1,d$s2), Knots=NULL);
# here if prediction=NULL, prediction$xpred will be set as the design matrix
# in formula, and prediction$spred will be set as the Coordinates argument. 
# Knots=NULL is the defualt setting, for which the knots will be generated 
# using fields::cover.design() with number of knots equal to prior$nknots. 
## LPML
LPML = sum(log(res1$cpo)); LPML;
## Number of non-negligible components
quantile(colSums(res1$w>0.05))
## MSPE
mean((log(dpred$tT)-apply(log(res1$Tpred), 1, median))^2); 

## traceplot
par(mfrow = c(1,2))
traceplot(mcmc(res1$theta[1,]), main="sill")
traceplot(mcmc(res1$theta[2,]), main="range")

############################################
## Curves
############################################
ygrid = seq(0,6.0,length=100); tgrid = exp(ygrid);
ngrid = length(tgrid);
xpred = data.frame(x1=c(-1, 1)); 
plot(res1, xnewdata=xpred, tgrid=tgrid);

[Package spBayesSurv version 1.1.8 Index]