R: Marginal Bayesian Nonparametric Survival Model via Spatial...

spCopulaDDP {spBayesSurv}

R Documentation

Marginal Bayesian Nonparametric Survival Model via Spatial Copula

Description

This function fits a marginal Bayesian Nonparametric model (Zhou, Hanson and Knapp, 2015) for point-referenced right censored time-to-event data. Note that the function arguments are slightly different with those presented in the original paper; see Zhou, Hanson and Zhang (2018) for new examples.

Usage

spCopulaDDP(formula, data, na.action, prediction=NULL,
            mcmc=list(nburn=3000, nsave=2000, nskip=0, ndisplay=500), 
            prior=NULL, state=NULL, scale.designX=TRUE,
            Coordinates, DIST=NULL, Knots=NULL)

Arguments

`formula`	a formula expression with the response returned by the `Surv` function in the `survival` package. It currently only supports right-censoring.
`data`	a data frame in which to interpret the variables named in the `formula` argument.
`na.action`	a missing-data filter function, applied to the `model.frame`.
`prediction`	a list giving the information used to obtain conditional inferences. The list includes the following elements: `spred` and `xpred` giving the n by 2 new locations and corresponding npred by p covariates matrix, respectively, used for prediction. If `prediction=NULL`, `xpred` will be set to be the design matrix used in `formula`, and `spred` will be set to be in `Coordinates`.
`mcmc`	a list giving the MCMC parameters. The list must include the following elements: `nburn` an integer giving the number of burn-in scans, `nskip` an integer giving the thinning interval, `nsave` an integer giving the total number of scans to be saved, `ndisplay` an integer giving the number of saved scans to be displayed on screen (the function reports on the screen when every `ndisplay` iterations have been carried out).
`prior`	a list giving the prior information. See Zhou, Hanson and Zhang (2018) for more detailed hyperprior specifications.
`state`	a list giving the current value of the parameters. This list is used if the current analysis is the continuation of a previous analysis.
`scale.designX`	flag to indicate wheter the design matrix X will be centered by column means and scaled by column standard deviations, where `TRUE` indicates yes. The default is `TRUE` for improving numerical stability. Even when it is scaled, the reported regression coefficients are in original scales. Note if we want to specify informative priors for regression coefficients, these priors should correspond to scaled predictors when `scale.designX=TRUE`.
`Coordinates`	an n by 2 coordinates matrix, where n is the sample size, 2 is the dimension of coordiantes. Note all cocordinates should be distinct.
`DIST`	This is a function argument, used to calculate the distance. The default is Euclidean distance (`fields::rdist`). This function should have two arguments (X1,X2), where X1 and X2 are matrices with coordinates as the rows. The returned value of this function should be the pairwise distance matrix. If nrow(X1)=m and nrow(X2)=n then the function should return an m by n matrix of all distances between these two sets of points.
`Knots`	an nknots by 2 matrix, where nknots is the number of selected knots for FSA, and 2 is the dimension of each location. If `Knots` is not specified, the space-filling algorithm will be used to find the knots.

Details

Value

The spCopulaDDP object is a list containing at least the following components:

`n`	the number of row observations used in fitting the model
`p`	the number of columns in the model matrix
`Surv`	the `Surv` object used
`X.scaled`	the n by p scaled design matrix
`X`	the n by p orginal design matrix
`beta`	the p+1 by N by nsave array of posterior samples for the coefficients
`sigma2`	the N by nsave matrix of posterior samples for sigma2 involved in the DDP.
`w`	the N by nsave matrix of posterior samples for weights involved in the DDP.
`theta`	the 2 by nsave matrix of posterior samples for partial sill and range involved in the Gaussian copula.
`Tpred`	the npred by nsave predicted survival times for covariates specified in the argument `prediction`.
`Zpred`	the npred by nsave predicted z values for covariates specified in the argument `prediction`.
`ratey`	the n-vector of acceptance rates for sampling censored survival times.
`ratebeta`	the N-vector of acceptance rates for sampling beta coefficients.
`ratesigma`	the N-vector of acceptance rates for sampling sigma2.
`ratetheta`	the acceptance rate for sampling theta.

Author(s)

Haiming Zhou and Timothy Hanson

References

Zhou, H., Hanson, T., and Zhang, J. (2020). spBayesSurv: Fitting Bayesian Spatial Survival Models Using R. Journal of Statistical Software, 92(9): 1-33.

Zhou, H., Hanson, T., and Knapp, R. (2015). Marginal Bayesian nonparametric model for time to disease arrival of threatened amphibian populations. Biometrics, 71(4): 1101-1110.

Examples

###############################################################
# A simulated data: mixture of two normals with spatial dependence
###############################################################
rm(list=ls())
library(survival)
library(spBayesSurv)
library(coda)
## True parameters 
betaT = cbind(c(3.5, 0.5), c(2.5, -1)); 
wT = c(0.4, 0.6); 
sig2T = c(1^2, 0.5^2);
theta1 = 0.98; theta2 = 0.1;
n=30; npred=3; ntot = n+npred;
## The Survival function for log survival times:
fiofy = function(y, xi, w=wT){
  nw = length(w);
  ny = length(y);
  res = matrix(0, ny, nw);
  Xi = c(1,xi);
  for (k in 1:nw){
    res[,k] = w[k]*dnorm(y, sum(Xi*betaT[,k]), sqrt(sig2T[k]) )
  }
  apply(res, 1, sum)
}
fioft = function(t, xi, w=wT) fiofy(log(t), xi, w)/t;
Fiofy = function(y, xi, w=wT){
  nw = length(w);
  ny = length(y);
  res = matrix(0, ny, nw);
  Xi = c(1,xi);
  for (k in 1:nw){
    res[,k] = w[k]*pnorm(y, sum(Xi*betaT[,k]), sqrt(sig2T[k]) )
  }
  apply(res, 1, sum)
}
Fioft = function(t, xi, w=wT) Fiofy(log(t), xi, w);
## The inverse for Fioft
Finv = function(u, x) uniroot(function (y) Fiofy(y,x)-u, lower=-250, 
                              upper=250, extendInt ="yes", tol=1e-6)$root

## generate coordinates: 
## npred is the # of locations for prediction
ldist = 100; wdist = 40;
s1 = runif(ntot, 0, wdist); s2 = runif(ntot, 0, ldist);
s = cbind(s1,s2); #plot(s[,1], s[,2]);
## Covariance matrix
corT = matrix(1, ntot, ntot);
for (i in 1:(ntot-1)){
  for (j in (i+1):ntot){
    dij = sqrt(sum( (s[i,]-s[j,])^2 ));
    corT[i,j] = theta1*exp(-theta2*dij);
    corT[j,i] = theta1*exp(-theta2*dij);
  }
}

## generate x 
x1 = runif(ntot,-1.5,1.5); X = cbind(x1);
## generate transformed log of survival times
z = MASS::mvrnorm(1, rep(0, ntot), corT);
## generate survival times
u = pnorm(z);
tT = rep(0, ntot);
for (i in 1:ntot){
  tT[i] = exp(Finv(u[i], X[i,]));
}

### ----------- right-censored -------------###
t_obs=tT 
Centime = runif(ntot, 200, 500);
delta = (tT<=Centime) +0 ; 
length(which(delta==0))/ntot; # censoring rate
rcen = which(delta==0);
t_obs[rcen] = Centime[rcen]; ## observed time 
## make a data frame
dtot = data.frame(tobs=t_obs, x1=x1, delta=delta, tT=tT,
                  s1=s1, s2=s2); 
## Hold out npred for prediction purpose
predindex = sample(1:ntot, npred);
dpred = dtot[predindex,];
d = dtot[-predindex,];
# Prediction settings 
prediction = list(xpred=cbind(dpred$x1), 
                  spred=cbind(dpred$s1, dpred$s2));

###############################################################
# Independent DDP: Bayesian Nonparametric Survival Model
###############################################################
# MCMC parameters
nburn=100; nsave=100; nskip=0;
# Note larger nburn, nsave and nskip should be used in practice.
mcmc=list(nburn=nburn, nsave=nsave, nskip=nskip, ndisplay=1000);
prior = list(N=10, a0=2, b0=2, nknots=n, nblock=round(n/2));
# here nknots=n, so FSA is not used.
# If nknots<n, FSA will be used with nblock=round(n/2).
# As nknots is getting larger, the FSA is more accurate but slower
# As nblock is getting smaller, the FSA is more accurate but slower. 
# In most applications, setting nblock=n works fine, which is the
# setting by not specifying nblock. 
# If nknots is not specified or nknots=n, the exact covariance is used. 
# Fit the Cox PH model
res1 = spCopulaDDP(formula = Surv(tobs, delta)~x1, data=d, 
                     prior=prior, mcmc=mcmc, prediction=prediction,
                     Coordinates=cbind(d$s1,d$s2), Knots=NULL);
# here if prediction=NULL, prediction$xpred will be set as the design matrix
# in formula, and prediction$spred will be set as the Coordinates argument. 
# Knots=NULL is the defualt setting, for which the knots will be generated 
# using fields::cover.design() with number of knots equal to prior$nknots. 
## LPML
LPML = sum(log(res1$cpo)); LPML;
## Number of non-negligible components
quantile(colSums(res1$w>0.05))
## MSPE
mean((log(dpred$tT)-apply(log(res1$Tpred), 1, median))^2); 

## traceplot
par(mfrow = c(1,2))
traceplot(mcmc(res1$theta[1,]), main="sill")
traceplot(mcmc(res1$theta[2,]), main="range")

############################################
## Curves
############################################
ygrid = seq(0,6.0,length=100); tgrid = exp(ygrid);
ngrid = length(tgrid);
xpred = data.frame(x1=c(-1, 1)); 
plot(res1, xnewdata=xpred, tgrid=tgrid);

[Package spBayesSurv version 1.1.8 Index]