smcfcs.dtsam {smcfcs} | R Documentation |
Substantive model compatible fully conditional specification imputation of covariates for discrete time survival analysis
Description
Multiply imputes missing covariate values using substantive model compatible fully conditional specification for discrete time survival analysis.
Usage
smcfcs.dtsam(
originaldata,
smformula,
timeEffects = "factor",
method,
predictorMatrix = NULL,
m = 5,
numit = 10,
rjlimit = 1000,
noisy = FALSE,
errorProneMatrix = NULL
)
Arguments
originaldata |
The data in wide form (i.e. one row per subject) |
smformula |
A formula of the form "Surv(t,d)~x1+x2+x3", where t is the discrete time variable, d is the binary event indicator, and the covariates should not include time. The time variable should be an integer coded numeric variable taking values from 1 up to the final time period. |
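As a rough illustration of the coding requirement for the time variable, the sketch below recodes raw follow-up times into consecutive integer periods and checks the result. The data frame `dat`, its columns `rawtime`, `t` and `d`, and the helper `is_valid_time` are hypothetical and are not part of the package.

```r
# Hypothetical data: follow-up recorded by calendar year (not part of smcfcs)
dat <- data.frame(rawtime = c(2000, 2001, 2001, 2003, 2002, 2003),
                  d       = c(1, 0, 1, 1, 0, 0))

# Recode to consecutive integer periods 1, 2, ..., T
periods <- seq(min(dat$rawtime), max(dat$rawtime))
dat$t <- match(dat$rawtime, periods)

# Hypothetical helper: numeric, no missing values, whole numbers, all >= 1
is_valid_time <- function(t) {
  is.numeric(t) && all(!is.na(t)) && all(t == round(t)) && min(t) >= 1
}
is_valid_time(dat$t)
```

The recoded column `t` could then take the place of `t` in a formula such as "Surv(t,d)~x1+x2+x3".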
timeEffects |
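Specifies how the effect of time is modelled. The default, "factor", models time as a factor variable; alternatively a linear or quadratic effect of time (on the log odds scale) can be specified, as described in Details. |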
method |
A required vector of strings specifying, for each variable, either
that it does not need to be imputed ("") or the type of regression model to
be used to impute it. Possible values are |
predictorMatrix |
An optional predictor matrix. If specified, the matrix defines which covariates will be used as predictors in the imputation models (the outcome must not be included). The i'th row of the matrix should consist of 0s and 1s, with a 1 in the j'th column indicating the j'th variable be used as a covariate when imputing the i'th variable. If not specified, when imputing a given variable, the imputation model covariates are the other covariates of the substantive model which are partially observed (but which are not passively imputed) and any fully observed covariates (if present) in the substantive model. Note that the outcome variable is implicitly conditioned on by the rejection sampling scheme used by smcfcs, and should not be specified as a predictor in the predictor matrix. |
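A minimal sketch of how such a matrix might be built in base R follows. It assumes the matrix is square, with one row and one column per variable in the data set; the variable names are hypothetical, and the dimnames are added only for readability.

```r
# Hypothetical variable names, in the column order of the data set
vars <- c("t", "d", "x1", "x2", "x3")

# Start from all zeros: no variable is used as a predictor
predMat <- matrix(0, nrow = length(vars), ncol = length(vars),
                  dimnames = list(vars, vars))

# When imputing x1 (row), use x2 and x3 (columns) as imputation model covariates
predMat["x1", c("x2", "x3")] <- 1

# The outcome columns (t and d) must stay at 0 in every row
stopifnot(all(predMat[, c("t", "d")] == 0))
predMat
```

Rows for variables that are not imputed are simply left as zeros in this sketch.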
m |
The number of imputed datasets to generate. The default is 5. |
numit |
The number of iterations to run when generating each imputation. In a (limited) range of simulations good performance was obtained with the default of 10 iterations. However, particularly when the proportion of missingness is large, more iterations may be required for convergence to stationarity. |
rjlimit |
Specifies the maximum number of attempts which should be made
when using rejection sampling to draw from imputation models. If the limit is reached
when running, a warning will be issued. In this case it is probably advisable to
increase the |
noisy |
Logical value (default FALSE) indicating whether output should be noisy, which can be useful for debugging or checking that the models being used are as desired. |
errorProneMatrix |
An optional matrix which if specified indicates that some variables
are measured with classical measurement error. If the i'th variable is measured with error
by variables j and k, then the (i,j) and (i,k) entries of this matrix should be 1, with the
remainder of entries 0. The i'th element of the method argument should then be specified
as |
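The (i,j) layout described above can be sketched in base R as follows. The variable names are hypothetical: `x` stands for a covariate measured with error by the two replicate measurements `w1` and `w2`, and the matrix is assumed to have one row and one column per variable in the data set.

```r
# Hypothetical variable names, in the column order of the data set
vars <- c("t", "d", "x", "w1", "w2")

# Start from all zeros
errMat <- matrix(0, nrow = length(vars), ncol = length(vars),
                 dimnames = list(vars, vars))

# x (row i) is measured with error by w1 and w2 (columns j and k)
errMat["x", c("w1", "w2")] <- 1
errMat
```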
Details
For this substantive model type, like for the other substantive model types, smcfcs
expects the originaldata to have one row per subject. Variables indicating the discrete
time of failure/censoring and the event indicator should be passed in smformula, as described.
The default is to model the effect of time as a factor. This will not work in datasets where there is not at least one observed event in each time period. In such cases you must specify a simpler parametric model for the effect of time. At the moment you can specify either a linear or quadratic effect of time (on the log odds scale).
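Whether every time period contains at least one observed event can be checked before imputing. The sketch below uses base R only; the data frame `dat` and its columns `t` and `d` are hypothetical stand-ins for the wide-form data.

```r
# Hypothetical wide-form data: t = discrete failure/censoring time, d = event indicator
dat <- data.frame(t = c(1, 2, 2, 3, 4, 4, 4),
                  d = c(1, 0, 1, 0, 1, 1, 0))

# Count observed events in every period from 1 up to the final period
allPeriods <- seq_len(max(dat$t))
eventsPerPeriod <- table(factor(dat$t[dat$d == 1], levels = allPeriods))
eventsPerPeriod

# Periods with no observed event
emptyPeriods <- allPeriods[as.vector(eventsPerPeriod) == 0]
emptyPeriods
```

If `emptyPeriods` is not empty, as in this toy data where period 3 contains only a censoring, modelling time as a factor will not work and a linear or quadratic effect of time would be needed instead.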
Author(s)
Jonathan Bartlett j.w.bartlett@bath.ac.uk
Examples
#the following example is not run when the package is compiled on CRAN
#(to keep computation time down), but it can be run by package users
## Not run:
  #discrete time survival analysis example
  library(smcfcs)   #provides smcfcs.dtsam and the ex_dtsam data
  library(survival) #provides Surv and survSplit
  library(mitools)  #provides MIcombine
  M <- 5
  imps <- smcfcs.dtsam(ex_dtsam, "Surv(failtime,d)~x1+x2",
                       method=c("logreg","", "", ""),m=M)
  #fit dtsam model to each dataset manually, since we need
  #to expand to person-period data form first
  ests <- vector(mode = "list", length = M)
  vars <- vector(mode = "list", length = M)
  for (i in 1:M) {
    #one row per subject per time period at risk
    longData <- survSplit(Surv(failtime,d)~x1+x2, data=imps$impDatasets[[i]],
                          cut=unique(ex_dtsam$failtime[ex_dtsam$d==1]))
    #logistic regression with a separate intercept for each time period
    mod <- glm(d~-1+factor(tstart)+x1+x2, family="binomial", data=longData)
    ests[[i]] <- coef(mod)
    vars[[i]] <- diag(vcov(mod))
  }
  #combine estimates across the M imputed datasets
  summary(MIcombine(ests,vars))
## End(Not run)