dataLongMultiSpell {discSurv} | R Documentation
Data long transformation for multi spell analysis
Description
Transforms data from semi-long format into long format for discrete multi-spell survival analysis with right censoring.
Usage
dataLongMultiSpell(
dataSemiLong,
timeColumn,
eventColumn,
idColumn,
timeAsFactor = FALSE,
spellAsFactor = FALSE
)
Arguments
dataSemiLong
Original data in semi-long format (class "data.frame").
timeColumn
Column name of the observed times ("character vector"). The observed times must be discrete.
eventColumn
Column name of the event status ("character vector"). The events can take multiple values on a discrete scale (0, 1, 2, ...) and may repeat (integer vector or class factor). Zero is assumed to correspond to censoring, and all numbers > 0 represent the observed states between transitions.
idColumn
Column name of the person identification numbers ("character vector").
timeAsFactor
Should the time intervals be coded as factor ("logical vector")? Default is FALSE, in which case the discrete time intervals are treated as quantitative ("numeric vector").
spellAsFactor
Should the spells be coded as factor ("logical vector")? Default is FALSE, in which case the column is coded as numeric.
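The practical difference between numeric and factor coding can be seen in the design matrix a model would build from the column. The following is a minimal base R sketch with made-up interval values; it does not call discSurv and only illustrates the general effect of the two codings.

```r
# Hypothetical discrete time intervals for a handful of long-format rows
timeInt <- c(1, 2, 3, 1, 2)

# Numeric coding: intercept plus a single linear term
designNumeric <- model.matrix(~ timeInt)

# Factor coding: intercept plus one dummy column per non-reference interval
designFactor <- model.matrix(~ factor(timeInt))

colnames(designNumeric)
colnames(designFactor)
```

Numeric coding suits smooth or linear time effects, while factor coding gives each interval its own parameter.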
Details
If the data has continuous survival times, the response may be transformed to discrete intervals using the function contToDisc. The discrete time variable needs to be strictly increasing for each person, because otherwise the order of the events cannot be determined. Here is an example data structure in short format prior to augmentation with three possible states:
idColumn = 1, 1, ... , 1, 2, 2, ... , n
timeColumn = t_ID1_1 < t_ID1_2 < ... < t_ID1_k, t_ID2_1 < t_ID2_2 < ... < t_ID2_k, ...
eventColumn = 0, 1, ... , 2, 1, 0, ... , 0
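As a concrete instance of this layout, the sketch below builds a small data frame in base R and checks that the times are strictly increasing within each person. The data values, the column names (id, time, state) and the helper isStrictlyIncreasing are made up for illustration and are not part of discSurv.

```r
# Hypothetical semi-long data: two persons, states 1-3, 0 = censoring
semiLong <- data.frame(
  id    = c(1, 1, 1, 2, 2),
  time  = c(0, 2, 5, 0, 3),
  state = c(1, 2, 0, 2, 3)
)

# Hypothetical helper: TRUE if every consecutive difference is positive
isStrictlyIncreasing <- function(t) all(diff(t) > 0)

# One logical value per person
increasingById <- tapply(semiLong$time, semiLong$id, isStrictlyIncreasing)
stopifnot(all(increasingById))
```

A check like this is cheap to run before the transformation, for example after continuous times have been discretised and ties may have appeared.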
The starting state of each individual is assumed to be given at time interval zero. For example, in an illness-death model with three states ("healthy", "illness", "death"), an individual who was healthy at the beginning of the study has to be encoded with the discrete time interval set to zero and the event state "healthy".
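The starting-state convention can be verified before the transformation. The following base R sketch uses made-up data and a hypothetical illness-death coding (1 = healthy, 2 = illness, 3 = death, 0 = censoring). It assumes the rows are already sorted by person and time.

```r
# Hypothetical illness-death data; both persons start healthy at time zero
illDeath <- data.frame(
  id    = c(1, 1, 1, 2, 2),
  time  = c(0, 4, 7, 0, 6),
  state = c(1, 2, 3, 1, 0)
)

# First row of each person (rows are assumed to be sorted by id and time)
firstRows <- illDeath[!duplicated(illDeath$id), ]

# Every person should have a starting state recorded at time zero
stopifnot(all(firstRows$time == 0))

# Starting states per person
firstRows[, c("id", "state")]
```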
Value
Original data.frame with the following additional columns:
- obj: Index of persons as integer vector.
- timeInt: Index of time intervals (factor or integer vector).
- spell: The actual state of each individual within a given discrete interval.
- e0: Response in long format as binary vector. The column e0 represents censoring. If e0 is coded one in the last observed time interval timeInt of a person, then this observation was censored.
- e1: Response in long format as binary vector. The column e1 represents the transition to the first event state.
- eX: Response in long format as binary vector. The column eX represents the transition to the last event state out of the set of possible states "1, 2, 3, ..., X".
- ...: Expanded columns of the original data set.
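To illustrate the layout of the binary response columns, the base R sketch below dummy-codes a hypothetical per-interval response vector into columns named e0 to e3. It assumes three states plus a reference category coded zero, and it only mimics the column layout described above; it is not the code dataLongMultiSpell uses internally.

```r
# Hypothetical per-interval responses: 0 = reference category, 1..3 = target state
response <- c(0, 0, 2, 0, 1, 3, 0)
states <- 0:3

# One binary column per category, named e0, e1, e2, e3
indicators <- sapply(states, function(k) as.integer(response == k))
colnames(indicators) <- paste0("e", states)

# Exactly one indicator is 1 in each row
stopifnot(all(rowSums(indicators) == 1))
indicators
```

A matrix of this shape is what cbind(e0, e1, ...) supplies as the response of a multinomial model.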
Author(s)
Thomas Welchowski welchow@imbie.meb.uni-bonn.de
References
Tutz G, Schmid M (2016).
Modeling discrete time-to-event data.
Springer Series in Statistics.
Fahrmeir L (2005).
“Discrete Survival-Time Models.”
In Encyclopedia of Biostatistics, chapter Survival Analysis.
John Wiley & Sons.
Thompson Jr. WA (1977).
“On the Treatment of Grouped Observations in Life Studies.”
Biometrics, 33, 463-470.
See Also
contToDisc, dataLongTimeDep, dataLongCompRisks
Examples
################################
# Example with unemployment data
data(unempMultiSpell)
# Select subsample of the first 250 persons
unempSub <- unempMultiSpell[unempMultiSpell$id %in% 1:250, ]
# Expansion from semi-long to long format
unempLong <- dataLongMultiSpell(dataSemiLong = unempSub, timeColumn = "year",
                                eventColumn = "spell", idColumn = "id",
                                spellAsFactor = TRUE, timeAsFactor = FALSE)
head(unempLong, 25)
# Fit discrete multi-state regression model
library(VGAM)
model <- vgam(cbind(e0, e1, e2, e3, e4) ~ 0 + s(timeInt) + age:spell,
              data = unempLong, family = multinomial(refLevel = "e0"))
############################
# Example with artificial data
# Seed specification
set.seed(-2578)
# Construction of data set
# Censoring (0) and three possible states (1, 2, 3)
# Discrete time intervals (1, 2, ... , 10)
# Noninfluential variable x ~ N(0, 1)
datFrame <- data.frame(
  ID = c(rep(1, 6), rep(2, 4), rep(3, 3), rep(4, 2), rep(5, 4),
         rep(6, 5), rep(7, 7), rep(8, 8)),
  time = c(c(0, 2, 5, 6, 8, 10), c(0, 1, 6, 7), c(0, 9, 10), c(0, 6), c(0, 2, 3, 4),
           c(0, 3, 4, 7, 9), c(0, 2, 3, 5, 7, 8, 10), c(0, 1, 3, 4, 6, 7, 8, 9)),
  state = c(c(2, 1, 3, 2, 1, 0), c(3, 1, 2, 2), c(2, 2, 1), c(1, 2), c(3, 2, 2, 0),
            c(1, 3, 2, 1, 3), c(1, 1, 2, 3, 2, 1, 3), c(3, 2, 3, 2, 1, 1, 2, 3)),
  x = rnorm(n = 6 + 4 + 3 + 2 + 4 + 5 + 7 + 8))
# Transformation to long format
datFrameLong <- dataLongMultiSpell(dataSemiLong = datFrame, timeColumn = "time",
                                   eventColumn = "state", idColumn = "ID",
                                   spellAsFactor = TRUE)
head(datFrameLong, 25)
# Fit multinomial regression model
library(VGAM)
cRm <- vglm(cbind(e0, e1, e2, e3) ~ 0 + timeInt + x:spell,
            data = datFrameLong, family = "multinomial")
summary(cRm)