dataLongMultiSpell {discSurv}    R Documentation

Data long transformation for multi spell analysis

Description

Transforms data from short format into long format for discrete multi-spell survival analysis with right censoring.

Usage

dataLongMultiSpell(
  dataSemiLong,
  timeColumn,
  eventColumn,
  idColumn,
  timeAsFactor = FALSE,
  spellAsFactor = FALSE
)

Arguments

dataSemiLong

Original data in semi-long format ("class data.frame").

timeColumn

Character giving the column name of the observed times. It is required that the observed times are discrete ("character vector").

eventColumn

Column name of the event status ("character vector"). The events can take multiple values on a discrete scale (0, 1, 2, ...) and repetition of events is allowed (integer vector or class factor). It is assumed that zero corresponds to censoring and all numbers > 0 represent the observed states between transitions.

idColumn

Column name of the person identification numbers ("character vector").

timeAsFactor

Should the time intervals be coded as factor ("logical vector")? Default is FALSE. In the default settings the discrete time intervals are treated as quantitative ("numeric vector").

spellAsFactor

Should the spells be coded as factor ("logical vector")? Default is FALSE. If the argument is FALSE, the column is coded as numeric.
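The practical consequence of factor versus numeric coding can be seen in a model design matrix. The following is a minimal base R sketch, not the internals of dataLongMultiSpell; the vector timeInt is a hypothetical stand-in for an interval index in long-format data.

```r
# Hypothetical interval index, as it might appear in long-format data
timeInt <- c(1, 2, 3, 1, 2)

# Numeric coding: a single column, i.e. one linear trend over time
numDesign <- model.matrix(~ 0 + timeInt)
ncol(numDesign)

# Factor coding: one indicator column per distinct interval
facDesign <- model.matrix(~ 0 + factor(timeInt))
ncol(facDesign)
```

Numeric coding is the more parsimonious choice and is usually what a smooth term such as s(timeInt) would expect, while factor coding estimates a separate effect for every interval.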

Details

If the data has continuous survival times, the response may be transformed to discrete intervals using the function contToDisc. The discrete time variable needs to be strictly increasing for each person, because otherwise the order of the events cannot be determined. Here is an example data structure in short format prior to augmentation with three possible states:

idColumn = 1, 1, ... , 1, 2, 2, ... , n
timeColumn = t_ID1_1 < t_ID1_2 < ... < t_ID1_k, t_ID2_1 < t_ID2_2 < ... < t_ID2_k, ...
eventColumn = 0, 1, ... , 2, 1, 0, ... , 0
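The requirement that times are strictly increasing within each person can be checked before the transformation. The sketch below uses base R only; the data frame dat and its column names id and time are hypothetical and chosen for illustration.

```r
# Hypothetical data; person 2 repeats time 3 and violates the requirement
dat <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  time = c(0, 2, 5, 0, 3, 3)
)

# One logical per person: TRUE if all successive time differences are positive
isIncreasing <- vapply(split(dat$time, dat$id),
                       function(t) all(diff(t) > 0),
                       logical(1))
isIncreasing

# IDs whose time column would need to be fixed first
badIds <- names(isIncreasing)[!isIncreasing]
badIds
```

Rows are assumed to be already ordered by time within each person; if they are not, sorting with order(dat$id, dat$time) beforehand would be a reasonable first step.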

The starting state of each individual is assumed to be given with the time interval equal to zero. For example, in an illness-death model with three states ("healthy", "illness", "death"), if an individual was healthy at the beginning of the study, this has to be encoded with the discrete time interval set to zero and the event state "healthy".
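As an illustration of the starting-state convention, the following base R sketch builds a small illness-death data set. The state coding (1 = healthy, 2 = illness, 3 = death, 0 = censoring) and all column names are assumptions made for this example.

```r
# Assumed coding: 0 = censoring, 1 = healthy, 2 = illness, 3 = death
stateLabels <- c("censored", "healthy", "illness", "death")

# Both persons start healthy, which is recorded at time zero
illDeath <- data.frame(
  id    = c(1, 1, 1, 2, 2),
  time  = c(0, 4, 7, 0, 5),
  state = c(1, 2, 3, 1, 0)
)

# The first row of every person should carry the starting state at time zero
firstRows <- illDeath[!duplicated(illDeath$id), ]
all(firstRows$time == 0)

# Readable labels for inspection (state codes start at 0, R indices at 1)
illDeath$stateLabel <- stateLabels[illDeath$state + 1]
illDeath
```

Person 1 moves from healthy to illness at time 4 and to death at time 7, while person 2 is censored at time 5.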

Value

Original data.frame with three additional columns:

Author(s)

Thomas Welchowski welchow@imbie.meb.uni-bonn.de

References

Tutz G, Schmid M (2016). Modeling discrete time-to-event data. Springer Series in Statistics.

Fahrmeir L (2005). “Discrete Survival-Time Models.” In Encyclopedia of Biostatistics, chapter Survival Analysis. John Wiley & Sons.

Thompson Jr. WA (1977). “On the Treatment of Grouped Observations in Life Studies.” Biometrics, 33, 463-470.

See Also

contToDisc, dataLongTimeDep, dataLongCompRisks

Examples


################################
# Example with unemployment data
data(unempMultiSpell)

# Select subsample of first 250 persons
unempSub <- unempMultiSpell[unempMultiSpell$id %in% 1:250,]

# Expansion from semi-long to long format
unempLong <- dataLongMultiSpell(dataSemiLong=unempSub, timeColumn = "year",
                                eventColumn="spell", idColumn="id", 
                                spellAsFactor=TRUE, timeAsFactor=FALSE)

head(unempLong, 25)

# Fit discrete multi-state regression model
library(VGAM)

model <- vgam(cbind(e0, e1, e2, e3, e4) ~ 0 + s(timeInt) + age:spell, 
              data = unempLong, family = multinomial(refLevel="e0"))
             
############################
# Example with artificial data

# Seed specification
set.seed(-2578)

# Construction of data set
# Censoring (0) and three possible states (1, 2, 3)
# Discrete time intervals (1, 2, ... , 10)
# Noninfluential variable x ~ N(0, 1)
datFrame <- data.frame(
 ID = c(rep(1, 6), rep(2, 4), rep(3, 3), rep(4, 2), rep(5, 4), 
      rep(6, 5), rep(7, 7), rep(8, 8)),
 time = c(c(0, 2, 5, 6, 8, 10), c(0, 1, 6, 7), c(0, 9, 10), c(0, 6), c(0, 2, 3, 4), 
        c(0, 3, 4, 7, 9), c(0, 2, 3, 5, 7, 8, 10), c(0, 1, 3, 4, 6, 7, 8, 9) ),
 state = c(c(2, 1, 3, 2, 1, 0), c(3, 1, 2, 2), c(2, 2, 1), c(1, 2), c(3, 2, 2, 0), 
         c(1, 3, 2, 1, 3), c(1, 1, 2, 3, 2, 1, 3), c(3, 2, 3, 2, 1, 1, 2, 3) ),
 x = rnorm(n=6+4+3+2+4+5+7+8) )

# Transformation to long format
datFrameLong <- dataLongMultiSpell(dataSemiLong=datFrame, timeColumn="time",
                                   eventColumn="state", idColumn="ID", 
                                   spellAsFactor=TRUE)
head(datFrameLong, 25)

# Fit multinomial logit model
library(VGAM)
cRm <- vglm(cbind(e0, e1, e2, e3) ~ 0 + timeInt + x:spell, 
            data = datFrameLong, family = "multinomial")
summary(cRm)


[Package discSurv version 2.0.0 Index]