seqsamm {TraMineRextras} | R Documentation |
Sequence Analysis Multistate Model (SAMM) procedure
Description
The Sequence Analysis Multistate Model (SAMM) procedure aims to study simultaneously the occurrence of transitions out of (exits from) a spell in a given state along trajectories and the subsequence (or subtrajectory) immediately following each transition over a predefined period of time. This strategy makes it possible to include time-varying covariates in the sequence analysis framework.
Usage
seqsamm(seqdata, sublength, covar = NULL)
## S3 method for class 'SAMM'
plot(x, type="d", ...)
seqsammseq(samm, spell)
seqsammeha(samm, spell, typology, persper = TRUE)
Arguments
seqdata |
State sequence object created with the |
sublength |
Numeric. The length of the subsequence (or subtrajectory) following a transition to be considered. |
covar |
Optional |
x |
A SAMM object produced by |
samm |
A SAMM object produced by |
type |
the type of the plot |
spell |
Character. The (ending) spell in a given state to consider. It should be one of the states of the |
typology |
Factor or character. The typology of the trajectories out of the specified ending |
persper |
Logical. If |
... |
additional plot parameters passed to |
Details
The Sequence Analysis Multistate Model (SAMM) procedure works in three steps. First, the substrings over a given time span sublength following any transition out of (exit from) a spell in a given state of the alphabet are extracted from the trajectories seqdata. This step is achieved using the seqsamm function. Each substring starts with the last time-point of the spell in the state. Second, these substrings are clustered using sequence analysis (SA) to identify typical patterns of medium-term change. This is done separately for each ending spell (see the spell argument). The seqsammseq function can be used to retrieve the subtrajectories following each ending spell. Third, multistate models are used to estimate the chance (or risk) of ending a spell in a given state while distinguishing the type of trajectory that follows (as identified by the cluster analysis). This allows estimating the effect of covariates on the chances of starting each type of subsequence. The seqsammeha function prepares the data for estimating the competing risks models for each ending spell. The usual competing risks models can then be used.
Generally speaking, the SAMM procedure allows studying the time spent in each state as well as the patterns of medium-term change that follow an exit from that state along the trajectories. The example section below provides a step-by-step example of how to use it.
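The extraction idea of the first step can be illustrated on a toy trajectory with base R only. This is a minimal sketch, not the code used by seqsamm: the vector traj, the helper objects, and the choice to pad positions beyond the end of the trajectory with NA are all assumptions made for illustration.

```r
# Toy trajectory: one state per time unit (hypothetical data).
traj <- c("A", "A", "B", "B", "B", "C")
sublength <- 3

# Run-length encoding gives the successive spells and their lengths.
spells <- rle(traj)
spell_end <- cumsum(spells$lengths)   # last time-point of each spell

# The final spell is censored (no exit is observed), so it is dropped.
exits <- spell_end[-length(spell_end)]

# For each exit, take `sublength` positions starting at the last
# time-point of the spell. Positions past the end of the trajectory
# come back as NA (an assumption of this sketch).
subseqs <- t(sapply(exits, function(e) traj[e:(e + sublength - 1)]))
rownames(subseqs) <- spells$values[-length(spells$values)]
colnames(subseqs) <- paste0("s.", seq_len(sublength))
subseqs
```

Each row is one substring following an exit, labelled by the state that was left. In the actual procedure, such substrings are then clustered separately for each ending state.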
Value
A SAMM object (data.frame), storing the reorganized data in person period form. Column variables are:
id |
Numeric. The ID of the observation as the row number in the original |
time |
Numeric. The time unit of the current observation (from the beginning of the original sequence). |
begin |
Numeric. The time of the beginning of the current spell (from the beginning of the original sequence). |
spell.time |
Numeric. The time elapsed from the beginning of the current spell. |
transition |
Logical. Whether a transition out of the current spell occurred within this time unit. |
s.1 until s.sublength |
The state sequence following the current observation starting from 1 (current state) until |
lastobs |
Logical. Whether this is the last observation of the current spell, censored or not. This is useful when one wants only one row per individual, for instance to plot survival curves (see example). |
x |
object of class |
Optional covariate list |
The covariates provided with the |
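The person period columns described above can be mimicked for a single toy trajectory with base R. This is a hedged sketch of the layout only: it assumes that time and spell.time are counted from 1, uses a hypothetical state column in place of the s.1 to s.sublength columns, and does not reproduce the internals of seqsamm.

```r
# Toy trajectory for one individual (hypothetical data).
traj <- c("A", "A", "B", "B", "B", "C")
n <- length(traj)

# Identify the spells and the time at which each one begins.
spells <- rle(traj)
spell_index <- rep(seq_along(spells$lengths), spells$lengths)
spell_start <- cumsum(spells$lengths) - spells$lengths + 1

# One row per time unit (person period form).
pp <- data.frame(
  time = seq_len(n),
  state = traj,                       # illustrative column, not in a SAMM object
  begin = spell_start[spell_index],
  stringsAsFactors = FALSE
)
pp$spell.time <- pp$time - pp$begin + 1   # assumes counting starts at 1

# Last observation of each spell, whether censored or not.
pp$lastobs <- c(diff(spell_index) != 0, TRUE)
# A transition is observed only if the spell ends before the sequence does.
pp$transition <- pp$lastobs & pp$time < n
pp
```

Keeping only the rows where lastobs is TRUE gives one row per spell, which is the form needed for survival curves.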
The function seqsammseq returns an stslist sequence object (see seqdef) of the trajectories following an ending spell.
The function seqsammeha returns a data.frame storing the person period data of a specific ending spell (see spell argument) considering the given typology as competing risks (see typology argument). Several variables are added to the SAMM objects (see above):
SAMMtypology |
Factor. The events ending the specified spell using |
SAMM... |
Logical. A logical vector specifying whether the current observation ends the spell with the following |
Author(s)
Matthias Studer
References
Studer, M., Struffolino, E., & Fasang, A. E. (2018). Estimating the Relationship between Time-varying Covariates and Trajectories: The Sequence Analysis Multistate Model Procedure. Sociological Methodology, 48(1), 103–135. doi:10.1177/0081175017747122
See Also
Examples
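## Load the packages providing seqdef, seqrecode, seqdist and seqsamm
library(TraMineR)
library(TraMineRextras)
data(mvad)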
mvad.seq <- seqdef(mvad, 17:86)
## For the sake of simplicity, we recode all "education" states into a single common state.
mvad.seq <- seqrecode(mvad.seq, list("education"=c("FE", "HE", "school", "training")))
## We now have three states
seqdplot(mvad.seq)
###########################################################################
## STEP I: Subsequence extraction
###########################################################################
## We start by extracting all subsequences of length 6
## We also add covariates from the mvad data frame
mvad.samm <- seqsamm(mvad.seq, 6, covar=mvad[, c("Grammar", "funemp", "gcse5eq")])
## Plot the results to visualize the transitions out of each state.
plot(mvad.samm)
## Descriptive information on the seqsamm object
summary(mvad.samm)
###########################################################################
### STEP II: Typology of trajectory out of joblessness
###########################################################################
## We retrieve the subsequences following a transition out of a joblessness spell
jlseq <- seqsammseq(mvad.samm, "joblessness")
## Now we create a typology of these subsequences.
## Compute the clustering using LCS
jldist <- seqdist(jlseq, method="LCS")
## For the sake of simplicity, use only 2 groups
library(cluster)
jlclust <- pam(jldist, diss=TRUE, k=2, cluster.only=TRUE)
## Specify the names of the types in the 2-cluster typology (here joblessness1 or joblessness2).
jltype <- paste0("joblessness", jlclust)
###########################################################################
### STEP III: Competing risks model of trajectories out of joblessness
###########################################################################
## Get the data to estimate competing risks models of the kind of trajectory
## out of joblessness
## We specify the SAMM object, the ending spell (joblessness) and our typology.
jleha <- seqsammeha(mvad.samm, "joblessness", jltype)
## Not run:
## Now jleha stores the data in person period format for competing risks
## Discrete time model using multinomial regression
## SAMMtypology and spell.time are variables created and stored in the jleha dataset
library(nnet)
multinom(SAMMtypology~spell.time+Grammar+funemp+gcse5eq, data=jleha)
## We can also have only one line per ending spell
## Plot the results
library(survival)
jleha <- seqsammeha(mvad.samm, "joblessness", jltype, persper=FALSE)
plot(survfit(Surv(spell.time, SAMMjoblessness1)~1, data=jleha))
## Cox model
summary(coxph(Surv(spell.time, SAMMjoblessness1)~gcse5eq+Grammar+funemp, data=jleha))
## Most of the time, methods for recurrent events should be used.
## See for instance the coxme package to do so.
library(coxme)
summary(coxme(Surv(spell.time, SAMMjoblessness1)~gcse5eq+Grammar+funemp+(1|id), data=jleha))
## End(Not run)
###########################################################################
### Now repeat steps II and III for employment and then education
### (Not shown here)
###########################################################################
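As a hedged sketch of the repetition mentioned above, steps II and III might look as follows for employment spells. It uses only the calls that appear in the example above. The choice of two clusters and the object names emseq, emdist, emclust, emtype and emeha are illustrative assumptions, and the sketch rebuilds mvad.samm so that it can be run on its own.

```r
library(TraMineR)
library(TraMineRextras)
library(cluster)

## Rebuild the objects from step I
data(mvad)
mvad.seq <- seqdef(mvad, 17:86)
mvad.seq <- seqrecode(mvad.seq,
    list("education"=c("FE", "HE", "school", "training")))
mvad.samm <- seqsamm(mvad.seq, 6,
    covar=mvad[, c("Grammar", "funemp", "gcse5eq")])

## STEP II: typology of the subsequences following an employment spell
emseq <- seqsammseq(mvad.samm, "employment")
emdist <- seqdist(emseq, method="LCS")
## Two groups only, as an arbitrary choice for this sketch
emclust <- pam(emdist, diss=TRUE, k=2, cluster.only=TRUE)
emtype <- paste0("employment", emclust)

## STEP III: person period data for the competing risks models
emeha <- seqsammeha(mvad.samm, "employment", emtype)
table(emeha$SAMMtypology)
```

The same pattern applies to the "education" state. In a real analysis the number of clusters would be chosen from the data rather than fixed at two.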