seqCompare {TraMineRextras} | R Documentation |
BIC and likelihood ratio test for comparing two sets of sequence data
Description
The function seqCompare computes the likelihood ratio test (LRT) and the Bayesian Information Criterion (BIC) for comparing two groups of sequences within each of a series of sets. The functions seqBIC and seqLRT are aliases that return only the BIC or only the LRT, respectively.
Usage
seqCompare(seqdata, seqdata2=NULL, group=NULL, set=NULL,
s=100, seed=36963, stat="all", squared="LRTonly",
weighted=TRUE, opt=NULL, BFopt=NULL, method, ...)
seqLRT(seqdata, seqdata2=NULL, group=NULL, set=NULL, s=100,
seed=36963, squared="LRTonly", weighted=TRUE, opt=NULL,
BFopt=NULL, method, ...)
seqBIC(seqdata, seqdata2=NULL, group=NULL, set=NULL, s=100,
seed=36963, squared="LRTonly", weighted=TRUE, opt=NULL,
BFopt=NULL, method, ...)
Arguments
seqdata |
Either a state sequence object (stslist) or a list of stslist objects (see Details). |
seqdata2 |
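Either a state sequence object (stslist) or a list of stslist objects with the same number of elements as seqdata (see Details). Default NULL. |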
group |
Vector of length equal to the number of sequences in seqdata, defining the groups to compare. |
set |
Vector of length equal to the number of sequences in seqdata, defining the sets within each of which the groups are compared. |
s |
Integer. Default 100. The size of random samples of sequences. When 0, no sampling is done. |
seed |
Integer. Default 36963. Using the same seed number guarantees the same results
each time. Set |
stat |
String. The requested statistics. One of "LRT", "BIC", or "all" (default). |
squared |
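Logical. Should squared distances be used? Can also be "LRTonly" (default), in which case the distances to the virtual center are squared only when computing the LRT (see Details). |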
weighted |
Logical or String. Should weights be taken into account when available? Can also be |
opt |
Integer or NULL (default). Either 1 or 2, the strategy for computing the distance matrices (see Details). |
BFopt |
Integer or |
method |
String. Method for computing sequence distances. See documentation for |
... |
Additional arguments passed to |
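To make the roles of s and seed concrete, here is a minimal base-R sketch that draws a random sample of at most s sequence indices from each of two groups under a fixed seed. The helper sample_groups is hypothetical and not part of TraMineRextras; it assumes sampling without replacement and keeps the whole group when s is 0 or not smaller than the group size. How many samples seqCompare actually draws is not described here.

```r
## Hypothetical helper (not part of TraMineRextras): draw a random sample of
## at most s indices from each of two groups, reproducibly via `seed`.
sample_groups <- function(n1, n2, s = 100, seed = 36963) {
  set.seed(seed)                                # same seed -> same samples
  pick <- function(n) {
    if (s == 0 || s >= n) return(seq_len(n))    # s = 0: no sampling (assumed cap at n)
    sort(sample.int(n, s))                      # sampling without replacement
  }
  list(g1 = pick(n1), g2 = pick(n2))
}

a <- sample_groups(300, 250, s = 100)
b <- sample_groups(300, 250, s = 100)
identical(a, b)   # TRUE: same seed, same samples
lengths(a)        # 100 indices per group
```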
Details
The group and set arguments can only be used when seqdata is an stslist object (a state sequence object).
When seqdata and seqdata2 are both provided, the LRT and BIC statistics are computed for comparing these two sets of sequences. In that case, both group and set should be left at their default NULL value.
When seqdata is a list of stslist objects, seqdata2 must be a list of the same number of stslist objects.
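The group and set arguments define which subsets of sequences are compared: within each level of set, the sequences are split by group. The base-R sketch below only illustrates that partition of indices, mirroring the example's use of sex as group and language as set; the function pairs_by_set is hypothetical, and it assumes group has exactly two levels.

```r
## Hypothetical sketch: which index subsets are compared when both `group`
## and `set` are given. Within each level of `set`, indices are split by
## the two levels of `group`.
pairs_by_set <- function(group, set) {
  group <- factor(group)
  stopifnot(nlevels(group) == 2, length(group) == length(set))
  idx <- seq_along(group)
  lapply(split(idx, set), function(i) split(i, group[i]))
}

group <- c("man", "woman", "man", "woman", "man", "woman")
set   <- c("french", "french", "french", "german", "german", "german")
str(pairs_by_set(group, set))
```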
The default option squared="LRTonly" corresponds to the initial proposal of Liao and Fasang (2021). With that option, the distances to the virtual center are obtained from the pairwise non-squared dissimilarities, and the resulting distances to the virtual center are squared when computing the LRT (which is in turn used to compute the BIC). With squared=FALSE, non-squared distances are used in both steps, and with squared=TRUE, squared distances are used in both steps.
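The distance of each sequence to the virtual center of its group can be computed from pairwise dissimilarities alone. A common formula, used for instance in discrepancy analysis, is d(i, center) = mean_j d(i, j) - sum_jk d(j, k) / (2 n^2). The base-R sketch below assumes this is the definition behind the virtual center here and uses points on a line as a stand-in for sequences. Fed with squared Euclidean distances, the formula returns exactly the squared distances to the mean; fed with non-squared distances and squared afterwards ("LRTonly" style), it gives different values. The function center_dist is hypothetical and ignores weights.

```r
## Distance of each object to the virtual center of its group, computed
## only from the pairwise dissimilarity matrix D (unweighted case):
##   d(i, center) = mean_j D[i, j] - sum(D) / (2 * n^2)
center_dist <- function(D) {
  n <- nrow(D)
  rowMeans(D) - sum(D) / (2 * n^2)
}

## Toy data: three points on a line, so the true center is their mean (2).
x <- c(0, 2, 4)
D <- as.matrix(dist(x))          # non-squared (Euclidean) dissimilarities

## Squared input: equals (x - 2)^2 up to rounding.
center_dist(D^2)

## "LRTonly" style: center distances from non-squared D, squared afterwards.
center_dist(D)^2
```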
The computation is based on the pairwise distances between the sequences. The opt argument selects between two strategies. With opt=1, the distance matrix is computed separately for each pair of samples of size s. With opt=2, the distance matrix is computed once for all observed sequences, and the distances for each sample are extracted from that matrix. Option 2 is often more efficient, especially for spell-based distances, but it may be slower for methods such as OM or LCS when the number of observed sequences is large.
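The two opt strategies should give the same distances for a given sample; they differ only in cost. The sketch below checks this with base R, using rows of a numeric matrix and dist() as a stand-in for sequences and the sequence dissimilarity function (an assumption for illustration only).

```r
## Stand-in for sequence data: rows of a numeric matrix, with base R dist()
## playing the role of the sequence dissimilarity function.
set.seed(36963)
X <- matrix(rnorm(200 * 3), ncol = 3)
idx <- sample.int(nrow(X), 50)          # one random sample of size s = 50

## opt = 1 style: compute distances for the sampled rows only.
D1 <- as.matrix(dist(X[idx, ]))

## opt = 2 style: compute the full matrix once, then extract the sample.
D_full <- as.matrix(dist(X))
D2 <- D_full[idx, idx]

all.equal(D1, D2, check.attributes = FALSE)   # TRUE (only row names differ)
```

The full matrix costs O(N^2) distance evaluations once and can be reused for every sample, which pays off when distances are cheap; recomputing per sample avoids that up-front cost when N is large and each distance is expensive.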
Value
The function seqLRT (and seqCompare with stat="LRT") outputs two variables, LRT and p.LRT.
LRT |
This is the likelihood ratio test statistic for comparing the two groups. |
p.LRT |
This is the upper tail probability associated with the LRT. |
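As a rough illustration only: if the distances to the virtual centers are treated like normal residuals, the likelihood ratio statistic for one pooled center versus two group-specific centers takes the familiar form LRT = n log(SS0 / SS1). Here SS0 is the sum of squared distances to the pooled center and SS1 the sum of squared distances of each sequence to its own group's center. p.LRT is then an upper tail chi-squared probability. The function lrt_sketch and the choice of one degree of freedom are assumptions of this sketch, not a transcription of the seqCompare code.

```r
## Hypothetical sketch: LRT comparing one pooled center against two group
## centers, from squared distances to the respective virtual centers.
## Assumes the normal-likelihood form and df = 1.
lrt_sketch <- function(d2_pooled, d2_group1, d2_group2, df = 1) {
  n   <- length(d2_pooled)
  SS0 <- sum(d2_pooled)                    # one center for everyone
  SS1 <- sum(d2_group1) + sum(d2_group2)   # one center per group
  LRT <- n * log(SS0 / SS1)
  c(LRT = LRT, p.LRT = pchisq(LRT, df = df, lower.tail = FALSE))
}

## Toy data on a line: squared distances to a mean play the role of
## squared distances to a virtual center.
g1 <- c(0, 1, 2); g2 <- c(5, 6, 7)
all_x <- c(g1, g2)
res <- lrt_sketch((all_x - mean(all_x))^2,
                  (g1 - mean(g1))^2, (g2 - mean(g2))^2)
res
```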
The function seqBIC (and seqCompare with stat="BIC") outputs two variables, BIC and BF.
BIC |
This is the difference between two BICs for comparing the two groups. |
BF |
This is the Bayes factor associated with the BIC difference. |
seqCompare with stat="all" (the default) outputs all four indicators.
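The BIC difference and the Bayes factor are linked to the LRT through the standard BIC approximation: with k extra parameters and n sequences, BIC = LRT - k log(n) and BF = exp(BIC / 2). The sketch below assumes this form with k = 1 for two groups; the function bic_bf_sketch is hypothetical and does not reproduce how seqCompare averages over samples or what BFopt selects.

```r
## Hypothetical sketch: BIC difference and Bayes factor from an LRT value,
## assuming k extra parameters (k = 1 for two groups) and n sequences.
bic_bf_sketch <- function(LRT, n, k = 1) {
  BIC <- LRT - k * log(n)   # under this sign convention, > 0 favors distinct groups
  BF  <- exp(BIC / 2)       # BIC-based approximation of the Bayes factor
  c(BIC = BIC, BF = BF)
}

bic_bf_sketch(LRT = 20, n = 150)
```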
Author(s)
Tim Liao and Gilbert Ritschard
References
Tim F. Liao & Anette E. Fasang (2021). "Comparing Groups of Life Course Sequences Using the Bayesian Information Criterion and the Likelihood Ratio Test." Sociological Methodology, 51 (1), 44-85. doi:10.1177/0081175020959401.
Examples
## biofam data set
library(TraMineRextras)
data(biofam)
biofam.lab <- c("Parent", "Left", "Married", "Left+Marr",
"Child", "Left+Child", "Left+Marr+Child", "Divorced")
alph <- seqstatl(biofam[10:25])
## To illustrate, we use only a sample of 150 cases
set.seed(10)
biofam <- biofam[sample(nrow(biofam),150),]
biofam.seq <- seqdef(biofam, 10:25, alphabet=alph, labels=biofam.lab)
## Defining the grouping variable
lang <- as.vector(biofam[["plingu02"]])
lang[is.na(lang)] <- "unknown"
lang <- factor(lang)
## Chronogram by language group
seqdplot(biofam.seq, group=lang)
## Extracting the sequence subsets by language
lev <- levels(lang)
l <- length(lev)
seq.list <- list()
for (i in 1:l){
seq.list[[i]] <- biofam.seq[lang==lev[i],]
}
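## Comparing the first two language groups (levels of lang)
seqCompare(list(seq.list[[1]]), list(seq.list[[2]]), stat="all", method="OM", sm="CONSTANT")
## BIC for comparing men and women
seqBIC(biofam.seq, group=biofam$sex, method="HAM")
## LRT for men vs women within each language group, with samples of size 80
seqLRT(biofam.seq, group=biofam$sex, set=lang, s=80, method="HAM")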