seqnullcqi {WeightedCluster} | R Documentation |
Sequence Analysis Typologies Validation Using Parametric Bootstrap
Description
seqnullcqi implements the methodology proposed by Studer (2021) for the validation of sequence analysis typologies using parametric bootstraps. The method works by comparing the cluster quality of an observed typology with the quality obtained by clustering similar but nonclustered data. Several null models are provided to test the different structuring aspects of the sequences that are important in life-course research, namely, sequencing, timing, and duration (see function seqnull). This strategy makes it possible to identify the key structural aspects captured by the observed typology. Plot and print methods for the seqnullcqi results are also provided.
Usage
seqnullcqi(seqdata, clustrange, R, model=c("combined", "duration", "sequencing",
"stateindep", "Markov", "userpos"), seqdist.args=list(),
kmedoid = FALSE, hclust.method="ward.D",
parallel=FALSE, ...)
## S3 method for class 'seqnullcqi'
plot(x, stat, type = c("line", "density", "boxplot", "seqdplot"),
quant = 0.95, norm = TRUE, legendpos = "topright",
alpha = 0.2, ...)
## S3 method for class 'seqnullcqi'
print(x, norm=TRUE, quant=0.95, digits=2, ...)
Arguments
seqdata |
State sequence object of class |
clustrange |
The clustering of the data to be validated as an object of class |
model |
String. The model used to generate the similar but nonclustered data. It can be one of |
R |
The number of bootstraps. |
seqdist.args |
List of arguments passed to |
kmedoid |
Logical. If |
hclust.method |
String. Hierarchical method to use with |
x |
A |
stat |
Character. The statistic to plot or "all" for all statistics. See |
type |
Character. The type of graphic to be plotted. If |
quant |
Numeric. Quantile to use for the confidence intervals. |
norm |
Logical. If |
legendpos |
Character. Legend position, see |
alpha |
Transparency parameter for the lines to be drawn (only for |
digits |
Number of digits to be printed. |
parallel |
Logical. Whether to initialize the parallel processing of the |
... |
Additional parameters passed to |
Details
The seqnullcqi function provides a validation method for sequence analysis typologies using parametric bootstraps, as proposed in Studer (2021). The method compares the cluster quality of an observed typology with the cluster quality obtained by clustering similar but nonclustered data. More precisely, it works as follows.
1. Cluster the observed sequence data and compute the associated cluster quality indices.
2. Repeat R times:
   a. Generate similar but nonclustered data using a null model (see seqnull for available null models).
   b. Cluster the generated data using the same distance measure and clustering algorithm as in step 1.
   c. Record the quality index values of this null clustering.
3. Compare the quality of the observed typology with the one obtained in the R bootstraps with the null sequence data using the plot and print methods.
4. If the cluster quality measure of the observed typology is consistently higher than the ones obtained with null data, a "good" typology has been found.
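The logic of these steps can be sketched in base R on toy numeric data. This is only an illustration of the observed-versus-null comparison, not the implementation used by seqnullcqi: the toy data, the helper functions asw() and cluster_quality(), and the null model (independent normal draws per column) are all assumptions made for the example, whereas seqnullcqi works on state sequences with the null models of seqnull.

```r
set.seed(42)

# Toy "observed" data: two well-separated groups of 2D points
# (assumption for illustration; seqnullcqi uses state sequences instead).
obs <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),
             matrix(rnorm(60, mean = 4), ncol = 2))

# Hypothetical helper: average silhouette width from a dissimilarity object.
asw <- function(d, cl) {
  d <- as.matrix(d)
  s <- vapply(seq_len(nrow(d)), function(i) {
    own <- cl == cl[i]
    if (sum(own) < 2) return(0)                   # singleton cluster
    a <- sum(d[i, own]) / (sum(own) - 1)          # mean distance to own cluster
    b <- min(tapply(d[i, !own], cl[!own], mean))  # nearest other cluster
    (b - a) / max(a, b)
  }, numeric(1))
  mean(s)
}

# Hypothetical helper: cluster a data set and return its quality index.
cluster_quality <- function(x, k = 2) {
  d <- dist(x)
  cl <- cutree(hclust(d, method = "ward.D"), k = k)
  asw(d, cl)
}

# Step 1: quality of the observed typology.
q_obs <- cluster_quality(obs)

# Step 2: R null data sets, each drawn from a single (nonclustered)
# normal distribution per column, clustered the same way as in step 1.
R <- 50
q_null <- replicate(R, {
  null <- apply(obs, 2, function(col) rnorm(length(col), mean(col), sd(col)))
  cluster_quality(null)
})

# Steps 3 and 4: compare the observed quality with the null distribution.
threshold <- quantile(q_null, 0.95)
cat("Observed ASW:", round(q_obs, 2),
    "| 95% quantile under the null:", round(threshold, 2), "\n")
```

In this sketch the 0.95 quantile mirrors the default quant argument of the plot and print methods; an observed value above that threshold suggests structure beyond what the null model produces.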
Several null models are provided to test the different structuring aspects of the sequences that are important in life-course research, namely, sequencing, timing, and duration (see function seqnull and Studer, 2021). This strategy makes it possible to identify the key structural aspects captured by the observed typology.
Value
seqnullcqi returns a "seqnullcqi" object with the following components:
seqdata |
The sequence data generated by the null model (see |
stats |
The cluster quality indices for the null data. |
clustrange |
The clustering of the data to be validated as an object of class |
R |
The number of bootstraps. |
kmedoid |
Logical. If |
hclust.method |
Hierarchical method to use with |
seqdist.args |
List of arguments passed to |
nullmodel |
List of arguments passed to |
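As a rough illustration, the components listed above can be inspected directly on the returned object. The sketch below reuses the setup from the Examples section. It assumes that loading WeightedCluster also attaches TraMineR (which provides biofam, seqdef and seqdist), and set.seed() is added only to make the random subsample reproducible. The internal layout of the stats component is not described on this page, so it is only displayed with str().

```r
library(WeightedCluster)  # assumed to attach TraMineR as well

data(biofam)
set.seed(1)  # added for a reproducible subsample
bf.seq <- seqdef(biofam[sample.int(nrow(biofam), 100), 10:25])
diss <- seqdist(bf.seq, method = "HAM")
hc <- hclust(as.dist(diss), method = "ward.D")
clustqual <- as.clustrange(hc, diss = diss, ncluster = 7)

bcq <- seqnullcqi(bf.seq, clustqual, R = 5, model = "combined",
                  seqdist.args = list(method = "HAM"),
                  hclust.method = "ward.D")

# Component names of the returned object
names(bcq)

# Number of bootstraps that were run
bcq$R

# Cluster quality indices obtained on the null data (structure only)
str(bcq$stats, max.level = 1)
```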
References
Studer, M. (2021). Validating Sequence Analysis Typologies Using Parametric Bootstrap. Sociological Methodology. doi:10.1177/00811750211014232
See Also
seqnull for a description of the null models.
Examples
library(WeightedCluster)  # assumed to attach TraMineR, which provides biofam and seqdef
data(biofam)
## Create the sequence object
bf.seq <- seqdef(biofam[sample.int(nrow(biofam), 100),10:25])
## The fastcluster library greatly improves computation time when using hclust
# library(fastcluster)
## Computing distances
diss <- seqdist(bf.seq, method="HAM")
## Hierarchical clustering
hc <- hclust(as.dist(diss), method="ward.D")
# Computing cluster quality measures.
clustqual <- as.clustrange(hc, diss=diss, ncluster=7)
# Compute cluster quality measure for the null model "combined"
# seqdist.args should be the same as for seqdist above except the sequence data.
# Clustering methods should be the same as above.
bcq <- seqnullcqi(bf.seq, clustqual, R=5, model=c("combined"),
seqdist.args=list(method="HAM"),
hclust.method="ward.D")
# Print the results
bcq
## Different kinds of plots
plot(bcq, stat="ASW", type="line")
plot(bcq, stat="ASW", type="density")
plot(bcq, stat="ASW", type="boxplot")
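Because each null model targets a different structuring aspect of the sequences, it can be informative to repeat the test under several of them. The following is a minimal sketch under the same assumptions as the example above (WeightedCluster attaches TraMineR; R = 5 is far too small for real use and is kept only for speed). The selection of three models and the object names models and results are choices made for this illustration.

```r
library(WeightedCluster)  # assumed to attach TraMineR as well

data(biofam)
set.seed(1)  # added for a reproducible subsample
bf.seq <- seqdef(biofam[sample.int(nrow(biofam), 100), 10:25])
diss <- seqdist(bf.seq, method = "HAM")
hc <- hclust(as.dist(diss), method = "ward.D")
clustqual <- as.clustrange(hc, diss = diss, ncluster = 7)

# Null models to compare (a subset of those listed under Usage)
models <- c("combined", "sequencing", "duration")

# Run the bootstrap once per null model, keeping distance and
# clustering settings identical to those used for the observed data.
results <- lapply(models, function(m) {
  seqnullcqi(bf.seq, clustqual, R = 5, model = m,
             seqdist.args = list(method = "HAM"),
             hclust.method = "ward.D")
})
names(results) <- models

# Print and plot the comparison for each null model
for (m in models) {
  cat("\nNull model:", m, "\n")
  print(results[[m]])
  plot(results[[m]], stat = "ASW", type = "line")
}
```

An observed typology that stands out against one null model but not another gives a hint about which aspect (for example sequencing or duration) the clustering mainly captures.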