seqnullcqi {WeightedCluster}

R Documentation

Sequence Analysis Typologies Validation Using Parametric Bootstrap

Description

seqnullcqi implements the methodology proposed by Studer (2021) for the validation of sequence analysis typologies using parametric bootstraps. The method works by comparing the cluster quality of an observed typology with the quality obtained by clustering similar but nonclustered data. Several models are provided to test the different structuring aspects of the sequences that are important in life-course research, namely sequencing, timing, and duration (see function seqnull). This strategy allows identifying the key structural aspects captured by the observed typology. Plot and print methods for the seqnullcqi results are also provided.

Usage

seqnullcqi(seqdata, clustrange, R, model=c("combined", "duration", "sequencing",
           "stateindep", "Markov", "userpos"), seqdist.args=list(),
           kmedoid=FALSE, hclust.method="ward.D",
           parallel=FALSE, ...)
		   
## S3 method for class 'seqnullcqi'
plot(x, stat, type = c("line", "density", "boxplot", "seqdplot"),
                          quant = 0.95, norm = TRUE, legendpos = "topright",
                          alpha = 0.2, ...)

## S3 method for class 'seqnullcqi'
print(x, norm=TRUE, quant=0.95, digits=2, ...)

Arguments

`seqdata`	State sequence object of class `stslist`. The sequence data to use. Use `seqdef` to create such an object.
`clustrange`	The clustering of the data to be validated as an object of class `clustrange`. See `as.clustrange` or `wcKMedRange` to create such an object.
`model`	String. The model used to generate the similar but nonclustered data. It can be one of `"combined"`, `"duration"`, `"sequencing"`, `"stateindep"`, `"Markov"` or `"userpos"`. See `seqnull` for more information.
`R`	The number of bootstraps.
`seqdist.args`	List of arguments passed to `seqdist` for computing the distances.
`kmedoid`	Logical. If `TRUE`, the PAM algorithm is used to cluster the data using `wcKMedRange`. If `FALSE`, `hclust` is used.
`hclust.method`	String. Hierarchical method to use with `hclust`.
`x`	A `seqnullcqi` object to be plotted or printed.
`stat`	Character. The statistic to plot or "all" for all statistics. See `wcClusterQuality` for a list of possible values.
`type`	Character. The type of graphic to be plotted. If `type="line"` (default), the cluster quality index of each bootstrap is plotted as a separate transparent line. If `type="density"`, the density of the maximum cluster quality index values across the different numbers of groups is plotted together with the original cluster quality values. If `type="boxplot"`, the distribution of the cluster quality index values is plotted as a boxplot, separately for each number of groups. If `type="seqdplot"`, a state distribution sequence plot of the sequences generated with the null model is plotted (see `seqdplot`).
`quant`	Numeric. Quantile to use for the confidence intervals.
`norm`	Logical. If `TRUE`, cluster quality indices are standardized using the mean and standard deviation of the null distribution.
`legendpos`	Character. Legend position, see `legend`.
`alpha`	Transparency parameter for the lines to be drawn (only for `type="line"`).
`digits`	Number of digits to be printed.
`parallel`	Logical. Whether to set up parallel processing through the `future` package. If `TRUE`, a `multisession` `plan` is initialized using default values. If `FALSE` (default), the current `plan` is used.
`...`	Additional parameters passed to `seqnull` (for `seqnullcqi`) or `plot` or `print`.
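
The roles of `norm` and `quant` can be illustrated with a minimal base R sketch. All numbers below are simulated for illustration only, and the one-sided quantile bound is an assumption: the exact way the package builds its intervals is not described on this page.

```r
# Hypothetical values, for illustration only: the ASW of an observed typology
# and the ASW of 1000 simulated null clusterings.
set.seed(1)
observed_asw <- 0.35
null_asw <- rnorm(1000, mean = 0.20, sd = 0.03)

# Standardize with the mean and standard deviation of the null distribution
# (the idea described for norm = TRUE).
null_mean <- mean(null_asw)
null_sd <- sd(null_asw)
observed_z <- (observed_asw - null_mean) / null_sd
null_z <- (null_asw - null_mean) / null_sd

# Bound of the null distribution at the chosen quantile
# (assumed here to be the role played by quant = 0.95).
threshold_z <- quantile(null_z, probs = 0.95, names = FALSE)

# Does the observed quality lie above the null bound?
observed_z > threshold_z
```

After standardization the null values have mean 0 and standard deviation 1, so the observed value reads as a distance from the null distribution in standard deviation units.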

Details

The seqnullcqi function provides a validation method for sequence analysis typologies using parametric bootstraps as proposed in Studer (2021). This method works by comparing the value of the cluster quality of an observed typology with the cluster quality obtained by clustering similar but nonclustered data. More precisely it works as follows.

1. Cluster the observed sequence data and compute the associated cluster quality indices.
2. Repeat R times:
   a. Generate similar but nonclustered data using a null model (see seqnull for available null models).
   b. Cluster the generated data using the same distance measure and clustering algorithm as in step 1.
   c. Record the quality index values of this null clustering.
3. Compare the quality of the observed typology with the one obtained in the R bootstraps with the null sequence data using plot and print methods.
4. If the cluster quality measure of the observed typology is consistently higher than the ones obtained with null data, a “good” typology has been found.

Several null models are provided to test the different structuring aspects of the sequences important in life-course research, namely, sequencing, timing, and duration (see function seqnull and Studer, 2021). This strategy allows identifying the key structural aspects captured by the observed typology.
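
The procedure described above can be sketched by hand with `seqnull`, `seqdist`, `hclust` and `as.clustrange`. This is only an illustration of the idea, not the internal code of seqnullcqi. The helper `asw_by_k`, the seed, the subsample size, and the choice of the ASW as the only index are assumptions made for the example.

```r
library(TraMineR)
library(WeightedCluster)

data(biofam)
set.seed(1)  # arbitrary seed, only to make the subsample reproducible
bf.seq <- seqdef(biofam[sample.int(nrow(biofam), 100), 10:25])

max_k <- 7  # largest number of groups considered
R <- 5      # number of bootstraps, kept small for speed

# Hypothetical helper: cluster a sequence object and return the ASW
# for 2 to max_k groups.
asw_by_k <- function(seqdata) {
  diss <- seqdist(seqdata, method = "HAM")
  hc <- hclust(as.dist(diss), method = "ward.D")
  as.clustrange(hc, diss = diss, ncluster = max_k)$stats[, "ASW"]
}

# Cluster the observed data and compute its quality.
observed_asw <- asw_by_k(bf.seq)

# Repeat R times: generate null data, cluster it the same way,
# and record the quality (one row per bootstrap).
null_asw <- t(sapply(seq_len(R), function(i) {
  nullseq <- seqnull(bf.seq, model = "combined")
  asw_by_k(nullseq)
}))

# Compare the best observed ASW with the best ASW of each null clustering.
best_null <- apply(null_asw, 1, max)
mean(max(observed_asw) > best_null)
```

The last line gives the share of bootstraps in which the observed typology beats the null clustering. With so few bootstraps it is only indicative.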

Value

seqnullcqi returns a "seqnullcqi" object with the following components:

`seqdata`	The sequence data generated by the null model (see `seqnull`).
`stats`	The cluster quality indices for the null data.
`clustrange`	The clustering of the data to be validated as an object of class `clustrange`.
`R`	The number of bootstraps.
`kmedoid`	Logical. If `TRUE`, the PAM algorithm was used to cluster the data using `wcKMedRange`.
`hclust.method`	Hierarchical method used with `hclust`.
`seqdist.args`	List of arguments passed to `seqdist` for computing the distances.
`nullmodel`	List of arguments passed to `seqnull` to generate the sequence data under the null model.

References

Studer, M. (2021). Validating Sequence Analysis Typologies Using Parametric Bootstrap. Sociological Methodology. doi:10.1177/00811750211014232

Examples

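## Load the packages (biofam, seqdef and seqdist come from TraMineR)
library(TraMineR)
library(WeightedCluster)

data(biofam)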

## Create the sequence object
bf.seq <- seqdef(biofam[sample.int(nrow(biofam), 100),10:25])

## Library fastcluster greatly improves computation time when using hclust
# library(fastcluster)
## Computing distances
diss <- seqdist(bf.seq, method="HAM")
## Hierarchical clustering
hc <- hclust(as.dist(diss), method="ward.D")
# Computing cluster quality measures.
clustqual <- as.clustrange(hc, diss=diss, ncluster=7)

# Compute cluster quality measure for the null model "combined"
# seqdist.args should be the same as for seqdist above except the sequence data.
# Clustering methods should be the same as above.
bcq <- seqnullcqi(bf.seq, clustqual, R=5, model="combined",
                  seqdist.args=list(method="HAM"),
                  hclust.method="ward.D")

# Print the results
bcq

## Different kinds of plots

plot(bcq, stat="ASW", type="line")
plot(bcq, stat="ASW", type="density")
plot(bcq, stat="ASW", type="boxplot")
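
As a further sketch, the documented `parallel`, `norm`, `digits`, `alpha` and `legendpos` arguments could be combined as follows. The choice of two workers, the `"sequencing"` model and the seed are assumptions made for illustration. The `future` plan is set explicitly here, so `parallel=FALSE` is kept and the current plan is used.

```r
library(TraMineR)
library(WeightedCluster)

data(biofam)
set.seed(1)  # arbitrary seed, only to make the subsample reproducible
bf.seq <- seqdef(biofam[sample.int(nrow(biofam), 100), 10:25])
diss <- seqdist(bf.seq, method = "HAM")
hc <- hclust(as.dist(diss), method = "ward.D")
clustqual <- as.clustrange(hc, diss = diss, ncluster = 7)

# Choose the parallel plan explicitly (two workers is an arbitrary choice).
future::plan(future::multisession, workers = 2)
bcq_seq <- seqnullcqi(bf.seq, clustqual, R = 5, model = "sequencing",
                      seqdist.args = list(method = "HAM"),
                      hclust.method = "ward.D", parallel = FALSE)
future::plan(future::sequential)  # return to sequential processing

# Print the raw (non-standardized) indices with three digits.
print(bcq_seq, norm = FALSE, digits = 3)

# Line plot with less transparent lines and the legend moved.
plot(bcq_seq, stat = "ASW", type = "line", alpha = 0.5,
     legendpos = "bottomright")
```

Resetting the plan to `sequential` afterwards avoids leaving background R sessions open for the rest of the session.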

[Package WeightedCluster version 1.6-4 Index]