
seqplot {TraMineR}

R Documentation

Plot state sequence objects

Description

High-level plot functions for rendering state sequence objects. They can produce many different types of plots and can render sequences by group.

Usage

seqplot(seqdata,
  group = NULL,
  type = "i",
  main = "auto",
  cpal = NULL,
  missing.color = NULL,
  ylab = NULL,
  yaxis = "all",
  xaxis = "all",
  xtlab = NULL,
  cex.axis = 1,
  with.legend = "auto",
  ltext = NULL,
  cex.legend = 1,
  use.layout = (!is.null(group) | with.legend != FALSE),
  legend.prop = NA,
  rows = NA,
  cols = NA,
  title, cex.plot, withlegend, axes,
  ...)

seqdplot(seqdata, group = NULL, main = "auto", ...)
seqdHplot(seqdata, group = NULL, main = "auto", ...)
seqfplot(seqdata, group = NULL, main = "auto", ...)
seqiplot(seqdata, group = NULL, main = "auto", ...)
seqIplot(seqdata, group = NULL, main = "auto", ...)
seqHtplot(seqdata, group = NULL, main = "auto", ...)
seqmsplot(seqdata, group = NULL, main = "auto", ...)
seqmtplot(seqdata, group = NULL, main = "auto", ...)
seqrplot(seqdata, group = NULL, main = "auto", ...)
seqrfplot(seqdata, group = NULL, main = "auto", ...)

Arguments

`seqdata`	State sequence object created with the `seqdef` function.
`group`	Grouping variable of length equal to the number of sequences. When not `NULL`, a distinct plot is generated for each level of `group`.
`type`	The type of plot. Available types are `"d"` for state distribution plots (chronograms), `"dH"` for chronograms with an overlaid entropy line, `"f"` for sequence frequency plots, `"Ht"` for transversal entropy plots, `"i"` for index plots of selected sequences, `"I"` for index plots of the whole set of sequences, `"ms"` for plots of the sequence of modal states, `"mt"` for mean time plots, `"pc"` for parallel coordinate plots, `"r"` for representative sequence plots, and `"rf"` for relative frequency plots.
`main`	Character string. Title of the graphic. The default `"auto"` uses the group levels as titles when `group` is not `NULL` and prints no title otherwise. Set to `NULL` to suppress titles.
`cpal`	Color palette of the states. By default, the `cpal` attribute of the `seqdata` sequence object is used (see `seqdef`). A user-specified palette must be a vector of colors whose length and order correspond to `alphabet(seqdata)`.
`missing.color`	Color for representing missing values inside the sequences. By default, this color is taken from the `missing.color` attribute of `seqdata`.
`ylab`	Character string or vector of strings. Optional label of the y-axis. If a vector, y-axis label of each group level. If set as `NA`, no label is drawn.
`yaxis`	Logical or one of `"all"` or `"left"`. If set as `TRUE` or `"all"` (default value), sequence index numbers are displayed for `"i"` and `"I"`, mean time values for `"mt"`, percentages for `"d"` and `"f"`, and state/event labels for `"pc"`. Ignored for `"r"`. If `"left"` and `group` is used, the y-axis is displayed on plots of the left panel only. If `FALSE` no y-axis is drawn. For type `"f"`, can also be one of `"pct"` or `"left.pct"`.
`xaxis`	Logical or one of `"all"` or `"bottom"`. If set as `TRUE` or `"all"` (default value) x-axes are drawn for each plot in the graphic. If set as `"bottom"` and `group` is used, axes are drawn under the plots of the bottom panel only. If `FALSE`, no x-axis is drawn.
`xtlab`	Vector of length equal to the number of columns of `seqdata`. Optional x-axis tick labels. If unspecified, column names of the `seqdata` sequence object are used (see `seqdef`).
`cex.axis`	Real value. Axis annotation magnification. For `type = "r"` (i.e., `seqrplot()`), it also determines the magnification of the plotted text and symbols. See `par`.
`with.legend`	Character string or logical. Defines whether and where the legend of the state colors is plotted. The default value `"auto"` sets the position of the legend automatically; the other possible position is `"right"`. Set to `FALSE` to suppress the legend. The obsolete value `TRUE` is equivalent to `"auto"`.
`ltext`	Vector of character strings of length and order corresponding to `alphabet(seqdata)`. Optional description of the states to appear in the legend. If unspecified, the `label` attribute of the `seqdata` sequence object is used (see `seqdef`).
`cex.legend`	Real. Legend magnification. See `legend`.
`use.layout`	Logical. Should `layout` be used to arrange plots when using the group option or plotting a legend? When layout is activated, the standard `par(mfrow=...)` mechanism for arranging plots does not work. With `with.legend=FALSE` and `group=NULL`, layout is automatically deactivated and `par(mfrow=...)` can be used.
`legend.prop`	Real in range [0,1]. Proportion of the graphic area devoted to the legend plot when `use.layout=TRUE` and `with.legend=TRUE`. The default value depends on where the legend is plotted (at the bottom or to the right of the graphic area).
`rows`, `cols`	Integers. Number of rows and columns of the plot panel when `use.layout=TRUE`.
`title`	Deprecated. Use `main` instead.
`cex.plot`	Deprecated. Use `cex.axis` instead.
`withlegend`	Deprecated. Use `with.legend` instead.
`axes`	Deprecated. Use `xaxis` instead.
`...`	Arguments passed to the function that computes the required statistics and to the associated plot method (see Details), or other graphical parameters. For example, the `weighted` argument controls whether weighted or unweighted statistics are produced, and `with.missing=TRUE` takes missing values into account when computing cross-sectional or longitudinal state distributions. Can also include arguments of `legend`, such as `bty="n"` to suppress the box surrounding the legend.

Details

seqplot is the generic function for high-level plots of state sequence objects, with group splits and automatic display of the color legend. Many different types of plots can be produced by means of the type argument. Except for sequence index plots, seqplot first calls the specific function producing the required statistics and then the plot method for objects produced by this function (see below). For sequence index plots, the state sequence object itself is plotted by calling the plot.stslist method. When splitting by groups and/or displaying the color legend, the layout function is used for arranging the plots.

The seqdplot, seqdHplot, seqfplot, seqiplot, seqIplot, seqHtplot, seqmsplot, seqmtplot, seqpcplot, seqrplot and seqrfplot functions are aliases for calling seqplot with the type argument set respectively to "d", "dH", "f", "i", "I", "Ht", "ms", "mt", "pc", "r" or "rf".

A state distribution plot (type="d") represents the sequence of cross-sectional state frequencies by position (time point), computed by the seqstatd function and rendered with the plot.stslist.statd method. Such plots are also known as chronograms.
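To make the notion of cross-sectional frequencies concrete, here is a minimal base-R sketch that does not call TraMineR. The toy matrix seqmat, the alphabet and the helper state_distribution() are hypothetical; the optional weights w only mimic weighted statistics, and missing states (which seqstatd handles through with.missing) are ignored.

```r
## Hypothetical toy data: 4 sequences (rows) observed at 3 positions (columns)
alphabet <- c("P", "L", "M")
seqmat <- matrix(c("P", "P", "L",
                   "P", "L", "L",
                   "P", "L", "M",
                   "L", "M", "M"),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(NULL, c("T1", "T2", "T3")))

## Cross-sectional distribution: for each position, the (weighted) share
## of sequences in each state; rows = states, columns = positions
state_distribution <- function(seqmat, alphabet, w = rep(1, nrow(seqmat))) {
  apply(seqmat, 2, function(col) {
    vapply(alphabet, function(s) sum(w[col == s]), numeric(1)) / sum(w)
  })
}

dist_by_pos <- state_distribution(seqmat, alphabet)
print(round(dist_by_pos, 2))
```

Each column of the result sums to 1; a chronogram stacks these proportions position by position.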

A sequence frequency plot (type="f") displays the most frequent sequences, each one as a horizontal stacked bar of its successive states. Sequences are displayed bottom-up in decreasing order of their frequencies (computed by the seqtab function). The plot.stslist.freq method is called to produce the plot.
The optional idxs argument selects the sequences to be plotted (the default is 1:10, i.e., the 10 most frequent sequences). By default, the width of the bars representing the sequences is proportional to their frequencies; this can be disabled with the optional pbarw=FALSE argument. If weights were specified when creating seqdata, weighted frequencies are used unless you set weighted=FALSE. See the examples below, the seqtab and plot.stslist.freq manual pages for a complete list of optional arguments, and Müller et al. (2008) for a description of sequence frequency plots.
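The frequency computation behind this plot can be sketched in base R, without TraMineR: each sequence is collapsed into a string, identical strings are counted (or their weights summed), and the result is sorted in decreasing order. The data, the weights and the helper sequence_frequencies() (with its hypothetical top_n parameter standing in for idxs=1:10) are illustrative only; seqtab itself reports more than these raw counts.

```r
## Hypothetical toy data: 5 sequences over 3 positions, plus case weights
seqmat <- matrix(c("A", "A", "B",
                   "A", "B", "B",
                   "A", "A", "B",
                   "B", "B", "B",
                   "A", "A", "B"),
                 ncol = 3, byrow = TRUE)
w <- c(1, 4, 1, 0.5, 1)

## Collapse each row into a key such as "A-A-B", sum the weights of
## identical keys and keep the top_n most frequent sequences
sequence_frequencies <- function(seqmat, w = rep(1, nrow(seqmat)), top_n = 10) {
  keys <- apply(seqmat, 1, paste, collapse = "-")
  freq <- vapply(split(w, keys), sum, numeric(1))
  head(sort(freq, decreasing = TRUE), top_n)
}

unweighted <- sequence_frequencies(seqmat)
weighted <- sequence_frequencies(seqmat, w)
print(unweighted)
print(weighted)
```

With these toy weights, the most frequent sequence changes between the unweighted and the weighted count, which is the effect the weighted argument controls.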

In sequence index plots (type="i" or type="I"), the requested individual sequences are rendered as horizontal stacked bars depicting the states over successive positions (time). Optional arguments include idxs, which specifies the indexes of the sequences to be plotted (for type="i", it defaults to the first ten sequences, i.e., idxs=1:10). To plot a (large) whole set of sequences nicely, use type="I", which is type="i" with idxs=0 and the additional graphical parameters border=NA and space=0 to suppress bar borders and the space between bars. The sortv argument can be used to pass a vector of numerical values for sorting the sequences, or to specify a sorting method. See plot.stslist for a complete list of optional arguments and their description.
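Both kinds of sorting can be imitated in base R. The sketch below orders a toy sequence matrix by a numeric covariate and lexicographically, from the first or from the last position. Treating a sequence-based sorting method as a lexicographic ordering is an assumption made for illustration; the data, the covariate age and the helper lex_order() are hypothetical, and plot.stslist should be checked for the sorting methods it actually accepts.

```r
## Hypothetical toy data: 4 sequences over 3 positions and a covariate
seqmat <- matrix(c("B", "A", "A",
                   "A", "B", "B",
                   "A", "A", "B",
                   "B", "B", "A"),
                 ncol = 3, byrow = TRUE)
age <- c(40, 25, 31, 58)

## Sorting by a numeric vector: row order of increasing age
by_age <- order(age)

## Lexicographic ordering: sort on the first position, break ties with
## the second one, and so on (or start from the last position)
lex_order <- function(seqmat, from_end = FALSE) {
  cols <- seq_len(ncol(seqmat))
  if (from_end) cols <- rev(cols)
  do.call(order, lapply(cols, function(j) seqmat[, j]))
}

sorted_from_start <- seqmat[lex_order(seqmat), ]
print(sorted_from_start)
```

Only the row order is computed here; which end of the plot the first sorted sequence is drawn at is left to the plotting method.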

The usefulness of sequence index plots has been stressed, for instance, by Scherer (2001) and Brzinsky-Fay et al. (2006). Note that index plots of thousands of sequences produce very heavy PDF or PostScript graphic files. File size can be reduced dramatically by saving the figures in a bitmap format, for instance with the png graphics device instead of postscript or pdf.

The transversal entropy plot (type="Ht") displays the evolution of the cross-sectional entropies over positions (Billari, 2001). Cross-sectional entropies are computed by calling the seqstatd function and then plotted with the plot.stslist.statd method. With type="dH", the entropy line is overlaid on the state distribution plot. Because of an argument name conflict, use col.entr= to set the color of the overlaid entropy curve (the col argument of plot.stslist.statd).
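A cross-sectional entropy is the Shannon entropy of the state distribution at one position. The base-R sketch below computes it for a toy matrix, optionally dividing by log(k), the maximum possible entropy for an alphabet of k states. The normalization choice is an assumption for illustration; check the seqstatd manual page for the exact definition it uses. The data and the helper transversal_entropy() are hypothetical.

```r
## Hypothetical toy data: 4 sequences over 3 positions, alphabet of 3 states
alphabet <- c("P", "L", "M")
seqmat <- matrix(c("P", "P", "L",
                   "P", "L", "L",
                   "P", "L", "M",
                   "P", "M", "M"),
                 nrow = 4, byrow = TRUE)

## Shannon entropy of the state distribution at each position; with
## normalize = TRUE it is divided by log(k), the maximum for k states
transversal_entropy <- function(seqmat, alphabet, normalize = TRUE) {
  apply(seqmat, 2, function(col) {
    p <- vapply(alphabet, function(s) mean(col == s), numeric(1))
    p <- p[p > 0]                  # convention: 0 * log(0) = 0
    h <- -sum(p * log(p))
    if (normalize) h / log(length(alphabet)) else h
  })
}

H <- transversal_entropy(seqmat, alphabet)
print(round(H, 3))
```

The entropy is 0 at the first position, where all sequences share the same state, and grows as the states become more evenly spread.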

The modal state sequence plot (type="ms") displays the sequence of modal states, with each mode drawn in proportion to its frequency at the given position. The seqmodst function is called to compute this sequence, and the result is plotted by calling the plot.stslist.modst method.
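The sequence of modal states can be sketched in base R by taking, at each position, the most frequent state and its relative frequency. This is an illustration, not the seqmodst code: the data and the helper modal_states() are hypothetical, and ties are broken here by keeping the state that comes first in the alphabet, which may differ from what seqmodst does.

```r
## Hypothetical toy data: 4 sequences over 3 positions
alphabet <- c("P", "L", "M")
seqmat <- matrix(c("P", "P", "L",
                   "P", "L", "L",
                   "P", "L", "M",
                   "L", "M", "M"),
                 nrow = 4, byrow = TRUE)

## For each position, the most frequent state and its relative frequency;
## in case of ties, the state that comes first in `alphabet` is kept
modal_states <- function(seqmat, alphabet) {
  props <- apply(seqmat, 2, function(col) {
    vapply(alphabet, function(s) mean(col == s), numeric(1))
  })                                   # k x T matrix of proportions
  idx <- apply(props, 2, which.max)
  data.frame(position = seq_len(ncol(seqmat)),
             state = alphabet[idx],
             freq = props[cbind(idx, seq_len(ncol(seqmat)))],
             stringsAsFactors = FALSE)
}

mod <- modal_states(seqmat, alphabet)
print(mod)
```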

The mean time plot (type="mt") displays the mean time spent in each state of the alphabet, as computed by the seqmeant function. The plot.stslist.meant method is used to plot the resulting statistics. Set serr=TRUE to display error bars on the mean time plot. Bar labels can be specified by passing bar.labels among the ... arguments. In that case, bar.labels must be either a matrix with group-specific labels in columns or a single vector to display the same labels for all groups.
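The mean time statistic can be sketched in base R: count, for each sequence, the number of positions spent in each state, then average over sequences. The standard error computed here is the plain unweighted sd/sqrt(n); how seqmeant computes the error bars requested with serr=TRUE, in particular with weights, is not reproduced. The data and the helper mean_times() are hypothetical.

```r
## Hypothetical toy data: 4 sequences over 3 positions
alphabet <- c("P", "L", "M")
seqmat <- matrix(c("P", "P", "L",
                   "P", "L", "L",
                   "P", "L", "M",
                   "L", "M", "M"),
                 nrow = 4, byrow = TRUE)

## Time (number of positions) each sequence spends in each state,
## then the mean over sequences and its standard error sd / sqrt(n)
mean_times <- function(seqmat, alphabet) {
  durations <- vapply(alphabet, function(s) rowSums(seqmat == s),
                      numeric(nrow(seqmat)))     # n x k matrix
  cbind(Mean = colMeans(durations),
        SE = apply(durations, 2, sd) / sqrt(nrow(durations)))
}

mt <- mean_times(seqmat, alphabet)
print(round(mt, 3))
```

Without missing values, the mean times add up to the sequence length, since every position is spent in exactly one state.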

The representative sequence plot (type="r") displays a reduced, non-redundant set of representative sequences extracted from the provided state sequence object and sorted according to a representativeness criterion. The seqrep function is called to extract the representative set, which is then plotted by calling the plot.stslist.rep method. A distance matrix is required: it is either passed with the diss argument or, if diss=NULL, computed by calling the seqdist function. The criterion argument sets the representativeness criterion used to sort the sequences. Refer to the seqrep and plot.stslist.rep manual pages for a complete list of optional arguments, and see Gabadinho and Ritschard (2013) for more details on the extraction of representative sets. Also look at the examples below.
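Two ideas behind representative sets, ranking by centrality (the "dist" criterion used in the examples below) and coverage, can be illustrated with a simplified base-R sketch. It is not the seqrep algorithm: seqrep also discards redundant candidates and defines neighbourhoods through its own parameters, whereas the distance matrix, the plain radius and the helpers centrality_rank() and coverage_of() here are hypothetical.

```r
## Hypothetical symmetric distance matrix between 5 sequences
diss <- matrix(c(0, 2, 4, 6, 3,
                 2, 0, 2, 4, 3,
                 4, 2, 0, 2, 5,
                 6, 4, 2, 0, 7,
                 3, 3, 5, 7, 0),
               nrow = 5, byrow = TRUE)

## Centrality: the smaller the sum of distances to all other sequences,
## the more central (representative) the sequence
centrality_rank <- function(diss) order(rowSums(diss))

## Coverage: share of sequences lying within `radius` of at least one
## of the selected representatives `reps`
coverage_of <- function(diss, reps, radius) {
  mean(apply(diss[, reps, drop = FALSE] <= radius, 1, any))
}

ranked <- centrality_rank(diss)
print(ranked)
print(coverage_of(diss, reps = ranked[1], radius = 2))
```

Adding representatives in order of centrality until a target coverage is reached gives the flavour of the coverage argument used in the examples.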

A relative frequency plot (type="rf") displays the medoids of equal-sized groups (Fasang and Liao, 2014). The partition into equal-sized groups and the identification of the medoids are done by calling seqrf, and the plots are generated by plot.seqrf. See these functions for possible options. The option which.plot="both" applies only when group=NULL. Regardless of the value of info, seqplot does not display the statistics on the plot. When sortv="mds" is set, the first MDS factor of the whole diss matrix is computed and used for sorting each group. Set sortv=NULL to use the original data order.
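The equal-sized grouping and medoid extraction described above can be sketched in base R under simplifying assumptions: the cases are sorted by a given numeric vector (here the toy positions themselves rather than an MDS factor), cut into k consecutive groups along that order, and the medoid of each group is the member with the smallest sum of within-group distances. The data and the helper rf_medoids() are hypothetical, and seqrf may handle sample sizes not divisible by k differently.

```r
## Hypothetical data: 6 cases whose distances are taken from 1-D positions
x <- c(1, 2, 3, 10, 11, 13)
diss <- as.matrix(dist(x))

## Sort cases by `sortv`, cut them into k groups of (nearly) equal size
## and return, for each group, the case with the smallest sum of
## distances to the other members (the group medoid)
rf_medoids <- function(diss, sortv, k) {
  n <- nrow(diss)
  ord <- order(sortv)
  grp <- ceiling(seq_len(n) * k / n)   # group labels 1..k along the order
  vapply(split(ord, grp), function(idx) {
    within <- rowSums(diss[idx, idx, drop = FALSE])
    idx[which.min(within)]
  }, integer(1))
}

print(rf_medoids(diss, sortv = x, k = 2))
```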

For decorated parallel coordinate plots (type="pc") see the specific manual page of seqpcplot.

Author(s)

Alexis Gabadinho and Gilbert Ritschard

References

Billari, F. C. (2001). The analysis of early life courses: Complex description of the transition to adulthood. Journal of Population Research 18(2), 119-142.

Brzinsky-Fay, C., U. Kohler and M. Luniak (2006). Sequence Analysis with Stata. The Stata Journal 6(4), 435-460.

Fasang, A.E. and T.F. Liao (2014). Visualizing Sequences in the Social Sciences: Relative Frequency Sequence Plots. Sociological Methods and Research 43(4), 643-676.

Gabadinho, A. and G. Ritschard (2013). "Searching for typical life trajectories applied to childbirth histories". In Levy, R. & Widmer, E. (eds), Gendered life courses - Between individualization and standardization. A European approach applied to Switzerland, pp. 287-312. Vienna: LIT.

Gabadinho, A., G. Ritschard, N.S. Müller and M. Studer (2011). Analyzing and Visualizing State Sequences in R with TraMineR. Journal of Statistical Software 40(4), 1-37.

Gabadinho, A., G. Ritschard, M. Studer and N.S. Müller (2011). "Extracting and Rendering Representative Sequences". In A. Fred, J.L.G. Dietz, K. Liu and J. Filipe (eds), Knowledge Discovery, Knowledge Engineering and Knowledge Management, volume 128 of Communications in Computer and Information Science (CCIS), pp. 94-106. Springer-Verlag.

Müller, N.S., A. Gabadinho, G. Ritschard and M. Studer (2008). Extracting knowledge from life courses: Clustering and visualization. In Data Warehousing and Knowledge Discovery, 10th International Conference DaWaK 2008, Turin, Italy, September 2-5, LNCS 5182, Berlin: Springer, 176-185.

Scherer, S. (2001). Early Career Patterns: A Comparison of Great Britain and West Germany. European Sociological Review 17(2), 119-144.

Examples

## ======================================================
## Creating state sequence objects from example data sets
## ======================================================

## biofam data set
data(biofam)
## We use only a sample of 300 cases
set.seed(10)
biofam <- biofam[sample(nrow(biofam),300),]
biofam.lab <- c("Parent", "Left", "Married", "Left+Marr",
                "Child", "Left+Child", "Left+Marr+Child", "Divorced")
biofam.seq <- seqdef(biofam, 10:25, labels=biofam.lab)

## actcal data set
data(actcal)
## We use only a sample of 300 cases
set.seed(1)
actcal <- actcal[sample(nrow(actcal),300),]
actcal.lab <- c("> 37 hours", "19-36 hours", "1-18 hours", "no work")
actcal.seq <- seqdef(actcal,13:24,labels=actcal.lab)

## ex1 using weights
data(ex1)
ex1.seq <- seqdef(ex1, 1:13, weights=ex1$weights)

## ====================
## Sequence index plots
## ====================

## First ten sequences
seqiplot(biofam.seq)

## All sequences sorted by age in 2000
## grouped by sex
seqIplot(actcal.seq, group=actcal$sex, sortv=actcal$age00)


## =======================
## State distribution plot
## =======================

## biofam grouped by sex
seqplot(biofam.seq, type="d", group=biofam$sex)

## actcal grouped by sex
seqplot(actcal.seq, type="d", group=actcal$sex)

## with overlayed entropy line
seqplot(actcal.seq, type="dH", group=actcal$sex)

## ============================
## Cross-sectional entropy plot
## ============================
seqplot(biofam.seq, type="Ht", group=biofam$sex)

## ========================
## Sequence frequency plots
## ========================

## Plot of the 10 most frequent sequences
seqplot(biofam.seq, type="f")

## Grouped by sex
seqfplot(actcal.seq, group=actcal$sex)

## Unweighted vs weighted frequencies
seqfplot(ex1.seq, weighted=FALSE)
seqfplot(ex1.seq, weighted=TRUE)

## =====================
## Modal states sequence
## =====================
seqplot(biofam.seq, type="ms")
## same as
seqmsplot(biofam.seq)

## ====================
## Representative plots
## ====================

## Computing a distance matrix
## with OM metric
costs <- seqcost(actcal.seq, method="INDELSLOG")
actcal.om <- seqdist(actcal.seq, method="OM", sm=costs$sm, indel=costs$indel)

## Plot of the representative sets grouped by sex
## using the default density criterion
seqrplot(actcal.seq, group=actcal$sex, diss=actcal.om, coverage=.5)

## Plot of the representative sets grouped by sex
## using the "dist" (centrality) criterion
seqrplot(actcal.seq, group=actcal$sex, criterion="dist", diss=actcal.om, coverage=.33)

## ========================
## Relative frequency plots
## ========================
## Sequences kept in their original data order (sortv=NULL)
seqrfplot(actcal.seq, diss=actcal.om, sortv=NULL, group=actcal$sex)


## ===============
## Mean time plot
## ===============

## actcal data set, grouped by sex
seqplot(actcal.seq, type="mt", group=actcal$sex)

## displaying mean times as bar labels
## (one column of labels per group level)
group <- factor(actcal$sex)
blab <- NULL
for (lev in levels(group)) {
  blab <- cbind(blab, seqmeant(actcal.seq[group == lev, ]))
}
seqmtplot(actcal.seq, group=group,
          bar.labels = round(blab, digits=2), cex.barlab=1.2)

[Package TraMineR version 2.2-10 Index]