R: Extracting sequences from SHP waves

seqFromWaves {seqSHP}

R Documentation

Extracting sequences from SHP waves

Description

Based on the structure of the 'SPSS' version of the Swiss Household Panel (SHP) data, the function looks up the user-specified variables in each of the wave files and collects them as sequence data in a table. It can also match the sequences with variables from other files, such as the master files of persons (MP) and households (MH) and the social origins (SO) file, as well as with activity calendar (CA) data. In addition, it can extract user-specified covariates from a specific wave.

Usage

seqFromWaves(
  wavedir = NULL,
  datadir = NULL,
  shpdir = NULL,
  pvarseq = NULL,
  hvarseq = NULL,
  MPvar = c("SEX", "BIRTHY"),
  SOvar = NULL,
  LJvar = NULL,
  CAvar = NULL,
  PLWvar = NULL,
  HLWvar = NULL,
  waves = NULL,
  covw = max(waves),
  maxMissing = length(waves) - 1,
  maxMissingCA = length(CAvar) - 1
)

Arguments

`wavedir`	String. Path to the SPSS SHP wave data. If `NULL`, `wavedir` is built from `shpdir`.
`datadir`	String. Path to the SPSS WA (All Waves) data. If `NULL`, `datadir` is built from `shpdir`.
`shpdir`	String. Root path of the SHP data. The path should end with the two-digit number of the last wave, e.g., `"C:/shp23"`.
`pvarseq`	Vector of strings. Protoname(s) ($$ for year) of the wanted sequence(s) of personal data.
`hvarseq`	Vector of strings. Protoname(s) ($$ for year) of the wanted sequence(s) of household data.
`MPvar`	Vector of strings. Variables to be extracted from the person master (MP) file.
`SOvar`	Vector of strings. Variables to be extracted from the social origin (SO) file.
`LJvar`	Vector of strings. Variables to be extracted from the last job (LJ) file.
`CAvar`	Vector of strings. Variables to be extracted from the activity calendar (CA) file.
`PLWvar`	Vector of strings. Variables to be extracted from the `covw` wave personal file.
`HLWvar`	Vector of strings. Variables to be extracted from the `covw` wave household file.
`waves`	Vector of integers. Selected waves (wave id numbers, not years).
`covw`	Integer. Id number of the wave from which to extract the `PLWvar` and `HLWvar` covariates.
`maxMissing`	Integer. Maximum allowed missing states in yearly sequences (`pvarseq` and `hvarseq`).
`maxMissingCA`	Integer. Maximum allowed missing states in monthly sequences (`CAvar`).

Details

SHP data are available for free from FORS (https://forscenter.ch/projects/swiss-household-panel/data/) but require the user to accept the usage contract.

The function extracts the columns corresponding to the provided protonames from the successive wave files and collects them in a tibble. From this tibble it is then straightforward, for example, to create state sequence objects for 'TraMineR'.

When using the `shpdir` argument, the path must end with the two-digit number xx of the last wave. The path `wavedir` is then set as `shpdir/SHP-Data-W1-Wxx-SPSS/` and `datadir` as `shpdir/SHP-Data-WA-SPSS/`. For example, with `shpdir="C:/SHP/shp23"`, `wavedir` will be set as `"C:/SHP/shp23/SHP-Data-W1-W23-SPSS/"` and `datadir` as `"C:/SHP/shp23/SHP-Data-WA-SPSS/"`.
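
The path rule above can be illustrated with a small sketch. The helper `build_shp_paths()` is hypothetical and not part of seqSHP; it only reproduces the naming convention stated in this section and may differ from how the package does it internally.

```r
# Hypothetical helper (not a seqSHP function): derives wavedir and datadir
# from shpdir following the convention described above.
build_shp_paths <- function(shpdir) {
  # Drop trailing slashes so the last two characters are the wave number
  shpdir <- sub("/+$", "", shpdir)
  n <- nchar(shpdir)
  last_wave <- substr(shpdir, n - 1, n)
  if (!grepl("^[0-9]{2}$", last_wave)) {
    stop("shpdir must end with the two-digit number of the last wave")
  }
  list(
    wavedir = paste0(shpdir, "/SHP-Data-W1-W", last_wave, "-SPSS/"),
    datadir = paste0(shpdir, "/SHP-Data-WA-SPSS/")
  )
}

paths <- build_shp_paths("C:/SHP/shp23")
paths$wavedir  # "C:/SHP/shp23/SHP-Data-W1-W23-SPSS/"
paths$datadir  # "C:/SHP/shp23/SHP-Data-WA-SPSS/"
```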

The variable names in `pvarseq` and `hvarseq` must be provided as protonames, with `$$` standing for the last two digits of the year.
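
To make the protoname convention concrete, the following sketch expands a protoname into yearly variable names. The helper `expand_protoname()` is hypothetical, and the mapping of wave 1 to 1999 is an assumption taken from the examples in this page (which label waves as `1998 + wave`); the package's own expansion may work differently.

```r
# Hypothetical helper (not a seqSHP function): replaces "$$" in a protoname
# with the last two digits of the year of each requested wave.
# Assumption: wave id w corresponds to year first_year + w - 1.
expand_protoname <- function(protoname, waves, first_year = 1999) {
  years <- first_year + waves - 1
  yy <- sprintf("%02d", years %% 100)
  vapply(yy, function(y) sub("$$", y, protoname, fixed = TRUE),
         character(1), USE.NAMES = FALSE)
}

expand_protoname("WSTAT$$", 1:3)  # "WSTAT99" "WSTAT00" "WSTAT01"
expand_protoname("P$$C44", 2:4)   # "P00C44" "P01C44" "P02C44"
```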

`maxMissing` is set by default as `length(waves) - 1`, which drops cases for which one of the yearly sequences defined by `pvarseq` and `hvarseq` is empty (i.e., has no valid state). Likewise, `maxMissingCA` is set by default as `length(CAvar) - 1` to exclude cases with an empty monthly activity calendar sequence.
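
The effect of the default threshold can be sketched on toy data. The data frame below is invented for illustration, and the filter assumes that missing states are coded as `NA` and that a case is kept when its number of missing states does not exceed the threshold; how the package detects missing states in the real SHP files is not specified here.

```r
# Toy data with invented values: 4 persons, 3 yearly states
toy <- data.frame(
  IDPERS  = 1:4,
  WSTAT99 = c(1, NA, NA, 2),
  WSTAT00 = c(1, NA, 3, NA),
  WSTAT01 = c(2, NA, 3, NA)
)
seq_cols <- c("WSTAT99", "WSTAT00", "WSTAT01")

# Number of missing states per case
n_missing <- rowSums(is.na(toy[, seq_cols]))

# Default-style threshold: sequence length minus one, so that only
# completely empty sequences are dropped
max_missing <- length(seq_cols) - 1
kept <- toy[n_missing <= max_missing, ]
kept$IDPERS  # 1 3 4 (person 2 has no valid state and is dropped)
```

With a stricter threshold such as `max_missing <- 1`, person 4 (two missing states) would be dropped as well.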

The package is based on a function written in 2012 by Matthias Studer.

Value

A tibble with the selected sequence data and covariates.

Author(s)

Gilbert Ritschard

References

Swiss Household Panel documentation at https://forscenter.ch/projects/swiss-household-panel/

Examples

## Setting paths to SHP data files. Adapt to your local folders!
## It should be something like
## wavedir <- "C:/SwissHPanel/shp23/SHP-Data-W1-W23-SPSS/"
## datadir <- "C:/SwissHPanel/shp23/SHP-Data-WA-SPSS/"

## Consider first the example of 3 waves and an MP file
##  shipped with the package
wavedir <- paste0(system.file(package="seqSHP"),"/extdata/")
datadir <- wavedir

####### Working status

first.w <- 1
last.w  <- 3
waves <- first.w:last.w
maxMissing <- 2

## Sequence of categorical variables
##  WSTAT$$ is working status (WS)
shp <- seqFromWaves(wavedir, datadir,
                 pvarseq="WSTAT$$",
                 waves=waves, maxMissing=maxMissing)

## Retrieve WS labels
attr(shp$WSTAT00,"labels")

## Creating WS sequence object
library(TraMineR)
ws.shortlab <- c("AO","UN","NL")
ws.longlab <- c("Active Occupied","Unemployed","Not in Labor Force")
ws.alph <- c(1,2,3)
xtlab <- (1998+first.w):(1998+last.w)

wsvar <- getColumnIndex(shp, "WSTAT$$")
ws.seq <- seqdef(shp[, wsvar], right=NA,
                 alphabet=ws.alph, states=ws.shortlab, labels=ws.longlab,
                 cnames=xtlab)

## plotting first 100 sequences
seqIplot(ws.seq[1:100,], sort="from.start")



## Not run: 
####################################################
## To run the full examples below, you must first install SHP data
## in an accessible folder
##
## Adapt to your local folders!
wavedir <- "C:/SwissHPanel/shp23/SHP-Data-W1-W23-SPSS/"
datadir <- "C:/SwissHPanel/shp23/SHP-Data-WA-SPSS/"

####### Working status

first.w <- 2
last.w  <- 23
waves <- first.w:last.w
maxMissing <- 10

## Sequence of categorical variables
##  WSTAT$$ is working status (WS) and
##  P$$C44 satisfaction with life
shp <- seqFromWaves(wavedir, datadir,
                 pvarseq=c("WSTAT$$","P$$C44"),
                 waves=waves, maxMissing=maxMissing)

## Retrieve WS labels
attr(shp$WSTAT00,"labels")

## Creating WS sequence object
library(TraMineR)
ws.shortlab <- c("AO","UN","NL")
ws.longlab <- c("Active Occupied","Unemployed","Not in Labor Force")
ws.alph <- c(1,2,3)
xtlab <- (1998+first.w):(1998+last.w)

wsvar <- getColumnIndex(shp, "WSTAT$$")
ws.seq <- seqdef(shp[, wsvar], right=NA,
                 alphabet=ws.alph, states=ws.shortlab, labels=ws.longlab,
                 cnames=xtlab, xtstep=2, tick.last=TRUE)

seqIplot(ws.seq, sort="from.start")


######### Activity calendar from sep99 to dec2021

month.short.names <- tolower(sapply(month.name, substr, 1, 3))
xtlab.ca <- c("sep99","oct99","nov99","dec99")
for (t in 00:21) {
 ## append the 12 month labels of year t, e.g. "jan00", ..., "dec00"
 xtlab.ca <- c(xtlab.ca, paste0(month.short.names, formatC(t, width=2, flag="0")))
}
names(xtlab.ca) <- xtlab.ca
ca.var <- toupper(xtlab.ca) ## SPSS variable names are uppercase

CAseqdata <- seqFromWaves(wavedir, datadir, CAvar=ca.var, maxMissingCA=36)

attr(CAseqdata$SEP99, "labels")
## The first 3 columns are IDPERS, SEX, and BIRTHY; the sequences are in the remaining columns
seqCA <- seqdef(CAseqdata[,-(1:3)], cnames=xtlab.ca, right=NA, xtstep=6, tick.last=TRUE)
seqdplot(seqCA, border=NA, with.missing=TRUE)


## End(Not run)

[Package seqSHP version 0.1.1 Index]