seqFromWaves {seqSHP} | R Documentation
Extracting sequences from SHP waves
Description
Based on the structure of the 'SPSS' version of the Swiss Household Panel (SHP) data, the function looks up the user-specified variables in each of the wave files and collects them as sequence data in a table. The function can also match the sequences with variables from other files, such as the master files of persons (MP) and households (MH) and the social origins file (SO). It can also match them with activity calendar data (CA). In addition, it can extract user-specified covariates from a specific wave.
Usage
seqFromWaves(
wavedir = NULL,
datadir = NULL,
shpdir = NULL,
pvarseq = NULL,
hvarseq = NULL,
MPvar = c("SEX", "BIRTHY"),
SOvar = NULL,
LJvar = NULL,
CAvar = NULL,
PLWvar = NULL,
HLWvar = NULL,
waves = NULL,
covw = max(waves),
maxMissing = length(waves) - 1,
maxMissingCA = length(CAvar) - 1
)
Arguments
wavedir |
String. Path to the SPSS SHP wave data. If |
datadir |
String. Path to the SPSS WA (All Waves) data. If |
shpdir |
String. Root path of the SHP data. The path should end with the two-digit number of the last wave, e.g., |
pvarseq |
Vector of strings. Protoname(s) ($$ for year) of the wanted sequence(s) of personal data. |
hvarseq |
Vector of strings. Protoname(s) ($$ for year) of the wanted sequence(s) of household data. |
MPvar |
Vector of strings. Variables to be extracted from the person master (MP) file. |
SOvar |
Vector of strings. Variables to be extracted from the social origin (SO) file. |
LJvar |
Vector of strings. Variables to be extracted from the last job (LJ) file. |
CAvar |
Vector of strings. Variables to be extracted from the activity calendar (CA) file. |
PLWvar |
Vector of strings. Variables to be extracted from the |
HLWvar |
Vector of strings. Variables to be extracted from the |
waves |
Vector of integers. Selected waves (wave id numbers, not years!) |
covw |
Integer. Id number of wave from which to extract |
maxMissing |
Integer. Maximum allowed missing states in yearly sequences ( |
maxMissingCA |
Integer. Maximum allowed missing states in monthly sequences ( |
Details
SHP data are available for free from FORS (https://forscenter.ch/projects/swiss-household-panel/data/) but require the user to accept the usage contract.
The function extracts the columns corresponding to the provided protonames from the successive wave files and collects them in a tibble. From this tibble, it is straightforward, for example, to create state sequence objects for 'TraMineR'.
When using the shpdir argument, the shpdir path must end with the two-digit number xx of the last wave. The path wavedir is then set as shpdir/SHP-Data-W1-Wxx-SPSS/ and datadir as shpdir/SHP-Data-WA-SPSS/. For example, with shpdir="C:/SHP/shp23", wavedir will be set as "C:/SHP/shp23/SHP-Data-W1-W23-SPSS/" and datadir as "C:/SHP/shp23/SHP-Data-WA-SPSS/".
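The path rule above can be sketched in a few lines of base R. This is only an illustration of the documented naming convention: shp_paths is a hypothetical helper, not a function of seqSHP, and the package's actual implementation may differ.

```r
## Hypothetical helper (NOT part of seqSHP): derives wavedir and datadir
## from shpdir following the naming rule described in Details.
shp_paths <- function(shpdir) {
  shpdir <- sub("/+$", "", shpdir)                        # drop trailing slash(es)
  xx <- substr(shpdir, nchar(shpdir) - 1, nchar(shpdir))  # last two characters
  if (!grepl("^[0-9]{2}$", xx))
    stop("shpdir must end with the two-digit number of the last wave")
  list(
    wavedir = paste0(shpdir, "/SHP-Data-W1-W", xx, "-SPSS/"),
    datadir = paste0(shpdir, "/SHP-Data-WA-SPSS/")
  )
}

p <- shp_paths("C:/SHP/shp23")
p$wavedir
p$datadir
```

For shpdir="C:/SHP/shp23", the two elements reproduce the wavedir and datadir paths given in the example above.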
The variable names in pvarseq and hvarseq must be provided as protonames, with $$ standing for the last two digits of the year.
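To make the protoname convention concrete, the following sketch expands a protoname into the variable names of the selected waves. expand_protoname is a hypothetical helper, not a seqSHP function, and the mapping of wave w to year 1998 + w is an assumption taken from the axis labels used in the Examples.

```r
## Hypothetical helper (NOT part of seqSHP): replaces $$ in a protoname
## with the two-digit year of each wave.
## Assumption: wave w corresponds to year 1998 + w, as in the Examples.
expand_protoname <- function(proto, waves) {
  yy <- sprintf("%02d", (1998 + waves) %% 100)   # two-digit year per wave
  vapply(yy, function(y) sub("$$", y, proto, fixed = TRUE),
         character(1), USE.NAMES = FALSE)
}

ws.names  <- expand_protoname("WSTAT$$", 1:3)
c44.names <- expand_protoname("P$$C44", 1:3)
ws.names
c44.names
```

Under that assumption, waves 1 to 3 of WSTAT$$ expand to WSTAT99, WSTAT00, and WSTAT01, which is consistent with the WSTAT00 column accessed in the Examples.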
maxMissing is set by default as length(waves) - 1, which drops cases for which one of the yearly sequences defined by pvarseq and hvarseq is empty (i.e., has no valid state). Likewise, maxMissingCA is set by default as length(CAvar) - 1 to exclude cases with an empty monthly activity calendar sequence.
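The effect of the default maxMissing can be illustrated on toy data. The data frame below is invented for the illustration, and the sketch assumes that missing states are coded as NA; how seqFromWaves itself identifies missing states is not specified here.

```r
## Toy illustration (invented data, NOT SHP data) of the default threshold.
## Assumption: missing states are coded as NA.
waves <- 1:3
toy <- data.frame(
  IDPERS  = c(101, 102, 103),
  WSTAT99 = c(1, NA, NA),
  WSTAT00 = c(1, NA, 3),
  WSTAT01 = c(2, NA, NA)
)
seqcols <- c("WSTAT99", "WSTAT00", "WSTAT01")

nmiss <- rowSums(is.na(toy[, seqcols]))  # number of missing states per case
maxMissing <- length(waves) - 1          # default: at most 2 missing out of 3
kept <- toy[nmiss <= maxMissing, ]
kept$IDPERS
```

Case 102 has no valid state in any of the three waves and is dropped, while case 103, with a single valid state, is kept. Lowering maxMissing would impose a stricter filter.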
The package is based on a function written in 2012 by Matthias Studer.
Value
A tibble with the selected sequence data and covariates.
Author(s)
Gilbert Ritschard
References
Swiss Household Panel documentation at https://forscenter.ch/projects/swiss-household-panel/
See Also
Examples
## Setting paths to SHP data files. Adapt to your local folders!
## It should be something like
## wavedir <- "C:/SwissHPanel/shp23/SHP-Data-W1-W23-SPSS/"
## datadir <- "C:/SwissHPanel/shp23/SHP-Data-WA-SPSS/"
## Consider first the example of 3 waves and an MP file
## shipped with the package
wavedir <- paste0(system.file(package="seqSHP"),"/extdata/")
datadir <- wavedir
####### Working status
first.w <- 1
last.w <- 3
waves <- first.w:last.w
maxMissing <- 2
## Sequence of categorical variables
## WSTAT$$ is working status (WS)
shp <- seqFromWaves(wavedir, datadir,
                    pvarseq="WSTAT$$",
                    waves=waves, maxMissing=maxMissing)
## Retrieve WS labels
attr(shp$WSTAT00,"labels")
## Creating WS sequence object
library(TraMineR)
ws.shortlab <- c("AO","UN","NL")
ws.longlab <- c("Active Occupied","Unemployed","Not in Labor Force")
ws.alph <- c(1,2,3)
xtlab <- (1998+first.w):(1998+last.w)
wsvar <- getColumnIndex(shp, "WSTAT$$")
ws.seq <- seqdef(shp[, wsvar], right=NA,
                 alphabet=ws.alph, states=ws.shortlab, labels=ws.longlab,
                 cnames=xtlab)
## plotting first 100 sequences
seqIplot(ws.seq[1:100,], sort="from.start")
## Not run:
####################################################
## To run the full examples below, you must first install SHP data
## in an accessible folder
##
## Adapt to your local folders!
wavedir <- "C:/SwissHPanel/shp23/SHP-Data-W1-W23-SPSS/"
datadir <- "C:/SwissHPanel/shp23/SHP-Data-WA-SPSS/"
####### Working status
first.w <- 2
last.w <- 23
waves <- first.w:last.w
maxMissing <- 10
## Sequence of categorical variables
## WSTAT$$ is working status (WS) and
## P$$C44 satisfaction with life
shp <- seqFromWaves(wavedir, datadir,
                    pvarseq=c("WSTAT$$","P$$C44"),
                    waves=waves, maxMissing=maxMissing)
## Retrieve WS labels
attr(shp$WSTAT00,"labels")
## Creating WS sequence object
library(TraMineR)
ws.shortlab <- c("AO","UN","NL")
ws.longlab <- c("Active Occupied","Unemployed","Not in Labor Force")
ws.alph <- c(1,2,3)
xtlab <- (1998+first.w):(1998+last.w)
wsvar <- getColumnIndex(shp, "WSTAT$$")
ws.seq <- seqdef(shp[, wsvar], right=NA,
                 alphabet=ws.alph, states=ws.shortlab, labels=ws.longlab,
                 cnames=xtlab, xtstep=2, tick.last=TRUE)
seqIplot(ws.seq, sort="from.start")
######### Activity calendar from sep99 to dec2021
month.short.names <- tolower(sapply(month.name, substr, 1, 3))
xtlab.ca <- c("sep99","oct99","nov99","dec99")
for (t in 0:21) {
  ## append the 12 month labels of year t, e.g. "jan00", ..., "dec00"
  xtlab.ca <- c(xtlab.ca, paste0(month.short.names, formatC(t, width=2, flag="0")))
}
names(xtlab.ca) <- xtlab.ca
ca.var <- toupper(xtlab.ca) ## SPSS variable names are uppercase
CAseqdata <- seqFromWaves(wavedir, datadir, CAvar=ca.var, maxMissingCA=36)
attr(CAseqdata$SEP99, "labels")
## First 3 columns are IDPERS, SEX, and BIRTHY. Sequences are built from the remaining columns
seqCA <- seqdef(CAseqdata[,-(1:3)], cnames=xtlab.ca, right=NA, xtstep=6, tick.last=TRUE)
seqdplot(seqCA, border=NA, with.missing=TRUE)
## End(Not run)