seqformat {TraMineR} | R Documentation |
Conversion between sequence formats
Description
Convert a sequence data set from one format to another.
Usage
seqformat(data, var = NULL, from, to, compress = FALSE, nrep = NULL, tevent,
  stsep = NULL, covar = NULL, SPS.in = list(xfix = "()", sdsep = ","),
  SPS.out = list(xfix = "()", sdsep = ","), id = 1, begin = 2, end = 3,
  status = 4, process = TRUE, pdata = NULL, pvar = NULL, limit = 100,
  overwrite = TRUE, fillblanks = NULL, tmin = NULL, tmax = NULL, missing = "*",
  with.missing = TRUE, right = "DEL", compressed, nr)
Arguments
data |
Data frame, matrix, A data frame or a matrix with sequence data in one or more columns when
A data frame with sequence data in one or more columns when A state sequence object when |
var |
|
from |
String.
The format of the input sequence data.
It can be |
to |
String.
The format of the output data.
It can be |
compress |
Logical.
Default: |
nrep |
Integer.
The number of shifted replications when |
tevent |
Matrix.
The transition-definition matrix when |
stsep |
|
covar |
List of Integers or Strings.
The indexes or the names of additional columns in |
SPS.in |
List.
Default: |
SPS.out |
List.
Default: |
id |
When When When |
begin |
Integer or String.
Default: |
end |
Integer or String.
Default: |
status |
Integer or String.
Default: |
process |
Logical.
Default: This |
pdata |
If If A data frame containing the ID and the birth time of the individuals when
|
pvar |
List of Integers or Strings.
The indexes or names of the columns of the data frame |
limit |
Integer.
Default: |
overwrite |
Logical.
Default: |
fillblanks |
Character.
The value to fill gaps between episodes when |
tmin |
|
tmax |
|
missing |
String.
Default: |
with.missing |
Logical.
Default: |
right |
One of |
compressed |
Deprecated. Use |
nr |
Deprecated. Use |
Details
The seqformat function converts data from one format to another. The input data is first converted into the STS format and then converted to the output format. Depending on the input and output formats, some information can be lost in the conversion process. The output is a matrix or a data frame, NOT an stslist state sequence object. To process, print or plot the sequences with TraMineR functions, you first have to transform the data frame into an stslist state sequence object with seqdef.
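As a minimal sketch of that last step, the snippet below converts SPELL data to STS and then passes the result to seqdef. It reuses the bfspell20 and bfpdata20 data sets and the conversion call from the Examples section; the object names bf.sts and bf.seq are chosen here for illustration only.

```r
library(TraMineR)

## bfspell20: spell data; bfpdata20: IDs and year at which each case was aged 15
data(bfspell)

## seqformat() returns plain tabular data, not a state sequence object
bf.sts <- seqformat(bfspell20, from = "SPELL", to = "STS",
  id = "id", begin = "begin", end = "end", status = "states",
  process = TRUE, pdata = bfpdata20, pvar = c("id", "when15"),
  limit = 16)
inherits(bf.sts, "stslist")

## seqdef() builds the stslist object expected by other TraMineR functions
bf.seq <- seqdef(bf.sts)
inherits(bf.seq, "stslist")
```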
See Gabadinho et al. (2009) and Ritschard et al. (2009) for more
details on longitudinal data formats and converting between them.
When data are in "SPELL" format (from = "SPELL"), the begin and end times are expected to be positions in the sequences. Therefore, they should be strictly positive integers.
With process=TRUE, the outcome sequences will be aligned on ages (process duration since birth), while with process=FALSE they will be aligned on dates (position on the calendar time). If process=TRUE, values in the begin and end columns of data are assumed to be ages when pdata is NULL and integer dates otherwise. If process=FALSE, begin and end values are assumed to be integer dates when pdata is NULL and ages otherwise.
To convert from person-period data, use from = "SPELL" and set both begin and end to the column index or name of the time variable. Alternatively, use the reshape function of the stats package, which is more efficient.
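The reshape alternative can be sketched in base R as follows. The data frame pp and its columns id, year and state are hypothetical and serve only as an illustration; the result has one row per individual and one state column per year, that is, an STS-like layout.

```r
## Hypothetical person-period data: one row per individual and year
pp <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2),
  year  = c(2001, 2002, 2003, 2001, 2002, 2003),
  state = c("A", "A", "B", "B", "B", "C"),
  stringsAsFactors = FALSE
)

## Long to wide: one row per id, one "state.<year>" column per year
sts <- reshape(pp, idvar = "id", timevar = "year", v.names = "state",
  direction = "wide")
sts
```

With seqformat, the same hypothetical data would instead be passed with from = "SPELL", begin = "year" and end = "year".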
Value
A data frame for SRS, TSE, and SPELL, a matrix otherwise.
When from="SPELL", the outcome has an attribute issues with the indexes of sequences with issues (truncated sequences, missing start time, spells before birth year, ...).
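The attribute can be inspected with attr, as in the hedged sketch below. It reuses the calendar-aligned conversion from the Examples section; the variable name issues is chosen here for illustration, and the structure and content of the attribute for this data set are not specified in this page, so no output is shown.

```r
library(TraMineR)

## bfspell20 holds the first 20 biofam sequences in SPELL format
data(bfspell)

## SPELL to STS conversion aligned on calendar years
bf.sts.y <- seqformat(bfspell20, from = "SPELL", to = "STS",
  id = "id", begin = "begin", end = "end", status = "states",
  process = FALSE)

## Retrieve the indexes of sequences flagged during the conversion
issues <- attr(bf.sts.y, "issues")
str(issues)
```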
Author(s)
Alexis Gabadinho, Pierre-Alexandre Fonta, Nicolas S. Müller, Matthias Studer, and Gilbert Ritschard.
References
Gabadinho, A., G. Ritschard, M. Studer and N. S. Müller (2009). Mining Sequence Data in R with the TraMineR package: A user's guide. Department of Econometrics and Laboratory of Demography, University of Geneva.
Ritschard, G., A. Gabadinho, M. Studer and N. S. Müller (2009). Converting between various sequence representations. In Ras, Z. & Dardzinska, A. (eds.), Advances in Data Management, Springer, 223, 155-175.
See Also
Examples
## ========================================
## Examples with raw STS sequences as input
## ========================================
## Loading a data frame with sequence data in the columns 13 to 24
data(actcal)
## Converting to SPS format
actcal.SPS.A <- seqformat(actcal, 13:24, from = "STS", to = "SPS")
head(actcal.SPS.A)
## Converting to compressed SPS format with no
## prefix/suffix and with "/" as state/duration separator
actcal.SPS.B <- seqformat(actcal, 13:24, from = "STS", to = "SPS",
  compress = TRUE, SPS.out = list(xfix = "", sdsep = "/"))
head(actcal.SPS.B)
## Converting to compressed DSS format
actcal.DSS <- seqformat(actcal, 13:24, from = "STS", to = "DSS",
  compress = TRUE)
head(actcal.DSS)
## ==============================================
## Examples with a state sequence object as input
## ==============================================
## Loading a data frame with sequence data in the columns 10 to 25
data(biofam)
## Limiting the number of considered cases to the first 20
biofam <- biofam[1:20, ]
## Creating a state sequence object
biofam.labs <- c("Parent", "Left", "Married", "Left/Married",
  "Child", "Left/Child", "Left/Married/Child", "Divorced")
biofam.short.labs <- c("P", "L", "M", "LM", "C", "LC", "LMC", "D")
biofam.seq <- seqdef(biofam, 10:25, alphabet = 0:7,
  states = biofam.short.labs, labels = biofam.labs)
## Converting to SPELL format
bf.spell <- seqformat(biofam.seq, from = "STS", to = "SPELL",
  pdata = biofam, pvar = c("idhous", "birthyr"))
head(bf.spell)
## ======================================
## Examples with SPELL sequences as input
## ======================================
## Loading two data frames: bfspell20 and bfpdata20
## bfspell20 contains the first 20 biofam sequences in SPELL format
## bfpdata20 contains the IDs and the years at which the
## considered individuals were aged 15
data(bfspell)
## Converting to STS format with alignment on calendar years
bf.sts.y <- seqformat(bfspell20, from = "SPELL", to = "STS",
  id = "id", begin = "begin", end = "end", status = "states",
  process = FALSE)
head(bf.sts.y)
## Converting to STS format with alignment on ages
bf.sts.a <- seqformat(bfspell20, from = "SPELL", to = "STS",
  id = "id", begin = "begin", end = "end", status = "states",
  process = TRUE, pdata = bfpdata20, pvar = c("id", "when15"),
  limit = 16)
names(bf.sts.a) <- paste0("a", 15:30)
head(bf.sts.a)
## ==================================
## Examples for TSE and SPELL output
## in presence of missing values
## ==================================
data(ex1) ## STS data with missing values
## Creating the state sequence object; by default,
## trailing missing values are coded as void ('%')
sqex1 <- seqdef(ex1[,1:13])
as.matrix(sqex1)
## Creating state-event transition matrices
ttrans <- seqetm(sqex1, method='transition')
tstate <- seqetm(sqex1, method='state')
## Converting into time stamped events
seqformat(sqex1, from = "STS", to = "TSE", tevent = ttrans)
seqformat(sqex1, from = "STS", to = "TSE", tevent = tstate)
## Converting into vertical spell data
seqformat(sqex1, from = "STS", to = "SPELL", with.missing = TRUE)
seqformat(sqex1, from = "STS", to = "SPELL", with.missing = TRUE, right = NA)
seqformat(sqex1, from = "STS", to = "SPELL", with.missing = FALSE)