formatdata {SCCS}    R Documentation
Formatting data
Description
Reformats the data based on age and/or season and exposure groups prior to fitting a self-controlled case series (SCCS) model.
Usage
formatdata(indiv, astart, aend, aevent, adrug, aedrug, expogrp = list(),
washout = list(), sameexpopar = list(), agegrp = NULL,
seasongrp=NULL, dob=NULL, cov = cbind(), dataformat="stack", data)
Arguments
indiv
a vector of individual identifiers of cases
astart
a vector of ages at which the observation periods start
aend
a vector of ages at the end of the observation periods
aevent
a vector of ages at event; an individual can experience multiple events
adrug
a list of vectors of ages at the start of exposures or a list of matrices if the exposures have multiple episodes (
aedrug
a list of vectors of ages at which exposure-related risk ends or a list of matrices if there are multiple episodes (repeat exposures in different columns) of the same exposure type. The dimension of each item of
expogrp
list of vectors of days to the start of exposure-related risk, counted from
washout
list of vectors of days to the start of washout periods, counted from
sameexpopar
a vector of logical values. If TRUE (the default), no dose effect is assumed: the same exposure parameters are used for multiple doses/episodes of the same exposure type presented in
agegrp
a vector of cut points of the age groups, where each value represents the start of an age category. The first element in the vector is the start of the second age group. The first age group starts at
seasongrp
a vector of cut points for seasonal effects. The values should be given in ddmm format, representing the first day of each season group. The seasonal effect is a factor whose reference level is the time interval starting at the earliest date in
dob
a vector of birth dates of the cases, in ddmmyyyy format. They are used if seasonal effects are included in the model. The default
cov
a vector (or a matrix if there are several covariates) of fixed covariates. The default is NULL, in which case no covariates are included.
dataformat
the way the input data are assembled. It accepts "multi" or "stack" (the default). "multi" refers to data assembled with one row per event, in which different episodes of the same exposure type are recorded as separate columns of the data frame. "stack" refers to a data frame in which repeated exposures of the same type are stacked in one column.
data
a data frame containing the input data. The data should be in "stack" or "multi" format (see
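As an illustration of expogrp, the sketch below computes the risk windows implied by the first example (expogrp = c(0, 15, 29) with aedrug = mmr + 42) for one exposure given at a hypothetical age of 400 days. It assumes that the expogrp days are counted from adrug, that each risk period runs up to the day before the next one starts, and that the last period ends at aedrug. risk_windows is a hypothetical helper, not part of SCCS, and the boundary conventions inside formatdata may differ.

```r
# Hypothetical helper (not part of SCCS): risk windows for one exposure.
# Assumes each window starts expogrp days after adrug, ends the day before
# the next window starts, and the last window ends at aedrug.
risk_windows <- function(adrug, aedrug, expogrp) {
  starts <- adrug + expogrp
  ends <- c(starts[-1] - 1, aedrug)
  data.frame(period = seq_along(starts), start = starts, end = ends)
}

# Exposure at age 400 days, exposure-related risk ending 42 days later
w <- risk_windows(adrug = 400, aedrug = 400 + 42, expogrp = c(0, 15, 29))
print(w)
```

Under these assumptions the three risk periods cover days 0 to 14, 15 to 28 and 29 to 42 after exposure.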
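The agegrp rule (the k-th cut point is the start of age group k + 1) can be mimicked with base R's findInterval(). The sketch applies the cut points from the second example to a few hypothetical ages. It illustrates the grouping rule only and is not formatdata's internal code; in particular, placing an age equal to a cut point in the new group is an assumption.

```r
# Cut points from the second example: five values give six age groups
agegrp <- c(427, 488, 549, 610, 671)

# Hypothetical ages (in days) to be classified
ages <- c(400, 427, 500, 650, 700)

# findInterval() returns 0 below the first cut point, so add 1 to get
# group labels 1..6 (group 1 starts at the start of observation)
group <- findInterval(ages, agegrp) + 1
age_factor <- factor(group, levels = seq_len(length(agegrp) + 1))
print(data.frame(age = ages, group = age_factor))
```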
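Seasonal effects need calendar dates, which is why dob is required: the calendar date of an age (in days) is the birth date plus that age. The sketch parses dob values in ddmmyyyy format and seasongrp cut points in ddmm format, then assigns event dates to season groups. The helpers parse_dob and season_of and all values are hypothetical. Assigning dates before the earliest cut point to the last group (the season that wraps around the new year) is an assumption about how the seasons are formed.

```r
# Hypothetical helpers (not part of SCCS) illustrating the date formats
parse_dob <- function(dob) {
  # ddmmyyyy stored as a number loses a leading zero, so pad to 8 digits
  as.Date(sprintf("%08d", dob), format = "%d%m%Y")
}

season_of <- function(dates, seasongrp) {
  # Convert ddmm cut points to mmdd keys so calendar order = numeric order
  day <- seasongrp %/% 100
  month <- seasongrp %% 100
  cut_keys <- sort(month * 100 + day)
  date_keys <- as.integer(format(dates, "%m%d"))
  s <- findInterval(date_keys, cut_keys)
  # Assumption: dates before the first cut point belong to the last season
  s[s == 0] <- length(cut_keys)
  s
}

dob <- c(15031999, 1012000)            # 15 March 1999 and 01 January 2000
aevent <- c(400, 400)                  # ages at event, in days
event_date <- parse_dob(dob) + aevent  # calendar date = birth date + age
seasongrp <- c(0103, 0106, 0109, 0112) # seasons from 1 Mar, 1 Jun, 1 Sep, 1 Dec
print(data.frame(event_date, season = season_of(event_date, seasongrp)))
```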
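The difference between the two dataformat layouts can be illustrated with a toy data frame. The sketch stores two doses of one exposure in a "multi"-style layout (one row per event, doses in the hypothetical columns dose1 and dose2) and stacks them into a single dose column with base R's reshape(). The column names and the exact stacked shape are assumptions; the layout formatdata expects should be checked against the data sets shipped with SCCS.

```r
# Toy data in a "multi"-style layout: one row per event, repeat doses of
# the same exposure in separate columns (hypothetical column names)
multi <- data.frame(case = c(1, 2), sta = c(366, 366), end = c(730, 730),
                    itp = c(450, 600), dose1 = c(400, 380), dose2 = c(560, NA))

# Stack dose1/dose2 into a single 'dose' column, one row per dose episode
stacked <- reshape(multi, direction = "long",
                   varying = c("dose1", "dose2"), v.names = "dose",
                   timevar = "episode", idvar = "case")
stacked <- stacked[!is.na(stacked$dose), ]  # drop episodes that did not occur
stacked <- stacked[order(stacked$case, stacked$episode), ]
rownames(stacked) <- NULL
print(stacked)
```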
Value
a data frame containing the following columns:
indivL
an identifier for each individual event.
event
indicator for the presence of an event within an interval: "1" where an event occurred, "0" otherwise.
age
a factor for age groups.
Season
a factor for season if
exposures
factors for the exposure status of each exposure type: "0" for baseline (control) periods and "1" for the first risk period. Subsequent exposure risk periods are also coded "1" if sameexpopar = TRUE, or given increasing factor levels for each subsequent exposure if sameexpopar = FALSE. Indicators for washout periods, if any, are also included here. The column names of these factors are the same as the column names of the exposures in
interval
length of the interval, needed for the offsets in the model.
There are also columns for eventday (day of the adverse event), lower (day a period starts), upper (day a period ends), indiv (original individual identifier), aevent, astart, aend and any covariates included in cov.
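The coding of the exposure factor for repeat exposures can be sketched for one exposure type with a single risk period per episode. Each interval carries the index of the episode whose risk period it lies in (0 for control time); with sameexpopar = TRUE all episodes share level "1", otherwise episode k gets level k. code_exposure and the episode vector are hypothetical, and formatdata's own numbering when expogrp defines several risk periods, or when washout periods are present, may differ.

```r
# Hypothetical helper (not part of SCCS): map episode indices to exposure
# factor levels. 'episode' is 0 for baseline/control intervals and k for an
# interval inside the risk period of the k-th exposure episode.
code_exposure <- function(episode, sameexpopar = TRUE) {
  level <- if (sameexpopar) as.integer(episode > 0) else as.integer(episode)
  factor(level)
}

episode <- c(0, 1, 0, 2, 0)  # control, dose 1 risk, control, dose 2 risk, control
print(code_exposure(episode, sameexpopar = TRUE))   # levels 0, 1
print(code_exposure(episode, sameexpopar = FALSE))  # levels 0, 1, 2
```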
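The interval column enters the model as an offset: the expected number of events in an interval is proportional to its length, so log(interval) is added to the linear predictor with its coefficient fixed at 1. The conditional Poisson likelihood used for SCCS gives the same exposure estimates as an ordinary Poisson regression with one intercept per individual, and the sketch uses that equivalence with base R glm(). The data frame mimics a few output columns, but its values are invented, and this is not the fitting routine of the SCCS package.

```r
# Invented toy data shaped like part of the formatdata output:
# one row per interval, with an event count, an exposure factor and a length
fmt <- data.frame(
  indivL   = factor(c(1, 1, 2, 2, 3, 3, 4, 4)),
  exposure = factor(c(0, 1, 0, 1, 0, 1, 0, 1)),
  event    = c(0, 1, 1, 0, 0, 1, 1, 1),
  interval = c(322, 43, 322, 43, 322, 43, 322, 43)
)

# Poisson regression with one intercept per individual; offset(log(interval))
# makes each interval's expected count proportional to its length
fit <- glm(event ~ exposure + indivL + offset(log(interval)),
           family = poisson, data = fmt)

# Estimated relative incidence, risk period versus control time
exp(coef(fit)["exposure1"])
```

Because every individual here has the same interval lengths, the estimate has a closed form: 3 of the 5 events fall in risk periods, so the relative incidence is (3/2) × (322/43), about 11.2.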
Author(s)
Yonas Ghebremichael-Weldeselassie, Heather Whitaker, Paddy Farrington.
References
Whitaker, H. J., Farrington, C. P., Spiessens, B., and Musonda, P. (2006). Tutorial in biostatistics: The self-controlled case series method. Statistics in Medicine 25, 1768–1797.
Farrington, P., Whitaker, H., and Ghebremichael-Weldeselassie, Y. (2018). Self-Controlled Case Series Studies: A Modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
Examples
library(SCCS)
# MMR vaccine and ITP data
# A single exposure with three risk periods and no age groups included
itp.dat1 <- formatdata(indiv=case, astart=sta, aend=end,
aevent=itp, adrug=mmr, aedrug=mmr+42,
expogrp=c(0,15,29),
data=itpdat)
itp.dat1
# A single exposure with three risk periods and six age groups
itp.dat2 <- formatdata(indiv=case, astart=sta, aend=end,
aevent=itp, adrug=mmr, aedrug=mmr+42,
expogrp=c(0,15,29), agegrp=c(427,488,549,610,671),
data=itpdat)
itp.dat2