FormatData {gesttools} | R Documentation |
Formats Data Into Correct Form
Description
Takes a dataset in long format and puts it into the format required by the g-estimation functions. Specifically, it ensures that there is a data entry for each individual at each time period by adding empty rows, and it orders the dataset by time and identifier. It can also create variables for the exposure histories of all time-varying variables in the data.
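As a rough illustration of what "a data entry for each individual at each time period" means, the following base R sketch completes a small toy dataset by hand. It is not the code used by FormatData; the toy data, the object names, and the choice to sort by time first and identifier second are assumptions made for this example.

```r
# Toy long-format data: id 1 has no row for time 3
toy <- data.frame(
  id   = c(1, 1, 2, 2, 2),
  time = c(1, 2, 1, 2, 3),
  A    = c(0, 1, 1, 0, 1)
)

# Every id-by-time combination that should be present
full_grid <- expand.grid(
  id   = sort(unique(toy$id)),
  time = seq_len(max(toy$time))
)

# Left join onto the grid: combinations absent from toy get NA in A
completed <- merge(full_grid, toy, by = c("id", "time"), all.x = TRUE)

# Order by time, then by identifier
completed <- completed[order(completed$time, completed$id), ]
rownames(completed) <- NULL
completed
```

The added row for id 1 at time 3 carries NA for A, which is the sense in which the added rows are "empty".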
Usage
FormatData(
data,
idvar,
timevar,
An,
varying,
Cn = NA,
GenerateHistory = FALSE,
GenerateHistoryMax = NA
)
Arguments
data |
A data frame in long format containing the data to be analysed. |
idvar |
A character string specifying the name of the variable containing each individual's identifier. |
timevar |
A character string specifying the name of the time variable.
Note that time periods must be labeled as integers starting from 1
( |
An |
A character string specifying the name of the exposure variable. |
varying |
A vector of character strings specifying the names of the time-varying variables
to be included in the analysis: specifically, the exposure, the time-varying
confounders and (if applicable) the time-varying outcome.
If |
Cn |
Optional character string specifying the name of the censoring indicator if present. |
GenerateHistory |
A TRUE or FALSE indicator. If set to TRUE, variables are generated
corresponding to the lagged histories of all variables included in |
GenerateHistoryMax |
An optional positive integer specifying |
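The lagged histories that GenerateHistory refers to can be pictured with a minimal base R sketch. It builds a one-period lag of the exposure within each individual and is only an illustration, not the package's implementation. The toy data and the column name A_lag1 are assumptions made here, and the names FormatData gives its own history variables may differ.

```r
# Toy long-format data with three time periods per individual
toy <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  time = c(1, 2, 3, 1, 2, 3),
  A    = c(0, 1, 1, 1, 0, 1)
)

# Sort within individual by time so that "previous row" means "previous period"
toy <- toy[order(toy$id, toy$time), ]

# One-period lag of A within each id; the first period has no history, so NA
toy$A_lag1 <- ave(toy$A, toy$id, FUN = function(x) c(NA, head(x, -1)))
toy
```

Each individual's first time period has NA for the lag, since no earlier exposure exists.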
Details
Note that any variable in varying that is strictly categorical MUST be declared as an as.factor() variable. Binary or continuous variables should be declared as an as.numeric() variable.
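A short sketch of preparing columns to meet these type requirements, together with the integer time labels noted under timevar, might look like the following. The toy data frame and its column names (year, A, L) are assumptions made for illustration only.

```r
# Toy data: exposure stored as text, confounder coded 1 to 3, time given as years
toy <- data.frame(
  id   = c(1, 1, 2, 2),
  year = c(2010, 2012, 2010, 2012),
  A    = c("0", "1", "1", "0"),
  L    = c(1, 3, 2, 2),
  stringsAsFactors = FALSE
)

# Binary exposure: declare as numeric
toy$A <- as.numeric(toy$A)

# Strictly categorical confounder: declare as a factor
toy$L <- as.factor(toy$L)

# Relabel time periods as integers starting from 1, in chronological order
toy$time <- match(toy$year, sort(unique(toy$year)))

str(toy)
```

If a numeric-looking column is already stored as a factor, calling as.numeric() on it directly returns the internal level codes; converting with as.numeric(as.character(x)) preserves the original values.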
Value
A data frame in long format with additional rows added as necessary. If data is already in the correct format then no additional rows will be added.
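One way to check whether a dataset already has a row for every individual at every time period, so that no rows would need to be added, is sketched below. The helper has_all_id_time_rows is hypothetical and not part of gesttools; it assumes time periods are labeled as integers starting from 1.

```r
# Hypothetical helper (not part of gesttools): TRUE if every id has exactly
# one row for each time period 1, ..., max(time)
has_all_id_time_rows <- function(df, idvar, timevar) {
  n_ids   <- length(unique(df[[idvar]]))
  n_times <- max(df[[timevar]])
  no_duplicates <- !any(duplicated(df[, c(idvar, timevar)]))
  no_duplicates && nrow(df) == n_ids * n_times
}

complete_toy <- data.frame(id = c(1, 1, 2, 2), time = c(1, 2, 1, 2))
has_all_id_time_rows(complete_toy, "id", "time")        # complete panel

incomplete_toy <- complete_toy[-2, ]                     # drop id 1 at time 2
has_all_id_time_rows(incomplete_toy, "id", "time")      # a row is missing
```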
Examples
data <- dataexamples(n = 1000, seed = 3456, Censoring = TRUE)$datagest
# To demonstrate the function, we delete the third row,
# which corresponds to the entry for ID 1 at time 3
data <- data[-3, ]
datanew <- FormatData(
data = data, idvar = "id", timevar = "time", An = "A",
Cn = "C", varying = c("A", "L"), GenerateHistory = TRUE, GenerateHistoryMax = 1
)
head(datanew)
# Note that the missing entry has been re-added,
# with missing values for A and L in the third row
# An example in which lagged histories of the time-varying variables are created.
data <- dataexamples(n = 1000, seed = 3456, Censoring = TRUE)$datagestmultcat
datanew <- FormatData(
data = data, idvar = "id", timevar = "time", An = "A",
Cn = "C", varying = c("Y", "A", "L"), GenerateHistory = TRUE, GenerateHistoryMax = NA
)
head(datanew)