FormatData {gesttools}  R Documentation

Formats Data Into Correct Form

Description

Takes a dataset in long format and puts it into the format required by the g-estimation functions. Specifically, it ensures that there is a data entry for each individual at each time period by adding empty rows, and it orders the dataset by time and identifier. It can also create variables for the exposure histories of all time-varying variables in the data.

Usage

FormatData(
  data,
  idvar,
  timevar,
  An,
  varying,
  Cn = NA,
  GenerateHistory = FALSE,
  GenerateHistoryMax = NA
)

Arguments

data

A data frame in long format containing the data to be analysed.

idvar

A character string specifying the name of the variable containing each individual's identifier.

timevar

A character string specifying the name of the time variable. Note that time periods must be labeled as integers starting from 1 (1, 2, ...).

An

A character string specifying the name of the exposure variable.

varying

A vector of character strings specifying the names of the time-varying variables to be included in the analysis. These are the exposure, any time-varying confounders and (if applicable) the time-varying outcome. If Cn is specified, it is added to varying automatically.

Cn

Optional character string specifying the name of the censoring indicator if present.

GenerateHistory

A TRUE or FALSE indicator. If set to TRUE, variables are generated corresponding to the lagged histories of all variables included in varying. These will be labeled as LagVari, where Var is the variable name and i is the number of time periods by which the variable is lagged. For example, LagAn2 is the value of An two time periods prior.

GenerateHistoryMax

An optional positive integer. When GenerateHistory is TRUE, exposure histories are generated only up to GenerateHistoryMax time periods prior.
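
To make the idea of a lagged history concrete, the following is a minimal base R sketch, not the package's implementation. The helper lag_within_id and the column names A_lag1 and A_lag2 are hypothetical and chosen only for illustration; FormatData uses its own naming as described above. The sketch assumes the data already has one row per individual per time period and is ordered by time.

```r
# Toy long-format data: 2 individuals, 3 time periods, ordered by time then id
toy <- data.frame(
  id   = rep(1:2, times = 3),
  time = rep(1:3, each = 2),
  A    = c(0, 1, 1, 1, 0, 1)
)

# Hypothetical helper (not part of gesttools): shift x back by k periods
# within each individual, padding the first k periods with NA.
# Assumes k is smaller than the number of time periods.
lag_within_id <- function(x, id, k) {
  ave(x, id, FUN = function(v) c(rep(NA, k), head(v, -k)))
}

toy$A_lag1 <- lag_within_id(toy$A, toy$id, 1)  # value of A one period prior
toy$A_lag2 <- lag_within_id(toy$A, toy$id, 2)  # value of A two periods prior
toy
```

A cap such as GenerateHistoryMax corresponds, in this sketch, to stopping at a chosen k rather than creating a lag for every available period.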

Details

Note that any variable in varying that is strictly categorical MUST be converted to a factor with as.factor(). Binary or continuous variables should be converted to numeric with as.numeric().
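
As an illustration of the type requirement, the sketch below prepares a small toy data frame before it would be passed to FormatData. The data and column names are invented for the example; only as.numeric() and as.factor() come from the note above.

```r
# Toy data with illustrative column names (assumed, not from the package)
toy <- data.frame(
  id   = c(1, 2, 1, 2),
  time = c(1, 1, 2, 2),
  A    = c("0", "1", "1", "0"),           # binary exposure stored as text
  L    = c("low", "high", "mid", "low"),  # strictly categorical confounder
  stringsAsFactors = FALSE
)

# Binary or continuous variables: convert to numeric
toy$A <- as.numeric(toy$A)

# Strictly categorical variables: convert to factor
toy$L <- as.factor(toy$L)

str(toy)
```

If a binary variable is already stored as a factor, as.numeric() returns the internal level codes rather than the original values, so as.numeric(as.character(x)) is the safer conversion in that case.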

Value

A data frame in long format with additional rows added as necessary. If data is already in the correct format then no additional rows will be added.
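
The effect of adding rows can be sketched in base R as follows. This is only an illustration of the described behaviour under simple assumptions (time periods labeled 1, 2, ... and a single exposure column); it is not how FormatData is implemented, and the object names toy, grid and full are invented for the example.

```r
# Toy long-format data with the row for id 1 at time 3 missing
toy <- data.frame(
  id   = c(1, 2, 1, 2, 2),
  time = c(1, 1, 2, 2, 3),
  A    = c(0, 1, 1, 1, 0)
)

# Every id-by-time combination that should be present
grid <- expand.grid(id = unique(toy$id), time = seq_len(max(toy$time)))

# Left join onto the grid: absent combinations get NA in the other columns
full <- merge(grid, toy, by = c("id", "time"), all.x = TRUE)

# Order by time, then by identifier
full <- full[order(full$time, full$id), ]
rownames(full) <- NULL
full
```

When every combination is already present, the join adds nothing and only the ordering step has any effect, which matches the statement that no rows are added to correctly formatted data.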

Examples

data <- dataexamples(n = 1000, seed = 3456, Censoring = TRUE)$datagest
# To demonstrate the function, we delete the third row,
# corresponding to the entry for ID 1 at time 3
data <- data[-3, ]
datanew <- FormatData(
  data = data, idvar = "id", timevar = "time", An = "A",
  Cn = "C", varying = c("A", "L"), GenerateHistory = TRUE, GenerateHistoryMax = 1
)
head(datanew)
# Note that the missing entry has been re-added,
# with missing values for A and L in the third row.

# A second example, creating lagged histories of the time-varying
# variables with GenerateHistoryMax left as NA.
data <- dataexamples(n = 1000, seed = 3456, Censoring = TRUE)$datagestmultcat
datanew <- FormatData(
  data = data, idvar = "id", timevar = "time", An = "A",
  Cn = "C", varying = c("Y", "A", "L"), GenerateHistory = TRUE, GenerateHistoryMax = NA
)
head(datanew)

[Package gesttools version 1.3.0 Index]