hbl_data {historicalborrowlong} | R Documentation
Standardize data
Description
Standardize a tidy input dataset.
Usage
hbl_data(
  data,
  response,
  study,
  study_reference,
  group,
  group_reference,
  patient,
  rep,
  rep_reference,
  covariates
)
Arguments
data |
A tidy data frame or |
response |
Character of length 1,
name of the column in |
study |
Character of length 1,
name of the column in |
study_reference |
Atomic of length 1,
element of the |
group |
Character of length 1,
name of the column in |
group_reference |
Atomic of length 1,
element of the |
patient |
Character of length 1,
name of the column in |
rep |
Character of length 1,
name of the column in |
rep_reference |
Atomic of length 1,
element of the |
covariates |
Character vector of column names
in Each baseline covariate column must truly be a baseline covariate: elements must be equal for all time points within each patient (after the steps in the "Data processing" section). In other words, covariates must not be time-varying. A large number of covariates, or a large number of levels in a categorical covariate, can severely slow down the computation. Please consider carefully if you really need to include such complicated baseline covariates. |
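The baseline requirement on covariates can be checked before calling the function. The following is a minimal sketch in base R, not part of the package: the helper name is_baseline() and the toy columns age and site are hypothetical, and the check simply counts distinct non-missing values within each patient.

```r
# Toy longitudinal data: "site" is constant within patient, "age" is not.
toy <- data.frame(
  patient = c(1, 1, 2, 2),
  rep = c(1, 2, 1, 2),
  site = c("a", "a", "b", "b"),
  age = c(50, 50, 61, 62)
)

# Hypothetical helper (not part of historicalborrowlong):
# TRUE if the covariate has at most one distinct non-missing
# value within every patient, i.e. it is not time-varying.
is_baseline <- function(data, covariate, patient) {
  n_values <- tapply(
    data[[covariate]],
    data[[patient]],
    function(x) length(unique(x[!is.na(x)]))
  )
  all(n_values <= 1)
}

site_ok <- is_baseline(toy, "site", "patient") # constant within patient
age_ok <- is_baseline(toy, "age", "patient")   # changes for patient 2
```

A covariate that fails a check like this is time-varying and should not be passed in covariates.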
Details
Users do not normally need to call this function. It mainly serves to expose the indexing behavior of study and group levels to aid in interpreting summary tables.
Value
A standardized tidy data frame with one row per patient per rep and the following columns:
- response: continuous response/outcome variable. (Should be change from baseline of an outcome of interest.)
- study_label: human-readable label of the study.
- study: integer study index with the max index equal to the current study (at study_reference).
- group_label: human-readable group label (e.g. treatment arm name).
- group: integer group index with an index of 1 equal to the control group (at group_reference).
- patient_label: original patient ID.
- patient: integer patient index.
- rep_label: original rep ID (e.g. time point or patient visit).
- rep: integer rep index.
- covariate_*: baseline covariate columns.
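To illustrate the indexing convention described above, here is a minimal sketch in base R. It is not the package's implementation: the toy labels are made up, and sorting the non-reference levels alphabetically is an assumption made only for this example.

```r
# Toy labels (hypothetical).
study_label <- c("trial1", "trial2", "trial3", "trial1")
group_label <- c("arm2", "arm1", "arm3", "arm1")
study_reference <- "trial1"
group_reference <- "arm1"

# Study: put the reference (current) study last so it gets the max index.
# Assumption: the remaining studies are ordered alphabetically.
study_levels <- c(
  sort(setdiff(unique(study_label), study_reference)),
  study_reference
)
study <- as.integer(factor(study_label, levels = study_levels))

# Group: put the reference (control) group first so it gets index 1.
group_levels <- c(
  group_reference,
  sort(setdiff(unique(group_label), group_reference))
)
group <- as.integer(factor(group_label, levels = group_levels))

# Look-up table relating labels to integer indexes.
index_table <- data.frame(study_label, study, group_label, group)
```

Keeping the label columns next to the integer indexes, as in index_table, is what makes it possible to map indexes in summary tables back to the original study and group names.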
Data processing
Before running the MCMC, the dataset is pre-processed. This includes expanding the rows of the data so every rep of every patient gets an explicit row. So if your original data has irregular rep IDs, e.g. unscheduled visits in a clinical trial that few patients attend, please remove them before the analysis. Only the most common rep IDs should be included.
After expanding the rows, the function fills in missing values for every column except the response. That includes covariates. Missing covariate values are filled in, first with last observation carried forward, then with last observation carried backward. If there are still missing values after this process, the program throws an informative error.
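The two steps above (row expansion, then filling) can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's internal code: the toy data and the helpers fill_forward() and fill_both() are hypothetical, and filling within each patient is an assumption made for this example.

```r
# Toy data: patient 2 has no row for rep 2, and "age" has missing values.
toy <- data.frame(
  patient = c(1, 1, 1, 2, 2),
  rep = c(1, 2, 3, 1, 3),
  response = c(0.1, 0.2, 0.3, 0.4, 0.5),
  age = c(50, NA, 50, NA, 61)
)

# Step 1: give every rep of every patient an explicit row.
grid <- expand.grid(
  patient = unique(toy$patient),
  rep = sort(unique(toy$rep))
)
expanded <- merge(grid, toy, by = c("patient", "rep"), all.x = TRUE)
expanded <- expanded[order(expanded$patient, expanded$rep), ]

# Step 2: last observation carried forward, then carried backward.
fill_forward <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}
fill_both <- function(x) rev(fill_forward(rev(fill_forward(x))))

# Assumption: covariates are filled within each patient.
# The response is deliberately left alone, so new rows keep NA responses.
expanded$age <- ave(expanded$age, expanded$patient, FUN = fill_both)

# Step 3: stop if anything is still missing after both passes.
if (anyNA(expanded$age)) {
  stop("covariate 'age' still has missing values after filling")
}
```

In this toy example the new row for patient 2 at rep 2 keeps a missing response, while its age is filled in from that patient's other rows.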
See Also
Other data:
hbl_s_tau()
Examples
library(historicalborrowlong)
set.seed(0)
# Simulate a small longitudinal dataset with two studies.
data <- hbl_sim_independent(n_continuous = 1, n_study = 2)$data
# Reorder the columns so the ID columns and response come first.
data <- dplyr::select(
  data,
  study,
  group,
  rep,
  patient,
  response,
  tidyselect::everything()
)
# Rename columns to mimic a dataset with non-default column names.
data <- dplyr::rename(
  data,
  change = response,
  trial = study,
  arm = group,
  subject = patient,
  visit = rep,
  cov1 = covariate_study1_continuous1,
  cov2 = covariate_study2_continuous1
)
# Turn the integer IDs into human-readable labels.
data$trial <- paste0("trial", data$trial)
data$arm <- paste0("arm", data$arm)
data$subject <- paste0("subject", data$subject)
data$visit <- paste0("visit", data$visit)
# Standardize the data: map the custom columns back to labels and indexes.
hbl_data(
  data = data,
  response = "change",
  study = "trial",
  study_reference = "trial1",
  group = "arm",
  group_reference = "arm1",
  patient = "subject",
  rep = "visit",
  rep_reference = "visit1",
  covariates = c("cov1", "cov2")
)