annotateData {clinDataReview}    R Documentation

Annotate a dataset.

Description

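Standard annotation variables are available by passing the name of a standard annotation type (as a string) to the annotations parameter. Custom datasets/variables of interest are specified via a list passed to annotations, with the 'dataset'/'data' and 'vars' elements.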
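Conceptually, annotating with a custom dataset works like a left join on the subject identifier: every record of the input data is kept, and the selected annotation variables are attached to it. The base-R sketch below illustrates this idea on small hypothetical data frames. The helper annotateSketch() and all values are made up for illustration. It is not the package's implementation, which also handles labels, filters and standard annotation types.

```r
# Hypothetical input data: two lab records for subject 01, one for subject 02
dataLB <- data.frame(
  USUBJID = c("01", "01", "02"),
  PARAMCD = c("ALT", "AST", "ALT"),
  AVAL = c(20, 25, 30),
  stringsAsFactors = FALSE
)
# Hypothetical subject-level annotation dataset
dataDM <- data.frame(
  USUBJID = c("01", "02", "03"),
  ARM = c("Placebo", "Active", "Active"),
  ETHNIC = c("HISPANIC", "NOT HISPANIC", "HISPANIC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the variables of interest, then left-join
# on the subject ID so that all records of the input data are retained
annotateSketch <- function(data, annotData, vars, subjectVar = "USUBJID") {
  annot <- annotData[, c(subjectVar, vars), drop = FALSE]
  data$.rowId <- seq_len(nrow(data))  # remember the original row order
  res <- merge(data, annot, by = subjectVar, all.x = TRUE, sort = FALSE)
  res <- res[order(res$.rowId), setdiff(names(res), ".rowId")]
  rownames(res) <- NULL
  res
}

dataAnnotated <- annotateSketch(dataLB, dataDM, vars = c("ARM", "ETHNIC"))
dataAnnotated
```

Subject 03 only appears in the annotation dataset, so it does not add any record to the output.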

Usage

annotateData(
  data,
  dataPath = ".",
  annotations,
  subjectVar = "USUBJID",
  verbose = FALSE,
  labelVars = NULL,
  labelData = "data"
)

Arguments

data

Data.frame with input data to annotate.

dataPath

String with path to the directory containing the data files (used to import the annotation datasets).
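As described for the 'dataset' element of annotations, a custom dataset given by name (e.g. 'ex') is read from the file '[dataset].sas7bdat' in dataPath. The sketch below only shows how such a path could be built. The helper getAnnotPath() and the folder name 'sdtm' are hypothetical, and the import itself is not performed here.

```r
# Hypothetical helper: build the path to an annotation dataset stored as
# '[dataset].sas7bdat' in 'dataPath'
getAnnotPath <- function(dataset, dataPath = ".") {
  file.path(dataPath, paste0(dataset, ".sas7bdat"))
}

exPath <- getAnnotPath("ex", dataPath = "sdtm")
exPath
# the file could then be imported, e.g. with haven::read_sas(exPath)
```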

annotations

Annotation, or list of annotations, each specified either as a:

  • string with standard annotation type, among:

    • demographics: standard variables from the demographics data (DM or ADSL) are extracted

    • exposed_subjects: a logical variable 'EXFL' is added to data, identifying exposed subjects, i.e. subjects included in the exposure dataset (EX/ADEX) with a non-empty and non-missing start date ('EXSTDTC', 'STDY' or 'ASTDY')

    • functional_groups_lab: a character variable 'LBFCTGRP' is added to data, based on the standard naming of the parameter code ('PARAMCD' or 'LBTESTCD' variable)

  • list with a custom annotation, with elements:

    • (optional) annotation dataset, either:

      • 'dataset': String with name of the annotation dataset, e.g. 'ex' to import data from the file '[dataset].sas7bdat' in dataPath

      • 'data': Data.frame with annotation dataset

      The input data is used if 'data' and 'dataset' are not specified.

    • 'vars': Either:

      • Character vector with variables of interest from annotation dataset. If not specified, all variables of the dataset are considered.

      • String with the name of the new variable computed from varFct

    • 'varFct': (optional) Either:

      • function of data or string containing such function (e.g. 'function(data) ...')

      • string containing an expression based on column names of data (e.g. 'col1 + col2')

      used to create a new variable specified in vars.

    • 'filters': (optional) Filters for the annotation dataset, see the filters parameter of filterData.
      The annotation dataset is filtered first, before being combined with the input data, so that only the records retained in the annotation dataset are annotated in the output data. Other records have missing values in the annotated variables.

    • 'varLabel': (optional) label for the new variable, if varFct is specified.

    • 'varsBy': (optional) Character vector with variables used to merge the input data with the annotation dataset. If not specified:

      • if an external dataset (dataset/data) is specified: subjectVar is used

      • otherwise: the annotation dataset and the input data are merged by row IDs
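The effect of 'filters' can be pictured as two steps: the annotation dataset is filtered first, then left-joined with the input data, so records without a match in the filtered dataset get missing values. The base-R sketch below mimics this with a plain subset in place of filterData. The data frames and values are hypothetical, and this is an illustration of the behaviour described above, not the package code.

```r
# Hypothetical input and annotation data
dataLB <- data.frame(
  USUBJID = c("01", "02", "03"),
  AVAL = c(20, 30, 40),
  stringsAsFactors = FALSE
)
dataDM <- data.frame(
  USUBJID = c("01", "02", "03"),
  ARM = c("Placebo", "Active", "Placebo"),
  stringsAsFactors = FALSE
)

# 1. Filter the annotation dataset first (here: keep only placebo subjects)
annotFiltered <- dataDM[dataDM$ARM == "Placebo", c("USUBJID", "ARM"), drop = FALSE]

# 2. Left-join on the subject ID (the default 'varsBy' for an external
#    dataset); records without a match in the filtered dataset get NA
dataAnnotated <- merge(dataLB, annotFiltered, by = "USUBJID", all.x = TRUE)
dataAnnotated
```

Subject 02 is kept in the output, but its ARM value is missing because it was removed from the annotation dataset by the filter.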

subjectVar

String with subject ID variable, 'USUBJID' by default.

verbose

Logical, if TRUE (FALSE by default), progress messages are printed in the current console. For the visualizations, progress messages during download of subject-specific reports are displayed in the browser console.

labelVars

Named character vector containing variable labels of data. This will be updated with the labels of the extra annotation variables (in attr(output, 'labelVars')).

labelData

(optional) String with label for input data, that will be included in progress messages.

Value

Annotated data. If labelVars is specified, the output contains an extra attribute 'labelVars' with the updated labelVars (accessible via attr(output, 'labelVars')).
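The 'labelVars' attribute is a named character vector (variable name to label) attached to the output data frame. The base-R sketch below shows how such a vector can be extended with the label of a new variable and stored as an attribute. The variable 'HILORATIO', its label and the data values are hypothetical, and the way the package combines the labels internally is an assumption.

```r
# Hypothetical labels of the input data and of an added annotation variable
labelVars <- c(USUBJID = "Unique Subject Identifier", AVAL = "Analysis Value")
newLabels <- c(HILORATIO = "Ratio of reference range limits")

# Hypothetical annotated output data
output <- data.frame(USUBJID = "01", AVAL = 1, HILORATIO = 2)

# Append the label of the new variable and store it as an attribute
attr(output, "labelVars") <- c(labelVars, newLabels)

# Retrieve the label of the new variable
attr(output, "labelVars")["HILORATIO"]
```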

Examples

library(clinUtils)

data(dataADaMCDISCP01)

dataLB <- dataADaMCDISCP01$ADLBC
dataDM <- dataADaMCDISCP01$ADSL
dataAE <- dataADaMCDISCP01$ADAE

labelVars <- attr(dataADaMCDISCP01, "labelVars")

# standard annotations:
# path to dataset should be specified via: 'dataPath'
## Not run: 
annotateData(dataLB, annotations = "demographics", dataPath = ...)

## End(Not run)

# add all variables in annotation data (if not already available)
head(annotateData(dataLB, annotations = list(data = dataDM)), 1)

# only variables of interest
head(annotateData(dataLB, annotations = list(data = dataDM, vars = c("ARM", "ETHNIC"))), 1)

# filter annotation dataset
dataAnnotated <- annotateData(dataLB, 
	annotations = list(
		data = dataDM, 
		vars = c("ARM", "ETHNIC"), 
		filters = list(var = "ARM", value = "Placebo")
	)
)
head(subset(dataAnnotated, ARM == "Placebo"), 1)
head(subset(dataAnnotated, is.na(ARM)), 1)

# worst-case scenario: add a new variable based on filtering condition
dataAE$AESEV <- factor(dataAE$AESEV, levels = c("MILD", "MODERATE", "SEVERE"))
dataAEWC <- annotateData(
	data = dataAE,
	annotations = list(
		vars = "WORSTINT", 
		# create new variable 'WORSTINT':
		# TRUE for the record(s) with the maximum severity per subject/term
		# (if multiple, they are all retained)
		filters = list(
			var = "AESEV", 
			# the maximum is the last (worst) level of the factor
			# (so 'MODERATE' if 'MILD'/'MODERATE' are available)
			valueFct = function(x) x[which.max(as.numeric(x))],
			varsBy = c("USUBJID", "AEDECOD"),
			keepNA = FALSE,
			varNew = "WORSTINT", 
			labelNew = "worst-case"
		)
	),
	labelVars = labelVars,
	verbose = TRUE
)
attr(dataAEWC, "labelVars")["WORSTINT"]

# add a new variable based on a combination of variables:
dataLB <- annotateData(dataLB, 
	annotations = list(vars = "HILORATIO", varFct = "A1HI / A1LO")
)

# add a new variable based on extraction from an existing variable
# Note: backslashes should be doubled when the function is specified as text
dataLB <- annotateData(dataLB, 
	annotations = list(vars = "PERIOD", varFct = "sub('.* Week (.+)', 'Week \\\\1', AVISIT)")
)

# multiple annotations:
dataAnnotated <- annotateData(dataLB, 
	annotations = list(
		list(data = dataDM, vars = c("ARM", "ETHNIC")),
		list(data = dataAE, vars = c("AESEV"))
	)
)
head(dataAnnotated, 1)
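The worst-case example above relies on the filters of filterData. The same idea can be sketched in base R: within each subject/term group, records whose severity equals the highest factor level of the group are flagged. The data below are hypothetical, and the sketch assumes no missing severities (the package example drops them via keepNA = FALSE). It illustrates the logic rather than reproducing the package behaviour.

```r
# Hypothetical adverse event records
dataAE <- data.frame(
  USUBJID = c("01", "01", "01", "02"),
  AEDECOD = c("HEADACHE", "HEADACHE", "NAUSEA", "HEADACHE"),
  AESEV = c("MILD", "MODERATE", "MILD", "SEVERE"),
  stringsAsFactors = FALSE
)
# Ordered severity levels: the last level is the worst
dataAE$AESEV <- factor(dataAE$AESEV, levels = c("MILD", "MODERATE", "SEVERE"))

# Flag, per subject and term, the record(s) with the highest severity level
# (assumes AESEV has no missing values)
sevCode <- as.numeric(dataAE$AESEV)
maxSev <- ave(sevCode, dataAE$USUBJID, dataAE$AEDECOD, FUN = max)
dataAE$WORSTINT <- sevCode == maxSev
dataAE
```

If several records share the maximum severity within a group, all of them are flagged, consistent with the comment in the example above.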

[Package clinDataReview version 1.5.0 Index]