NMcheckData {NMdata}    R Documentation

Check data for Nonmem compatibility or check control stream for data compatibility

Description

Check data in various ways for compatibility with Nonmem. Some findings are reported not because they would make Nonmem fail, but because they indicate typical dataset issues.

Usage

NMcheckData(
  data,
  file,
  covs,
  covs.occ,
  cols.num,
  col.id = "ID",
  col.time = "TIME",
  col.dv = "DV",
  col.mdv = "MDV",
  col.cmt = "CMT",
  col.amt = "AMT",
  col.flagn,
  col.row,
  col.usubjid,
  cols.dup,
  type.data = "est",
  na.strings,
  return.summary = FALSE,
  quiet = FALSE,
  as.fun
)

Arguments

data

The data to check. data.frame, data.table, tibble, anything that can be converted to data.table.

file

As an alternative to checking a data object, use file to specify a control stream to check. This can be either an input control stream (working or non-working) or an output control stream. In this case, NMcheckData checks the column names in the data against the control stream (see NMcheckColnames), reads the data as Nonmem would, and performs the same checks on the data as it would when using the data argument. col.flagn is ignored in this case; instead, the ACCEPT/IGNORE statements in the control stream are applied. The file argument is useful for debugging a Nonmem model.

covs

Columns that contain subject-level covariates. They are expected to be non-missing, numeric, and not varying within subjects.

covs.occ

A list specifying columns that contain subject:occasion-level covariates. They are expected to be non-missing, numeric and not varying within combinations of subject and occasion. covs.occ=list(PERIOD=c("FED")) means that FED is the covariate, while PERIOD indicates the occasion.
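To illustrate what "not varying within subjects" and "not varying within combinations of subject and occasion" mean, here is a minimal base R sketch. It is not how NMcheckData is implemented. The helper constant_within and the toy data (including the WEIGHT column) are made up for illustration.

```r
# Toy data: WEIGHT is constant within subject, FED changes between periods
dat <- data.frame(
  ID     = c(1, 1, 1, 1, 2, 2, 2, 2),
  PERIOD = c(1, 1, 2, 2, 1, 1, 2, 2),
  WEIGHT = c(70, 70, 70, 70, 82, 82, 82, 82),
  FED    = c(0, 0, 1, 1, 1, 1, 0, 0)
)

# Hypothetical helper: TRUE if `col` takes exactly one value
# within every combination of the grouping columns in `by`
constant_within <- function(data, col, by) {
  groups <- interaction(data[by], drop = TRUE)
  all(vapply(split(data[[col]], groups),
             function(x) length(unique(x)) == 1L,
             logical(1)))
}

constant_within(dat, "WEIGHT", "ID")            # subject-level covariate
constant_within(dat, "FED", "ID")               # varies within subject
constant_within(dat, "FED", c("ID", "PERIOD"))  # constant within subject and occasion
```

For data like this, the matching arguments would be covs="WEIGHT" and covs.occ=list(PERIOD=c("FED")).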

cols.num

Columns that are expected to be present, numeric and non-NA. If a character vector is given, the columns are expected to be used in all rows. If a column is only used for a subset of rows, use a list and name the elements by subsetting strings. See examples.
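The list form of cols.num can be read as "check these columns only in the rows selected by the element name". The base R sketch below shows that reading on toy data. The helper missing_rows is hypothetical and only checks for NA; it is an assumption about the semantics, not the package's own code.

```r
dat <- data.frame(
  EVID  = c(1, 0, 0),
  STUDY = c(1, 1, 1),
  LLOQ  = c(NA, 3.5, NA)
)

# STUDY is expected in all rows, LLOQ only where EVID==0
cols.num <- list(c("STUDY"), "EVID==0" = c("LLOQ"))

# Hypothetical helper: for each column, the row numbers that are NA
# within the rows selected by the subsetting string
missing_rows <- function(data, cols.num) {
  subsets <- names(cols.num)
  if (is.null(subsets)) subsets <- rep("", length(cols.num))
  out <- list()
  for (i in seq_along(cols.num)) {
    in_subset <- if (subsets[i] == "") {
      rep(TRUE, nrow(data))                         # unnamed element: all rows
    } else {
      eval(parse(text = subsets[i]), envir = data)  # named element: subset of rows
    }
    for (col in cols.num[[i]]) {
      out[[col]] <- which(in_subset & is.na(data[[col]]))
    }
  }
  out
}

missing_rows(dat, cols.num)
```

In this toy data, the missing LLOQ in the dosing row (EVID==1) is not flagged because that row lies outside the "EVID==0" subset.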

col.id

The name of the column that holds the subject identifier. Default is "ID".

col.time

The name of the column holding actual time. Default is "TIME".

col.dv

The name of the column holding the dependent variable. For now, only one column can be specified, and MDV is assumed to match this column. Default is "DV".

col.mdv

The name of the column holding the binary indicator of whether the dependent variable is missing. Default is "MDV".

col.cmt

The name(s) of the compartment column(s). These will be checked to be positive integers for all rows. They are also used in checks for row duplicates.

col.amt

The name of the dose amount column. Default is "AMT".

col.flagn

Optionally, the name of the column holding numeric exclusion flags. Default value is FLAG and can be configured using NMdataConf. Even though FLAG is the default value, no finding will be returned if the column is missing unless explicitly defined as col.flagn="FLAG". This is because this way of using exclusion flags is only one of many ways you could choose to handle exclusions. Disable completely by using col.flagn=FALSE.

col.row

A column with a unique value for each row. Using such a column is recommended whenever possible. The default ("ROW") can be modified using NMdataConf.

col.usubjid

Optional unique subject identifier. It is recommended to keep a unique subject identifier (typically a character string including an abbreviated study name and the subject ID) from the clinical datasets in the analysis set. If you supply the name of the column holding this identifier, NMcheckData will check that it is non-missing, that it is unique within values of col.id (i.e. that the analysis subject IDs are unique across actual subjects), and that col.id is unique within the unique subject ID (a violation of the latter is less likely).
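The two uniqueness conditions can be sketched in base R as below. This only illustrates the idea and is not the package's implementation; the toy data and the column name USUBJID are assumptions.

```r
# Toy data: ID 2 is (wrongly) shared by subjects from two studies
dat <- data.frame(
  ID      = c(1, 1, 2, 2, 2, 3),
  USUBJID = c("STUDY1-001", "STUDY1-001", "STUDY1-002",
              "STUDY2-002", "STUDY2-002", "STUDY2-003")
)

# One row per observed combination of analysis ID and unique subject ID
pairs <- unique(dat[c("ID", "USUBJID")])

# Analysis IDs that map to more than one actual subject
ids_shared <- unique(pairs$ID[duplicated(pairs$ID)])

# Actual subjects that were assigned more than one analysis ID
usubjid_split <- unique(pairs$USUBJID[duplicated(pairs$USUBJID)])

# Rows where the unique subject identifier is missing
rows_missing <- which(is.na(dat$USUBJID))

ids_shared
usubjid_split
rows_missing
```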

cols.dup

Additional column names to consider in search of duplicate events. col.id, col.cmt, col.evid, and col.time are always considered if found in data, and cols.dup is added to this list if provided.
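As a rough illustration of a duplicate-event search, the base R sketch below flags rows that share the same values in the key columns. The column names ID, CMT, EVID and TIME are assumed defaults for this toy data, and the code is not taken from the package.

```r
dat <- data.frame(
  ID   = c(1, 1, 1, 2),
  TIME = c(0, 1, 1, 0),
  CMT  = c(1, 2, 2, 1),
  EVID = c(1, 0, 0, 1)
)

# Columns that are always considered, restricted to those found in the data
cols <- intersect(c("ID", "CMT", "EVID", "TIME"), names(dat))
# Additional columns could be appended here, mirroring cols.dup
cols.dup <- character(0)
cols <- c(cols, cols.dup)

# Flag every row that has at least one other row with identical key values
dup <- duplicated(dat[cols]) | duplicated(dat[cols], fromLast = TRUE)
which(dup)
```

Adding a column to cols.dup makes the check stricter about what counts as the same event: rows are flagged only if they also agree on that column.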

type.data

"est" for estimation data (default), and "sim" for simulation data. Differences are that col.row is not expected for simulation data, and subjects will be checked to have EVID==0 rows for estimation data and EVID==2 rows for simulation data.

na.strings

Strings to be accepted when trying to convert characters to numerics. This will typically be a string that represents missing values. Default is ".". Note that actual NA, i.e. not a string, is allowed independently of na.strings. See ?NMisNumeric.

return.summary

If TRUE (not default), the table summary that is printed if quiet=FALSE is returned as well. In that case, a list is returned, and the findings are in an element called findings.

quiet

If TRUE, the summary table of findings is not printed. Default is FALSE.

as.fun

The default is to return data as a data.frame. Pass a function (say tibble::as_tibble) in as.fun to convert to something else. If data.tables are wanted, use as.fun="data.table". The default can be configured using NMdataConf.
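The following sketch combines return.summary, quiet and as.fun in one call, using only arguments documented above. It assumes the NMdata package and its bundled example data are installed, so the call is guarded; the name res is arbitrary.

```r
# Only run if NMdata is available
if (requireNamespace("NMdata", quietly = TRUE)) {
  dat <- readRDS(system.file("examples/data/xgxr2.rds", package = "NMdata"))

  # Suppress printing, return the summary as well, and get data.tables back
  res <- NMdata::NMcheckData(
    dat,
    return.summary = TRUE,
    quiet = TRUE,
    as.fun = "data.table"
  )

  # With return.summary=TRUE the result is a list; findings are in `findings`
  res$findings
}
```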

Details

The following checks are performed. The term "numeric" does not refer to a numeric representation in R, but compatibility with Nonmem. The character string "2" is in this sense a valid numeric, "id2" is not.
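The notion of "numeric" used here can be approximated in base R as shown below. This is a rough sketch with a hypothetical helper, is_nonmem_numeric, and it is not the package's NMisNumeric function. It relies on as.numeric, which may accept some strings (such as "Inf") that Nonmem would not.

```r
# Hypothetical helper: TRUE where a value is an actual NA, one of the
# accepted missing-value strings, or a string that converts to a number
is_nonmem_numeric <- function(x, na.strings = ".") {
  x <- as.character(x)
  ok_missing <- is.na(x) | x %in% na.strings
  converted <- suppressWarnings(as.numeric(x))
  ok_missing | !is.na(converted)
}

is_nonmem_numeric(c("2", "id2", ".", NA, "1e3"))
```

In this sketch "2" and "1e3" pass because they convert to numbers, "." and NA pass as missing values, and "id2" does not pass.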

Value

A table with findings. If return.summary=TRUE, a list is returned, with the findings in an element called findings.

Examples

## Not run: 
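library(NMdata)
library(data.table)
## as.data.table() ensures the := syntax below works even if the file holds a data.frame
dat <- as.data.table(readRDS(system.file("examples/data/xgxr2.rds", package="NMdata")))
NMcheckData(dat)
## add an LLOQ value to observation rows (EVID==0) only
dat[EVID==0,LLOQ:=3.5]
## expecting LLOQ only for samples
NMcheckData(dat,cols.num=list(c("STUDY"),"EVID==0"=c("LLOQ")))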

## End(Not run)

[Package NMdata version 0.1.6 Index]