NMcheckData {NMdata} | R Documentation |
Check data for Nonmem compatibility or check control stream for data compatibility
Description
Check data in various ways for compatibility with Nonmem. Some findings are reported even though they would not make Nonmem fail, because they are typical dataset issues.
Usage
NMcheckData(
data,
file,
covs,
covs.occ,
cols.num,
col.id = "ID",
col.time = "TIME",
col.dv = "DV",
col.mdv = "MDV",
col.cmt = "CMT",
col.amt = "AMT",
col.flagn,
col.row,
col.usubjid,
cols.dup,
type.data = "est",
na.strings,
return.summary = FALSE,
quiet = FALSE,
as.fun
)
Arguments
data |
The data to check. |
file |
As an alternative to checking a data object, you can use
file to specify a control stream to check. This can be either
an input control stream (working or non-working) or an output
control stream. In this case, |
covs |
Columns that contain subject-level covariates. They are expected to be non-missing, numeric and not varying within subjects. |
covs.occ |
A list specifying columns that contain
subject:occasion-level covariates. They are expected to be
non-missing, numeric and not varying within combinations of
subject and occasion. |
cols.num |
Columns that are expected to be present, numeric and non-NA. If a character vector is given, the columns are expected to be used in all rows. If a column is only used for a subset of rows, use a list and name the elements by subsetting strings. See examples. |
col.id |
The name of the column that holds the subject identifier. Default is "ID". |
col.time |
The name of the column holding actual time. |
col.dv |
The name of the column holding the dependent
variable. For now, only one column can be specified, and
|
col.mdv |
The name of the column holding the binary indicator
of the dependent variable missing. Default is |
col.cmt |
The name(s) of the compartment column(s). These will be checked to be positive integers for all rows. They are also used in checks for row duplicates. |
col.amt |
The name of the dose amount column. |
col.flagn |
Optionally, the name of the column holding
numeric exclusion flags. Default value is |
col.row |
A column with a unique value for each row. Using such a
column is recommended if possible. Default
( |
col.usubjid |
Optional unique subject identifier. It is recommended to keep a unique subject identifier (typically a character string including an abbreviated study name and the subject ID) from the clinical datasets in the analysis set. If you supply the name of the column holding this identifier, NMcheckData will check that it is non-missing, that it is unique within values of col.id (i.e. that the analysis subject IDs are unique across actual subjects), and that col.id is unique within the unique subject ID (a violation of the latter is less likely). |
cols.dup |
Additional column names to consider in search of
duplicate events. |
type.data |
|
na.strings |
Strings to be accepted when trying to convert
characters to numerics. This will typically be a string that
represents missing values. Default is ".". Notice, actual
|
return.summary |
If TRUE (not default), the table summary
that is printed if |
quiet |
Keep quiet? Default is not to. |
as.fun |
The default is to return data as a
|
Details
The following checks are performed. The term "numeric" does not refer to a numeric representation in R, but compatibility with Nonmem. The character string "2" is in this sense a valid numeric, "id2" is not.
Column names must be unique and not contain special characters.
If an exclusion flag is used (for ACCEPT/IGNORE in Nonmem), elements must be non-missing integers. Notice, if an exclusion flag is found, the rest of the checks are performed only on rows where that flag equals 0 (zero).
If a unique row identifier is found, it must be non-missing, increasing integers.
col.time (TIME), EVID, col.id (ID), col.cmt (CMT), and col.mdv (MDV): If present, elements must be non-missing and numeric.
col.time (TIME) must be non-negative.
EVID must be in {0,1,2,3,4}.
CMT must be positive integers. However, CMT can be missing or zero for EVID==3.
MDV must be the binary (1/0) representation of is.na(DV) for observation records (EVID==0).
AMT must be 0 or NA for EVID 0 and 2.
AMT must be positive for EVID 1 and 4.
DV must be numeric.
DV must be missing for EVID in {1,4}.
If found, RATE must be numeric, equaling -2 or non-negative for dosing events.
If found, SS must be numeric, equaling 0 or 1 for dosing records.
If found, ADDL must be a non-negative integer for dosing records. II must be present.
If found, II must be a non-negative integer for dosing records. ADDL must be present.
ID must be positive and values cannot be disjoint (all records for each ID must follow each other). This is technically not a requirement in Nonmem, but a violation is most often an error. Use a second ID column if you deliberately want to soften this check.
TIME cannot be decreasing within ID, unless EVID is in {3,4}.
All IDs must have doses (EVID in {1,4}).
All IDs must have observations (EVID==0).
IDs should not have leading zeros since these will be lost when Nonmem reads, then writes, the data.
Character values must not contain commas (they will mess up writing/reading csv).
Columns specified in the covs argument must be non-missing, numeric and not varying within subjects.
Columns specified in covs.occ must be non-missing, numeric and not varying within combinations of subject and occasion.
Columns specified in cols.num must be present, numeric and non-NA.
If a unique subject identifier column (col.usubjid) is provided, col.id must be unique within values of col.usubjid and vice versa.
Events should not be duplicated. For all rows, the combination of col.id, col.cmt, col.evid, col.time plus the optional columns specified in cols.dup must be unique. In other words, if a subject (col.id) has, say, several observations (col.evid) in the same compartment (col.cmt) at the same time (col.time), these are considered duplicates. The exception is if there is a reset event (col.evid is 3 or 4) in between the two rows. cols.dup can be used to add columns to this analysis. This is useful for different assays run on the same compartment (say a DVID column) or maybe stacked datasets. If col.cmt is of length>1, this search is repeated for each cmt column.
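The duplicate-event rule can be illustrated with a minimal base R sketch. It is not how NMcheckData is implemented: the toy data and the helper flag_duplicates are hypothetical, and the sketch ignores the exception for reset events (EVID 3 or 4) between two rows. It only shows how adding a column such as DVID to the key, as cols.dup does, changes which rows count as duplicates.

```r
# Hypothetical toy data; column names follow the defaults in Usage, plus EVID and DVID.
events <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2),
  TIME = c(0, 1, 1, 2, 0, 1),
  EVID = c(1, 0, 0, 0, 1, 0),
  CMT  = c(1, 2, 2, 2, 1, 2),
  DVID = c(NA, 1, 2, 1, NA, 1)
)

# Hypothetical helper (not part of NMdata): flag every row whose
# combination of key columns occurs more than once in the data.
# Simplification: reset events between rows are not taken into account.
flag_duplicates <- function(data, cols) {
  key <- data[, cols, drop = FALSE]
  duplicated(key) | duplicated(key, fromLast = TRUE)
}

cols.key <- c("ID", "CMT", "EVID", "TIME")

# Without DVID, the two samples for ID 1 at TIME 1 share the same key.
dup.basic <- flag_duplicates(events, cols.key)
print(events[dup.basic, ])

# With DVID added to the key, the two assays are told apart.
dup.dvid <- flag_duplicates(events, c(cols.key, "DVID"))
print(any(dup.dvid))
```

In the actual function, the corresponding call would pass the extra column through cols.dup rather than building the key by hand.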
Value
A table with findings
Examples
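## Not run:
library(NMdata)
library(data.table)
dat <- readRDS(system.file("examples/data/xgxr2.rds", package="NMdata"))
## make sure dat is a data.table; the := syntax below requires it
setDT(dat)
NMcheckData(dat)
## add LLOQ for observation rows (samples) only
dat[EVID==0,LLOQ:=3.5]
## expecting STUDY in all rows, LLOQ only for samples
NMcheckData(dat,cols.num=list(c("STUDY"),"EVID==0"=c("LLOQ")))
## End(Not run)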
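The Nonmem sense of "numeric" described under Details can be approximated in base R as below. This is a sketch under assumptions, not the check NMcheckData performs internally: the helper is_nonmem_numeric is hypothetical, it treats the strings in na.strings as acceptable missing-value codes, and as.numeric() is only an approximation of what Nonmem can read.

```r
# Hypothetical helper (not part of NMdata): TRUE where a value is either
# a missing-value code or can be converted to a number.
is_nonmem_numeric <- function(x, na.strings = ".") {
  x <- trimws(as.character(x))
  # Treat the accepted missing-value strings as NA before converting.
  x[x %in% na.strings] <- NA_character_
  missing <- is.na(x)
  # as.numeric() warns on strings like "id2"; the warning is expected here.
  converted <- suppressWarnings(as.numeric(x))
  missing | !is.na(converted)
}

# "2" and "3.5" convert, "." is an accepted missing code, "id2" does not convert.
res <- is_nonmem_numeric(c("2", "id2", ".", "3.5"))
print(res)
```

Whether a missing value is acceptable in a given column is a separate question; several of the checks listed under Details additionally require non-missing values.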