compute_targetVal {TIGERr}    R Documentation

Compute target values for ensemble learning architecture

Description

This function provides an advanced option to calculate the target values of a reference dataset (i.e. QC_num, the numeric values of quality control samples). The generated target values (a list) can then be passed to the argument targetVal_external of run_TIGER so that TIGER aligns the test_samples with the reference dataset. This is useful for correcting longitudinal datasets and for cross-kit adjustment. See the case study section of the original paper for a detailed explanation.

Usage

compute_targetVal(
  QC_num,
  sampleType,
  batchID = NULL,
  targetVal_method = c("mean", "median"),
  targetVal_batchWise = FALSE,
  targetVal_removeOutlier = !targetVal_batchWise,
  coerce_numeric = FALSE
)

Arguments

QC_num

a numeric data.frame including the metabolite values of quality control (QC) samples. Missing values and infinite values will not be taken into account. Row: sample. Column: metabolite variable. See Examples.

sampleType

a vector corresponding to QC_num to specify the type of each QC sample. QC samples of the same type should have the same type name. See Examples.

batchID

a vector corresponding to QC_num to specify the batch of each sample. Ignored if targetVal_batchWise = FALSE. See Examples.

targetVal_method

a character string specifying how the target values are computed. Can be "mean" (default) or "median". See Details.

targetVal_batchWise

logical. If TRUE, the target values are computed separately for each batch; otherwise they are computed on the whole dataset. Setting TRUE might be useful if your dataset has very obvious batch effects, but it may also make the algorithm less robust. See Details. Default: FALSE.

targetVal_removeOutlier

logical. If TRUE, outliers will be removed before the computation. Outliers are determined with the 1.5 * IQR (interquartile range) rule. We recommend turning this off when the target values are computed based on batches. See Details. Default: !targetVal_batchWise.

coerce_numeric

logical. If TRUE, values in QC_num will be coerced to numeric before the computation. Columns that cannot be coerced will be removed (with warnings). See Examples. Default: FALSE.
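
The behaviour described for coerce_numeric can be illustrated with a minimal base R sketch. The helper coerce_numeric_columns below is hypothetical and not part of TIGERr, and the rule used here to decide that a column "cannot be coerced" (coercion introduces new missing values) is an assumption; the package may apply a different check.

```r
# Hypothetical helper, not part of TIGERr: coerce columns to numeric and
# drop those that cannot be coerced, with a warning.
coerce_numeric_columns <- function(df) {
  converted <- lapply(df, function(col) {
    suppressWarnings(as.numeric(as.character(col)))
  })
  # Assumption: a column fails if coercion turned a non-missing value into NA.
  failed <- mapply(function(old, new) any(is.na(new) & !is.na(old)),
                   df, converted)
  if (any(failed)) {
    warning("Removed non-numeric column(s): ",
            paste(names(df)[failed], collapse = ", "))
  }
  as.data.frame(converted[!failed], stringsAsFactors = FALSE)
}

demo <- data.frame(sampleID = c("s1", "s2"),   # cannot be coerced
                   met_A = c("1.5", "2.5"),    # numeric stored as text
                   met_B = c(3, NA),           # already numeric, NA kept
                   stringsAsFactors = FALSE)
clean <- suppressWarnings(coerce_numeric_columns(demo))
names(clean)
```

In this sketch sampleID is dropped, while met_A is converted and met_B is kept unchanged.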

Details

See run_TIGER.
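
As a rough illustration of the 1.5 * IQR rule mentioned for targetVal_removeOutlier, the sketch below drops values outside the usual fences before averaging. The helper remove_outliers_iqr is hypothetical, and details such as the quantile type and whether the fences are inclusive are assumptions that may differ from what TIGERr does internally.

```r
# Hypothetical helper, not part of TIGERr: keep values within
# [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
remove_outliers_iqr <- function(x) {
  x <- x[is.finite(x)]  # missing and infinite values are not taken into account
  q <- stats::quantile(x, probs = c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  lower <- q[1] - 1.5 * iqr
  upper <- q[2] + 1.5 * iqr
  x[x >= lower & x <= upper]
}

values <- c(10, 11, 12, 13, 14, 100, NA)
kept <- remove_outliers_iqr(values)  # 100 lies above the upper fence
mean(kept)
```

With only a few QC samples per batch the quartiles become unstable, which is one plausible reason for the recommendation to turn outlier removal off when target values are computed batch-wise.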

Value

If targetVal_batchWise = FALSE, the function returns a list of length one containing the target values computed on the whole dataset.

If targetVal_batchWise = TRUE, the function returns a list containing the target values computed on each batch. The length of the returned list equals the number of batches specified by batchID.
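
To make the difference between the two return shapes concrete, here is a simplified base R sketch. The function sketch_target_values, the element name wholeDataset, and the matrix layout of each element (sample type by metabolite) are assumptions for illustration only; this is not the TIGERr implementation and the actual element structure may differ.

```r
# Hypothetical sketch, not the TIGERr implementation.
sketch_target_values <- function(QC_num, sampleType, batchID = NULL,
                                 batchWise = FALSE, fun = mean) {
  # One group of rows for the whole dataset, or one group per batch.
  groups <- if (batchWise) {
    split(seq_len(nrow(QC_num)), batchID)
  } else {
    list(wholeDataset = seq_len(nrow(QC_num)))
  }
  lapply(groups, function(rows) {
    by_type <- split(QC_num[rows, , drop = FALSE],
                     as.character(sampleType[rows]))
    # For each sample type, summarise every metabolite over its finite values.
    do.call(rbind, lapply(by_type, function(d) {
      vapply(d, function(v) fun(v[is.finite(v)]), numeric(1))
    }))
  })
}

QC_num <- data.frame(met_A = c(1, 3, 5, 7), met_B = c(10, 20, 30, NA))
sampleType <- c("QC1", "QC1", "QC1", "QC1")
batchID <- c("plate1", "plate1", "plate2", "plate2")

whole <- sketch_target_values(QC_num, sampleType)
per_batch <- sketch_target_values(QC_num, sampleType, batchID, batchWise = TRUE)
length(whole)      # one element for the whole dataset
length(per_batch)  # one element per batch
```

Passing fun = median would mirror targetVal_method = "median".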

Examples

data(FF4_qc) # load demo dataset
QC_num <- FF4_qc[-c(1:5)] # keep only the numeric metabolite columns

# target values computed on the whole dataset:
tarVal_1 <- compute_targetVal(QC_num = QC_num,
                              sampleType = FF4_qc$sampleType,
                              batchID = FF4_qc$plateID,
                              targetVal_method = "mean",
                              targetVal_batchWise = FALSE,
                              targetVal_removeOutlier = TRUE)

# target values computed on batches:
tarVal_2 <- compute_targetVal(QC_num = QC_num,
                              sampleType = FF4_qc$sampleType,
                              batchID = FF4_qc$plateID,
                              targetVal_method = "mean",
                              targetVal_batchWise = TRUE,
                              targetVal_removeOutlier = FALSE)

# If coerce_numeric = TRUE,
# columns that cannot be coerced to numeric will be removed (with warnings):
tarVal_3 <- compute_targetVal(QC_num = FF4_qc[-c(4:5)],
                              sampleType = FF4_qc$sampleType,
                              batchID = FF4_qc$plateID,
                              targetVal_method = "mean",
                              targetVal_batchWise = TRUE,
                              targetVal_removeOutlier = FALSE,
                              coerce_numeric = TRUE)
identical(tarVal_2, tarVal_3)  # identical to tarVal_2

## Not run: 

# will throw an error if the input data contain non-numeric columns
# and coerce_numeric = FALSE:

tarVal_4 <- compute_targetVal(QC_num = FF4_qc,
                              sampleType = FF4_qc$sampleType,
                              batchID = FF4_qc$plateID,
                              targetVal_method = "mean",
                              targetVal_batchWise = TRUE,
                              targetVal_removeOutlier = FALSE,
                              coerce_numeric = FALSE)

## End(Not run)

[Package TIGERr version 1.0.0 Index]