select_variable {TIGERr}    R Documentation
Select variables for ensemble learning architecture
Description
This function provides an advanced option to select metabolite variables from external dataset(s). The selected variables (as a list) can then be passed to the argument selectVar_external of the function run_TIGER for customised data correction.
Usage
select_variable(
train_num,
test_num = NULL,
train_batchID = NULL,
test_batchID = NULL,
selectVar_corType = c("cor", "pcor"),
selectVar_corMethod = c("spearman", "pearson"),
selectVar_minNum = 5,
selectVar_maxNum = 10,
selectVar_batchWise = FALSE,
coerce_numeric = FALSE
)
Arguments
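train_num: a numeric data.frame including only the metabolite values of training samples (which can be quality control samples). Information such as injection order or well position needs to be excluded. Row: sample. Column: metabolite variable. See Examples.
test_num: an optional numeric data.frame including the metabolite values of test samples (which can be subject samples). If provided, the column names of test_num should match those of train_num. Row: sample. Column: metabolite variable. If NULL, the variables are selected based on train_num only. See Examples.
train_batchID: NULL or a vector specifying the batch of each sample in train_num. Required when selectVar_batchWise = TRUE. See Examples.
test_batchID: NULL or a vector specifying the batch of each sample in test_num. Only used when selectVar_batchWise = TRUE. See Examples.
selectVar_corType: a character string indicating whether correlation ("cor") or partial correlation ("pcor") is used. Default: "cor".
selectVar_corMethod: a character string indicating which correlation coefficient is to be computed. One of "spearman" or "pearson". Default: "spearman".
selectVar_minNum: an integer specifying the minimum number of selected variables. Default: 5.
selectVar_maxNum: an integer specifying the maximum number of selected variables. Default: 10.
selectVar_batchWise: (advanced) logical. Specifies whether the variable selection should be performed separately for each batch. If TRUE, batch IDs are required (see train_batchID and test_batchID). Default: FALSE.
coerce_numeric: logical. If TRUE, the values in train_num and test_num are coerced to numeric before variable selection, and columns that cannot be coerced are removed (with warnings). If FALSE (default), the function throws an error if the input data contain non-numeric columns.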
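The correlation-related arguments (selectVar_corType, selectVar_corMethod, selectVar_minNum, selectVar_maxNum) can be illustrated with a minimal base-R sketch. It assumes, as a simplification, that for each target metabolite the other variables are ranked by absolute correlation, those above a threshold are kept, and the count is clamped between a minimum and a maximum. The helpers pcor_matrix() and select_by_cor() and the cor_threshold parameter are hypothetical and not part of TIGERr, whose internal selection rule may differ. Partial correlations are computed here from the inverse of the correlation matrix, which is one standard method.

```r
# Hypothetical helper: partial correlation matrix obtained from the inverse
# of the (Spearman or Pearson) correlation matrix; assumes it is invertible.
pcor_matrix <- function(x, method = c("spearman", "pearson")) {
  method <- match.arg(method)
  r <- cor(x, method = method)
  p <- solve(r)
  pc <- -p / sqrt(outer(diag(p), diag(p)))
  diag(pc) <- 1
  dimnames(pc) <- dimnames(r)
  pc
}

# Hypothetical selection rule (an assumption, not TIGERr's code): for each
# target column, rank the other columns by absolute (partial) correlation,
# keep those with |r| >= cor_threshold, and clamp the count to
# [min_num, max_num].
select_by_cor <- function(x, cor_type = c("cor", "pcor"),
                          cor_method = c("spearman", "pearson"),
                          min_num = 5, max_num = 10, cor_threshold = 0.5) {
  cor_type <- match.arg(cor_type)
  cor_method <- match.arg(cor_method)
  r <- if (cor_type == "cor") cor(x, method = cor_method) else pcor_matrix(x, cor_method)
  sel <- lapply(colnames(x), function(target) {
    # absolute correlations with all other variables, strongest first
    abs_r <- sort(abs(r[target, colnames(x) != target]), decreasing = TRUE)
    # number passing the threshold, clamped to [min_num, max_num]
    # and never more than the number of available variables
    k <- min(max(sum(abs_r >= cor_threshold), min_num), max_num, length(abs_r))
    names(abs_r)[seq_len(k)]
  })
  names(sel) <- colnames(x)
  sel
}

# Simulated data: m1, m2 and m3 share a common signal; m4 to m6 are noise.
set.seed(1)
z <- rnorm(50)
demo <- data.frame(m1 = z + rnorm(50, sd = 0.3), m2 = z + rnorm(50, sd = 0.3),
                   m3 = z + rnorm(50, sd = 0.3), m4 = rnorm(50),
                   m5 = rnorm(50), m6 = rnorm(50))
sel <- select_by_cor(demo, min_num = 1, max_num = 3)
sel$m1  # variables selected for m1
```

The threshold-then-clamp design keeps at least a minimum number of predictors even for weakly correlated targets while capping the model size for strongly correlated ones.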
Details
See run_TIGER.
Value
If selectVar_batchWise = FALSE, the function returns a list of length one containing the variables selected on the whole dataset.
If selectVar_batchWise = TRUE, the function returns a list containing the variables selected within each batch. The length of the returned list equals the number of batches specified by test_batchID and/or train_batchID.
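The batch-wise return value can be pictured with a minimal base-R sketch: training and (optional) test samples are pooled, split by batch ID, and a selection is run within each batch, giving one list element per batch. The helpers select_top_k() and select_batchwise() are hypothetical; select_top_k() is a simple stand-in rule (top k by absolute Spearman correlation), and pooling training and test samples of the same batch is an assumption rather than a description of TIGERr's internals.

```r
# Hypothetical stand-in selection rule: for each target column, keep the
# k other columns with the largest absolute Spearman correlation.
select_top_k <- function(x, k = 2) {
  r <- cor(x, method = "spearman")
  sel <- lapply(colnames(x), function(target) {
    abs_r <- sort(abs(r[target, colnames(x) != target]), decreasing = TRUE)
    names(abs_r)[seq_len(min(k, length(abs_r)))]
  })
  names(sel) <- colnames(x)
  sel
}

# Hypothetical batch-wise wrapper: pool training and optional test samples,
# split the rows by batch ID and run the selection within each batch.
# The result has one element per distinct batch ID.
select_batchwise <- function(train_num, train_batchID,
                             test_num = NULL, test_batchID = NULL, k = 2) {
  x <- rbind(train_num, test_num)
  batch <- c(train_batchID, test_batchID)
  lapply(split(x, batch), select_top_k, k = k)
}

set.seed(2)
train <- as.data.frame(matrix(rnorm(30 * 4), ncol = 4,
                              dimnames = list(NULL, paste0("m", 1:4))))
batch <- rep(c("plate1", "plate2", "plate3"), each = 10)
res <- select_batchwise(train, batch)
length(res)  # 3: one element per batch
```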
Examples
data(FF4_qc) # load demo dataset
# QC as training samples; QC1, QC2 and QC3 as test samples:
train_samples <- FF4_qc[FF4_qc$sampleType == "QC",]
test_samples <- FF4_qc[FF4_qc$sampleType != "QC",]
# Only numeric metabolite values are allowed, so the sample information
# columns (1 to 5) are dropped:
train_num <- train_samples[-c(1:5)]
test_num <- test_samples[-c(1:5)]
# If the selection is performed on the whole dataset:
# based on training samples only:
selected_var_1 <- select_variable(train_num = train_num,
test_num = NULL,
selectVar_batchWise = FALSE)
# also consider test samples:
selected_var_2 <- select_variable(train_num = train_num,
test_num = test_num,
selectVar_batchWise = FALSE)
# If the selection is performed separately for each batch:
# (When selectVar_batchWise = TRUE, batch IDs are required.)
selected_var_3 <- select_variable(train_num = train_num,
test_num = NULL,
train_batchID = train_samples$plateID,
test_batchID = NULL,
selectVar_batchWise = TRUE)
# If coerce_numeric = TRUE, columns that cannot be coerced to numeric
# are removed (with warnings).
# (In this example, the injection order and well position columns are
# excluded because correlations between metabolites and injection
# order/well position should not be calculated.)
selected_var_4 <- select_variable(train_num = train_samples[-c(4,5)],
train_batchID = train_samples$plateID,
selectVar_batchWise = TRUE,
coerce_numeric = TRUE)
identical(selected_var_3, selected_var_4) # identical to selected_var_3
## Not run:
# will throw errors if input data have non-numeric columns
# and coerce_numeric = FALSE:
selected_var_5 <- select_variable(train_num = train_samples[-c(4,5)],
coerce_numeric = FALSE)
## End(Not run)
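The coerce_numeric = TRUE behaviour described in the examples above (columns that cannot be coerced to numeric are removed with warnings) can be mimicked in base R. This is a minimal sketch assuming a column counts as non-numeric when coercion would introduce new NA values; coerce_to_numeric() is a hypothetical helper, not TIGERr's implementation.

```r
# Hypothetical helper mirroring the documented coerce_numeric = TRUE
# behaviour; not TIGERr's implementation.
coerce_to_numeric <- function(df) {
  # coerce each column via character so that factors keep their labels
  coerced <- lapply(df, function(col) suppressWarnings(as.numeric(as.character(col))))
  # keep a column only if coercion introduced no new NA values
  ok <- vapply(seq_along(df), function(j) {
    !any(is.na(coerced[[j]]) & !is.na(df[[j]]))
  }, logical(1))
  if (any(!ok)) {
    warning("removed non-numeric column(s): ",
            paste(names(df)[!ok], collapse = ", "))
  }
  as.data.frame(coerced[ok], check.names = FALSE)
}

df <- data.frame(sampleType = c("QC", "QC", "QC1"),
                 plateID = c("P1", "P1", "P2"),
                 m1 = c(1.2, 3.4, 5.6),
                 m2 = c("7", "8", "9"))
out <- coerce_to_numeric(df)  # warns about sampleType and plateID
str(out)
```

Numeric-looking character columns such as m2 are kept as numbers, while label columns such as sampleType are dropped, which is why the injection order and well position columns must be removed beforehand if their correlations are not wanted.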