DMTL {DMTL}  R Documentation 
Distribution Mapping based Transfer Learning
Description
This function performs distribution mapping based transfer learning (DMTL) regression for given target (primary) and source (secondary) datasets. The data available in the source domain are used to design an appropriate predictive model. The target features with unknown response values are transferred to the source domain via distribution matching and then the corresponding response values in the source domain are predicted using the aforementioned predictive model. The response values are then transferred to the original target space by applying distribution matching again. Hence, this function needs an unmatched pair of target datasets (features and response values) and a matched pair of source datasets.
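The mapping step described above can be illustrated with a minimal base R sketch of distribution matching through empirical CDFs (quantile mapping). This is only an illustration of the general idea, not the package's internal code: the helper name `match_distribution` and the simulated data are hypothetical, and DMTL itself estimates the distributions from histogram counts or kernel densities (see `use_density`).

```r
# Hypothetical helper (not part of the DMTL package): map values `x`, assumed
# to follow the distribution of the sample `from`, onto the scale of the
# reference sample `ref` by matching cumulative probabilities.
match_distribution <- function(x, from, ref) {
  p <- ecdf(from)(x)                 # cumulative probability of each x under `from`
  unname(quantile(ref, probs = p))   # interpolated quantile of `ref` at that probability
}

set.seed(1)
target_feature <- rnorm(500, mean = 0.3, sd = 0.6)  # illustrative target-domain feature
source_feature <- rnorm(500, mean = 0.0, sd = 0.5)  # illustrative source-domain feature

# Transfer the target feature into the source domain.
mapped <- match_distribution(target_feature, from = target_feature, ref = source_feature)

# The mapped values stay inside the source range and keep the original ranking.
range(mapped)
range(source_feature)
identical(rank(mapped), rank(target_feature))
```

The same helper, called with the roles of the two samples swapped, would carry predicted responses from the source space back to the target space.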
Usage
DMTL(
target_set,
source_set,
use_density = FALSE,
pred_model = "RF",
model_optimize = FALSE,
sample_size = 1000,
random_seed = NULL,
all_pred = FALSE,
get_verbose = FALSE,
allow_parallel = FALSE
)
Arguments
target_set 
List containing the target datasets. A named list with
components X (features) and y (response values).
source_set 
List containing the source datasets. A named list with
components X (features) and y (response values).
use_density 
Flag for using kernel density as distribution estimate
instead of histogram counts. Defaults to FALSE.
pred_model 
String indicating the underlying predictive model. The currently available options are 

model_optimize 
Flag for model parameter tuning. If TRUE, the model parameters are tuned via grid search. Defaults to FALSE.
sample_size 
Sample size for estimating distributions of target and
source datasets. Defaults to 1000.
random_seed 
Seed for random number generator (for reproducible
outcomes). Defaults to NULL.
all_pred 
Flag for returning the prediction values in the source space.
If TRUE, the predictions in both the target and source spaces are returned. Defaults to FALSE.
get_verbose 
Flag for displaying the progress when optimizing the
predictive model, i.e., model_optimize = TRUE. Defaults to FALSE.
allow_parallel 
Flag for allowing parallel processing when performing
grid search, i.e., model_optimize = TRUE. Defaults to FALSE.
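As a rough illustration of the two kinds of distribution estimate that `use_density` chooses between, the sketch below builds a cumulative distribution once from histogram counts and once from a Gaussian kernel density, using base R only. It shows the general technique under assumed settings (50 suggested bins, a 512-point grid, simulated data) and should not be read as the package's internal procedure.

```r
set.seed(42)
x <- rnorm(1000, mean = 0.3, sd = 0.6)   # illustrative sample

# (a) Histogram-based estimate: running sum of normalised bin counts.
h <- hist(x, breaks = 50, plot = FALSE)
cdf_hist <- cumsum(h$counts) / sum(h$counts)   # CDF at the right edge of each bin

# (b) Kernel-density-based estimate: running sum of the density on a regular grid.
d <- density(x, n = 512)
cdf_kde <- cumsum(d$y) / sum(d$y)              # normalised so the last value is 1

# Both estimates are non-decreasing and end at 1; the kernel version is
# smoother because it does not depend on bin edges.
tail(cdf_hist, 1)
tail(cdf_kde, 1)
```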
Value
If all_pred = FALSE, a vector containing the final prediction values.
If all_pred = TRUE, a named list with two components, target and source, i.e., predictions in the original target space and in the source space, respectively.
Note
The datasets in target_set (i.e., X and y) do not need to be matched (i.e., have the same number of rows) since the response values are used only to estimate distribution for mapping while the feature values are used for both mapping and final prediction. In contrast, the datasets in source_set (i.e., X and y) must have matched samples.
It is recommended to normalize the two response values (y) so that they will be in the same range. If normalization is not performed, DMTL() uses the range of target y values as the prediction range.
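A minimal sketch of the recommended normalization, using min-max scaling in base R. The helper `rescale01` and the simulated responses are hypothetical and not part of DMTL; any other scheme would serve as long as both response vectors end up in the same range.

```r
# Hypothetical helper: rescale a numeric vector to the [0, 1] range.
# Assumes the vector is not constant (otherwise the denominator is zero).
rescale01 <- function(y) {
  rng <- range(y, na.rm = TRUE)
  (y - rng[1]) / (rng[2] - rng[1])
}

set.seed(7)
y_target <- rnorm(200, mean = 5, sd = 2)      # illustrative target responses
y_source <- rnorm(1000, mean = -1, sd = 0.5)  # illustrative source responses

y_target_norm <- rescale01(y_target)
y_source_norm <- rescale01(y_source)

# Both responses now share the same range.
range(y_target_norm)
range(y_source_norm)
```

The two vectors deliberately differ in length, in line with the note that the target datasets need not be matched.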
Examples
set.seed(8644)
## Generate two datasets with different underlying distributions...
x1 <- matrix(rnorm(3000, 0.3, 0.6), ncol = 3)
dimnames(x1) <- list(paste0("sample", 1:1000), paste0("f", 1:3))
y1 <- 0.3*x1[, 1] + 0.1*x1[, 2] - x1[, 3] + rnorm(1000, 0, 0.05)
x2 <- matrix(rnorm(3000, 0, 0.5), ncol = 3)
dimnames(x2) <- list(paste0("sample", 1:1000), paste0("f", 1:3))
y2 <- 0.2*x2[, 1] + 0.3*x2[, 2] - x2[, 3] + rnorm(1000, 0, 0.05)
## Model datasets using DMTL & compare with a baseline model...
library(DMTL)
target <- list(X = x1, y = y1)
source <- list(X = x2, y = y2)
y1_pred <- DMTL(target_set = target, source_set = source, pred_model = "RF")
y1_pred_bl <- RF_predict(x_train = x2, y_train = y2, x_test = x1)
print(performance(y1, y1_pred, measures = c("MSE", "PCC")))
print(performance(y1, y1_pred_bl, measures = c("MSE", "PCC")))
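For readers unfamiliar with the two measures requested above, here is a base R sketch of what they conventionally denote, assuming "MSE" stands for mean squared error and "PCC" for the Pearson correlation coefficient. The toy vectors are made up for illustration, and this is not the package's `performance()` implementation.

```r
y_true <- c(1.0, 2.0, 3.0, 4.0)   # illustrative observed responses
y_pred <- c(1.5, 2.0, 2.5, 4.0)   # illustrative predictions

mse <- mean((y_true - y_pred)^2)                 # mean squared error (lower is better)
pcc <- cor(y_true, y_pred, method = "pearson")   # Pearson correlation (closer to 1 is better)

print(c(MSE = mse, PCC = pcc))
```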