source_detection {glmtrans}R Documentation

Transferable source detection for the GLM transfer learning algorithm.

Description

Detect transferable sources from multiple source data sets. Currently supports Gaussian, logistic and Poisson models.

Usage

source_detection(
  target,
  source = NULL,
  family = c("gaussian", "binomial", "poisson"),
  alpha = 1,
  standardize = TRUE,
  intercept = TRUE,
  nfolds = 10,
  cores = 1,
  valid.nfolds = 3,
  lambda = "lambda.1se",
  detection.info = TRUE,
  target.weights = NULL,
  source.weights = NULL,
  C0 = 2,
  ...
)

Arguments

target

target data. Should be a list with elements x and y, where x is a predictor matrix with each row as an observation and each column as a variable, and y is the response vector.

source

source data. Should be a list of sublists, where each sublist is a source data set with elements x and y, carrying the same meaning as in the target data.
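For illustration, the expected shape of target and source can be built by hand. A minimal sketch with simulated Gaussian data; the dimensions and names below are arbitrary, not part of the package:

```r
set.seed(1)
p <- 10
# target: one list with a predictor matrix x and a response vector y
target <- list(x = matrix(rnorm(50 * p), nrow = 50, ncol = p),
               y = rnorm(50))                                   # 50 observations
# source: a list of sublists, one per source data set
source <- list(
  list(x = matrix(rnorm(100 * p), 100, p), y = rnorm(100)),     # source 1
  list(x = matrix(rnorm(80 * p), 80, p),   y = rnorm(80))       # source 2
)
# detection <- source_detection(target, source)  # requires library(glmtrans)
```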

family

response type. Can be "gaussian", "binomial" or "poisson". Default = "gaussian".

  • "gaussian": Gaussian distribution.

  • "binomial": binomial distribution (logistic regression). When family = "binomial", the response in both target and source should be coded 0/1.

  • "poisson": Poisson distribution. When family = "poisson", the response in both target and source should be non-negative.

alpha

the elastic net mixing parameter, with 0 \leq \alpha \leq 1. The penalty is defined as

(1-\alpha)/2 ||\beta||_2^2 + \alpha ||\beta||_1.

alpha = 1 encodes the lasso penalty and alpha = 0 encodes the ridge penalty. Default = 1.

standardize

the logical flag for x variable standardization, prior to fitting the model sequence. The coefficients are always returned on the original scale. Default is TRUE.

intercept

the logical indicator of whether the intercept should be fitted or not. Default = TRUE.

nfolds

the number of folds. Used in the cross-validation for GLM elastic net fitting procedure. Default = 10. Smallest value allowable is nfolds = 3.

cores

the number of cores used for parallel computing. Default = 1.

valid.nfolds

the number of folds used in cross-validation procedure when detecting transferable sources. Useful only when transfer.source.id = "auto". Default = 3.

lambda

lambda (the penalty parameter) used in the transferable source detection algorithm. Can be either "lambda.min" or "lambda.1se". Default = "lambda.1se".

  • "lambda.min": the value of lambda that gives the minimum mean cross-validated error in the sequence of lambda.

  • "lambda.1se": the largest value of lambda such that the error is within 1 standard error of the minimum.

detection.info

the logical flag indicating whether to print detection information or not. Useful only when transfer.source.id = "auto". Default = TRUE.

target.weights

weight vector for the target instances. Should be a vector of the same length as the target response. Default = NULL, which makes all instances equally weighted.

source.weights

a list of weight vectors, one for the instances of each source. Should be a list whose length equals the number of sources. Default = NULL, which makes all instances equally weighted.

C0

the constant used in the transferable source detection algorithm. See Algorithm 2 in Tian, Y. and Feng, Y., 2021. Default = 2.
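A hedged sketch of supplying observation weights explicitly; here the equal weights simply reproduce the NULL default, and the data come from the package's models() generator as in the Examples:

```r
library(glmtrans)  # assumes glmtrans is installed
set.seed(1)
D <- models("gaussian", type = "all", K = 2, p = 500, Ka = 1,
            n.target = 100, cov.type = 2)
# one weight per target observation, and one weight vector per source
w.target <- rep(1, length(D$target$y))
w.source <- lapply(D$source, function(s) rep(1, length(s$y)))
detection <- source_detection(D$target, D$source,
                              target.weights = w.target,
                              source.weights = w.source)
```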

...

additional arguments.

Value

An object with S3 class "glmtrans_source_detection", containing the following components.

transferable.source.id

the indices of the detected transferable source data sets.

target.valid.loss

the validation (or cross-validation) loss on target data. Only available when transfer.source.id = "auto".

source.loss

the loss on each source data. Only available when transfer.source.id = "auto".

threshold

the threshold to determine transferability. Only available when transfer.source.id = "auto".

Note

source.loss and threshold output by source_detection can be visualized by the function plot.glmtrans.
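The detected set is typically passed on to glmtrans for fitting. A hedged end-to-end sketch in the style of the Examples below:

```r
library(glmtrans)  # assumes glmtrans is installed
set.seed(0, kind = "L'Ecuyer-CMRG")
D <- models("gaussian", type = "all", K = 2, p = 500, Ka = 1,
            n.target = 100, cov.type = 2)
detection <- source_detection(D$target, D$source)
# plot(detection)  # per the Note, source.loss and threshold can be visualized
# reuse the detected transferable sources in the main fitting function
fit <- glmtrans(D$target, D$source,
                transfer.source.id = detection$transferable.source.id)
```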

References

Tian, Y. and Feng, Y., 2021. Transfer Learning under High-dimensional Generalized Linear Models. arXiv preprint arXiv:2105.14328.

Li, S., Cai, T.T. and Li, H., 2020. Transfer learning for high-dimensional linear regression: Prediction, estimation, and minimax optimality. arXiv preprint arXiv:2006.10593.

Friedman, J., Hastie, T. and Tibshirani, R., 2010. Regularization paths for generalized linear models via coordinate descent. Journal of statistical software, 33(1), p.1.

Zou, H. and Hastie, T., 2005. Regularization and variable selection via the elastic net. Journal of the royal statistical society: series B (statistical methodology), 67(2), pp.301-320.

Tibshirani, R., 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), pp.267-288.

See Also

glmtrans, predict.glmtrans, models, plot.glmtrans, cv.glmnet, glmnet.

Examples

set.seed(0, kind = "L'Ecuyer-CMRG")

# study the linear model
D.training <- models("gaussian", type = "all", K = 2, p = 500, Ka = 1, n.target = 100, cov.type = 2)
detection.gaussian <- source_detection(D.training$target, D.training$source)
detection.gaussian$transferable.source.id


# study the logistic model
D.training <- models("binomial", type = "all", K = 2, p = 500, Ka = 1, n.target = 100, cov.type = 2)
detection.binomial <- source_detection(D.training$target, D.training$source,
family = "binomial", cores = 2)
detection.binomial$transferable.source.id


# study Poisson model
D.training <- models("poisson", type = "all", K = 2, p = 500, Ka = 1, n.target = 100, cov.type = 2)
detection.poisson <- source_detection(D.training$target, D.training$source,
family = "poisson", cores = 2)
detection.poisson$transferable.source.id


[Package glmtrans version 2.0.0 Index]