smtl {sMTL}    R Documentation
smtl: Fit a Sparse Multi-Task Learning Model
Description
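Fits a sparse linear regression model (with sparsity level s) for multi-task, domain generalization, or multi-label problems, with an optional ridge penalty (lambda_1) and optional betaBar (lambda_2) and zBar (lambda_z) penalties that borrow strength across coefficient values and supports.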
Usage
smtl(
y,
X,
study = NA,
s,
commonSupp = FALSE,
warmStart = TRUE,
lambda_1 = 0,
lambda_2 = 0,
lambda_z = 0,
scale = TRUE,
maxIter = 10000,
LocSrch_maxIter = 50,
messageInd = TRUE,
model = TRUE,
independent.regs = FALSE
)
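For multi-task and domain generalization problems, the data from all tasks are passed together: X and y are row-concatenated across tasks (see X_train under Value), and study gives the task ID of each row. For multi-label problems, study is NA and y is an outcome matrix. The base-R sketch below only illustrates these two layouts; the per-task list task_data and the matrix Y_multi are hypothetical, and equal task sizes are used to mirror the package example.

```r
# Hypothetical per-task data: K tasks, each with its own X and y
set.seed(2)
K <- 3
p <- 4
n_k <- 10                                  # equal task sizes, as in the example
task_data <- lapply(seq_len(K), function(k) {
  list(X = matrix(rnorm(n_k * p), nrow = n_k, ncol = p),
       y = rnorm(n_k))
})

# Multi-task / domain generalization layout: stack rows, record each row's task
X <- do.call(rbind, lapply(task_data, `[[`, "X"))
y <- unlist(lapply(task_data, `[[`, "y"))
study <- rep(seq_len(K), each = n_k)
# e.g. sMTL::smtl(y = y, X = X, study = study, s = 2)   (requires Julia setup)

# Multi-label layout: one shared X, one outcome column per label, study = NA
Y_multi <- matrix(rnorm(nrow(X) * 2), nrow = nrow(X), ncol = 2)
# e.g. sMTL::smtl(y = Y_multi, X = X, study = NA, s = 2)
```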
Arguments
y
A numeric outcome vector (for multi-task and domain generalization problems) or a numeric outcome matrix (for multi-label problems).
X
A matrix of covariates.
study
A vector of integers specifying the task (or study/domain) ID of each row. This should be set to NA for multi-label problems but is required for multi-task and domain generalization problems.
s
An integer specifying the sparsity level.
commonSupp
A boolean specifying whether to constrain solutions to have a common support across tasks.
warmStart
A boolean specifying whether a warm start model is fit internally before the final model. Warm starts improve solution quality but increase run time.
lambda_1
A numeric vector of ridge penalty hyperparameter values.
lambda_2
A numeric vector of betaBar penalty hyperparameter values (the betaBar penalty borrows strength across coefficient values).
lambda_z
A numeric vector of zBar penalty hyperparameter values (the zBar penalty borrows strength across coefficient supports).
scale
A boolean specifying whether to center and scale covariates before model fitting. In either case, coefficient estimates are returned on the original (uncentered, unscaled) covariate scale.
maxIter
An integer specifying the maximum number of block coordinate descent iterations.
LocSrch_maxIter
An integer specifying the maximum number of local search iterations.
messageInd
A boolean specifying whether to print messages (verbose output).
model
A boolean indicating whether to return the design matrix and outcome vector (as X_train and y_train).
independent.regs
A boolean specifying whether to fit independent regressions instead of a multi-task model. This ensures there is NO information sharing via active sets or penalties.
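The three lambda arguments weight penalties on the task-specific coefficients. The base-R sketch below computes one plausible form of these penalty terms for a coefficient matrix laid out like beta (one column per task), and evaluates it over a hyperparameter grid. It is an illustration under stated assumptions, not the package's internal objective: the helper smtl_penalty is hypothetical, intercepts (row 1) are assumed unpenalized, betaBar and zBar are taken to be across-task averages of the coefficients and of their 0/1 support indicators, the grid is assumed to contain every combination of the supplied lambda values, and s is assumed to cap the number of non-zero slopes per task. The exact form and scaling used by sMTL may differ.

```r
# Hypothetical helper (not part of sMTL): one plausible form of the ridge,
# betaBar and zBar penalty terms for a (p + 1) x K coefficient matrix B.
# Assumption: row 1 holds task-specific intercepts, which are not penalized.
smtl_penalty <- function(B, lambda_1, lambda_2, lambda_z) {
  beta <- B[-1, , drop = FALSE]        # p x K slopes, one column per task
  z <- (beta != 0) * 1                 # 0/1 support indicators
  beta_bar <- rowMeans(beta)           # across-task average coefficient
  z_bar <- rowMeans(z)                 # across-task average support indicator
  ridge <- sum(beta^2)
  bbar <- sum((beta - beta_bar)^2)     # beta_bar is recycled down each column
  zbar <- sum((z - z_bar)^2)
  lambda_1 * ridge + lambda_2 * bbar + lambda_z * zbar
}

# Toy coefficient matrix: 2 covariates, 2 tasks
B <- rbind(c(1, 2),   # intercepts
           c(1, 3),
           c(0, 1))

# Assumed grid: every combination of the supplied lambda vectors
grid <- expand.grid(lambda_1 = c(0, 0.001), lambda_2 = c(0, 0.5), lambda_z = 0.25)
grid$penalty <- mapply(function(l1, l2, lz) smtl_penalty(B, l1, l2, lz),
                       grid$lambda_1, grid$lambda_2, grid$lambda_z)

# Per-task sparsity check, assuming s caps the non-zero slopes per task
s <- 2
within_budget <- colSums(B[-1, , drop = FALSE] != 0) <= s
```

Under this assumed form, setting lambda_2 = lambda_z = 0 leaves only the ridge term; a larger lambda_2 pulls each task's coefficients toward their across-task average, and a larger lambda_z pulls the task supports toward a shared support.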
Value
A list (an S3 object) with the following components:
beta
Matrix of coefficient estimates, where column j contains the estimates for task j.
reg_type
String specifying whether model is
K
An integer giving the number of tasks.
s
An integer giving the sparsity level.
commonSupp
A boolean indicating whether supports are common across tasks.
warmStart
A boolean indicating whether an MTL model was fit as a warm start.
grid
A data frame containing the grid of hyperparameter values on which the model is fit.
maxIter
An integer specifying the maximum number of block coordinate descent (CD) iterations.
LocSrch_maxIter
An integer specifying the maximum number of local search iterations.
independent.regs
A boolean indicating whether the tasks are fit independently of each other (no shared active sets).
AS_multiplier
An integer specifying the active set multiplier.
X_train
The design matrix (row-concatenated across tasks).
y_train
The outcome vector or matrix.
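The scale argument states that coefficient estimates are returned on the original covariate scale even when covariates are centered and scaled before fitting. The standard back-transformation for a linear model is sketched below, with ordinary least squares (lm) standing in for the sMTL solver. It illustrates the algebra only and assumes nothing about how sMTL standardizes internally: if the model is fit on z_j = (x_j - m_j) / s_j with coefficients g, the original-scale slopes are g_j / s_j and the intercept is g_0 - sum_j g_j m_j / s_j.

```r
set.seed(3)
n <- 40
X <- matrix(rnorm(n * 3, mean = 5, sd = 2), nrow = n, ncol = 3)
y <- drop(1 + X %*% c(2, 0, -1) + rnorm(n))

# Fit on centered and scaled covariates
m <- colMeans(X)
sds <- apply(X, 2, sd)
Xs <- scale(X, center = m, scale = sds)
gamma <- coef(lm(y ~ Xs))              # intercept + 3 slopes, standardized scale

# Map the coefficients back to the original covariate scale
beta_slopes <- gamma[-1] / sds
beta_0 <- gamma[1] - sum(gamma[-1] * m / sds)
beta_orig <- unname(c(beta_0, beta_slopes))

# Agrees (up to floating-point error) with fitting on the raw covariates
all.equal(beta_orig, unname(coef(lm(y ~ X))))
```

Because the mapping holds for any centering and scaling constants, the same identity applies whichever standard deviation convention is used.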
Examples
## Not run:
if (identical(Sys.getenv("AUTO_JULIA_INSTALL"), "true")) { ## The examples are quite time consuming
## Run the initial setup, installing Julia automatically if necessary
# load package
library(sMTL)
smtl_setup()
#####################################################################################
##### simulate data
#####################################################################################
set.seed(1) # fix the seed to get a reproducible result
K <- 4 # number of datasets
p <- 100 # covariate dimension
s <- 5 # support size
q <- 7 # size of subset of covariates that can be non-zero for any task
n_k <- 50 # task sample size
N <- n_k * K # full dataset sample size
X <- matrix( rnorm(N * p), nrow = N, ncol=p) # full design matrix
B <- matrix(1 + rnorm(K * (p+1) ), nrow = p + 1, ncol = K) # betas before making sparse
Z <- matrix(0, nrow = p, ncol = K) # matrix of supports
y <- vector(length = N) # outcome vector
# randomly sample support to make betas sparse
for(j in 1:K) Z[1:q, j] <- sample( c( rep(1,s), rep(0, q - s) ), q, replace = FALSE )
B[-1,] <- B[-1,] * Z # make betas sparse and ensure all models have an intercept
task <- rep(1:K, each = n_k) # vector of task labels (indices)
# iterate through and make each task specific dataset
for(j in 1:K){
indx <- which(task == j) # indices of task
e <- rnorm(n_k)
y[indx] <- B[1, j] + X[indx,] %*% B[-1,j] + e
}
colnames(B) <- paste0("beta_", 1:K)
rownames(B) <- c("Intercept", paste0("X_", 1:p))
print("Betas")
print(round(B[1:8,],2))
#####################################################################################
##### fit Multi-Task Learning Model for Heterogeneous Support
#####################################################################################
mod <- sMTL::smtl(y = y,
X = X,
study = task,
s = 5,
commonSupp = FALSE,
lambda_1 = 0.001,
lambda_2 = 0,
lambda_z = 0.25)
print(round(mod$beta[1:8,],2))
# make predictions
preds <- sMTL::predict(model = mod, X = X[1:5,])
#####################################################################################
##### fit Multi-Task Learning Model for Common Support
#####################################################################################
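# smtl_setup() was already called above; if Julia is not on the system PATH,
# its binary directory can be passed via the path argument instead, e.g.:
# sMTL::smtl_setup(path = "/Applications/Julia-1.5.app/Contents/Resources/julia/bin")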
mod <- sMTL::smtl(y = y,
X = X,
study = task,
s = 5,
commonSupp = TRUE,
lambda_1 = 0.001,
lambda_2 = 0.5)
print(round(mod$beta[1:8,],2))
}
## End(Not run)
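The example above predicts with sMTL::predict. For inspection, predictions and supports can also be read directly off a coefficient matrix shaped like mod$beta. The sketch below uses a hypothetical stand-in matrix so it runs without Julia, and assumes (as the example's comparison with B suggests) that row 1 holds the intercepts and column j holds task j. It does not reproduce the output format of sMTL::predict.

```r
# Hypothetical stand-in for mod$beta: (p + 1) x K, intercepts in row 1 (assumed)
beta <- cbind(task1 = c(0.5, 1, 0, 2),
              task2 = c(-1, 0, 3, 1))
X_new <- matrix(c(1, 2, 3,
                  0, 1, 1), nrow = 2, byrow = TRUE)

# Predictions from every task's model at once: one column per task
pred_all <- cbind(1, X_new) %*% beta

# Prediction for each row from the model of the task that row belongs to
task_new <- c(1, 2)
pred_own <- pred_all[cbind(seq_len(nrow(X_new)), task_new)]

# Supports: TRUE where a (non-intercept) coefficient is non-zero
supp <- beta[-1, , drop = FALSE] != 0
# A common support means each covariate is either selected in all tasks or in none
common_support <- all(apply(supp, 1, function(r) all(r) || !any(r)))
```

A fit with commonSupp = TRUE would be expected to give common_support equal to TRUE; here the stand-in matrix selects the first covariate only in task 1, so the check returns FALSE.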