smtl {sMTL}    R Documentation

smtl: make model-fitting function

Description

smtl: make model-fitting function

Usage

smtl(
  y,
  X,
  study = NA,
  s,
  commonSupp = FALSE,
  warmStart = TRUE,
  lambda_1 = 0,
  lambda_2 = 0,
  lambda_z = 0,
  scale = TRUE,
  maxIter = 10000,
  LocSrch_maxIter = 50,
  messageInd = TRUE,
  model = TRUE,
  independent.regs = FALSE
)

Arguments

y

A numeric outcome vector (for multi-task/domain generalization problems) or a numeric outcome matrix (for multi-label problems)

X

A matrix of covariates

study

A vector of integers specifying task (or study/domain) ID. This should be set to NA for Multi-Label problems, but is required for Multi-Task and Domain Generalization problems.

s

An integer specifying the sparsity level

commonSupp

A boolean specifying whether to constrain solutions to have a common support

warmStart

A boolean specifying whether a warm start model is fit internally before the final model. Warm starts improve solution quality but increase run time.

lambda_1

A numeric vector of ridge penalty hyperparameter values

lambda_2

A numeric vector of betaBar penalty hyperparameter values (the betaBar penalty borrows strength across coefficient values)

lambda_z

A numeric vector of zBar penalty hyperparameter values (the zBar penalty borrows strength across coefficient supports)

scale

A boolean specifying whether to center and scale covariates before model fitting (either way coefficient estimates are returned on original scale before centering/scaling)

maxIter

An integer specifying the maximum number of coordinate descent iterations

LocSrch_maxIter

An integer specifying the maximum number of local search iterations

messageInd

A boolean specifying whether to include messages (verbose)

model

A boolean indicating whether to return design matrix and outcome vector

independent.regs

A boolean specifying whether to fit independent regressions (instead of multi-task). This ensures there is NO information sharing via active sets or penalties
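
The layout of y, X and study described above can be sketched in base R. This is only an illustration under stated assumptions: the per-task objects X_list and y_list and the matrix Y_multilabel are hypothetical names, the data are random, and no sMTL function is called.

```r
# Hypothetical per-task data: 3 tasks with different sample sizes, 4 covariates each
set.seed(1)
n_k <- c(10, 15, 20)   # sample size of each task
p   <- 4               # number of covariates
X_list <- lapply(n_k, function(n) matrix(rnorm(n * p), nrow = n, ncol = p))
y_list <- lapply(n_k, function(n) rnorm(n))

# Multi-task layout: row-concatenate the tasks into one design matrix and one outcome vector
X <- do.call(rbind, X_list)
y <- unlist(y_list)

# Integer task IDs, one entry per row of X
study <- rep(seq_along(n_k), times = n_k)

# Multi-label layout (illustrative): a single shared design matrix and
# an outcome matrix with one column per label; study would be left as NA
X_shared     <- X_list[[1]]
Y_multilabel <- matrix(rnorm(nrow(X_shared) * 3), nrow = nrow(X_shared), ncol = 3)
```

In the multi-task layout, y, study and the rows of X all have the same length, so row i of X belongs to task study[i].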

Value

A list (object of S3 class).

beta

Matrix of coefficient estimates, where column j contains the estimates for task j.

reg_type

String specifying the regression type: "multiStudy" denotes a separate design matrix for each task, "multiLabel" denotes a design matrix shared across tasks, and "L0" denotes a single-task regression.

K

Integer that indicates number of tasks.

s

An integer that indicates sparsity level.

commonSupp

Boolean indicating whether supports are common across tasks.

warmStart

A Boolean indicating whether to fit an MTL model as a warm start.

grid

A data frame containing the grid of hyperparameters that the model is fit on.

maxIter

An integer specifying the maximum number of iterations of block CD.

LocSrch_maxIter

An integer specifying the maximum number of iterations of local search.

independent.regs

A boolean indicating whether each task is fit independently of the others (no shared active sets).

AS_multiplier

An integer specifying the active set multiplier.

X_train

A Matrix: the design matrix (row concatenated across tasks).

y_train

The outcome vector or matrix.
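
A returned beta matrix can be inspected for its per-task supports with a few lines of base R. The sketch below uses a hand-made matrix, beta_hat, as a stand-in for mod$beta, and assumes (as the Examples suggest, though it is not stated here) that the first row holds the intercepts.

```r
# Hand-made stand-in for mod$beta: 1 intercept row + 5 covariate rows, K = 3 tasks
beta_hat <- rbind(
  c(1.2, 0.8, 1.0),   # intercepts (assumed to be the first row)
  c(0.5, 0.0, 0.7),
  c(0.0, 0.9, 0.0),
  c(1.1, 1.3, 0.6),
  c(0.0, 0.0, 0.0),
  c(0.0, 0.4, 0.0)
)

# Support indicator: TRUE where a covariate coefficient is non-zero
supp <- beta_hat[-1, , drop = FALSE] != 0

# Number of non-zero coefficients per task, to compare against the sparsity level s
nnz_per_task <- colSums(supp)

# Supports are common if every covariate is selected in all tasks or in none
common <- all(rowSums(supp) %in% c(0, ncol(supp)))
```

With commonSupp = TRUE one would expect common to be TRUE; with heterogeneous supports the columns of supp may differ, as they do in this stand-in.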

Examples


## Not run: 

if (identical(Sys.getenv("AUTO_JULIA_INSTALL"), "true")) { ## The examples are quite time consuming
## Initialize Julia and install it automatically if necessary

# load package
library(sMTL)
smtl_setup()

#####################################################################################
##### simulate data
#####################################################################################
set.seed(1) # fix the seed to get a reproducible result
K <- 4 # number of datasets 
p <- 100 # covariate dimension
s <- 5 # support size
q <- 7 # size of subset of covariates that can be non-zero for any task
n_k <- 50 # task sample size
N <- n_k * K # full dataset sample size (K tasks with n_k observations each)
X <- matrix( rnorm(N * p), nrow = N, ncol=p) # full design matrix
B <- matrix(1 + rnorm(K * (p+1) ), nrow = p + 1, ncol = K) # betas before making sparse
Z <- matrix(0, nrow = p, ncol = K) # matrix of supports
y <- numeric(N) # outcome vector

# randomly sample support to make betas sparse
for(j in 1:K) Z[1:q, j] <- sample( c( rep(1,s), rep(0, q - s) ), q, replace = FALSE )
B[-1,] <- B[-1,] * Z # make betas sparse and ensure all models have an intercept

task <- rep(1:K, each = n_k) # vector of task labels (indices)

# iterate through and make each task specific dataset
for(j in 1:K){
    indx <- which(task == j) # indices of task
    e <- rnorm(n_k)
    y[indx] <- B[1, j] + X[indx,] %*% B[-1,j] + e
}
colnames(B) <- paste0("beta_", 1:K)
rownames(B) <- paste0("X_", 1:(p+1))

print("Betas")
print(round(B[1:8,],2))
    
#####################################################################################
##### fit Multi-Task Learning Model for Heterogeneous Support
#####################################################################################
  
    mod <- sMTL::smtl(y = y, 
                      X = X, 
                      study = task, 
                      s = 5, 
                      commonSupp = FALSE,
                      lambda_1 = 0.001,
                      lambda_2 = 0,
                      lambda_z = 0.25)
    
    print(round(mod$beta[1:8,],2))
    
    # make predictions
    preds <- sMTL::predict(model = mod, X = X[1:5,])
    
#####################################################################################
##### fit Multi-Task Learning Model for Common Support
#####################################################################################
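    # sMTL is already loaded and set up above. If Julia is not found automatically,
    # pass the path to your own Julia binary directory, for example:
    # sMTL::smtl_setup(path = "/Applications/Julia-1.5.app/Contents/Resources/julia/bin")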
    mod <- sMTL::smtl(y = y, 
                      X = X, 
                      study = task, 
                      s = 5, 
                      commonSupp = TRUE,
                      lambda_1 = 0.001,
                      lambda_2 = 0.5)
    
    print(round(mod$beta[1:8,],2))
} # end of if (AUTO_JULIA_INSTALL)
    
## End(Not run)
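
The simulated support matrix Z and the fitted coefficients printed in the Examples can also be compared numerically rather than by eye. The following is a minimal sketch using small hand-made stand-ins (Z_true for Z, beta_hat for mod$beta); it again assumes the first row of the coefficient matrix is the intercept and does not call sMTL.

```r
# Stand-ins: true support (5 covariates, K = 2 tasks) and estimated coefficients
Z_true <- cbind(c(1, 1, 0, 0, 0),
                c(1, 0, 1, 0, 0))
beta_hat <- rbind(c(0.9, 1.1),   # intercepts (assumed to be the first row)
                  c(0.8, 0.7),
                  c(0.5, 0.0),
                  c(0.0, 0.0),
                  c(0.0, 0.3),
                  c(0.0, 0.0))

# Estimated support: 1 where a covariate coefficient is non-zero
Z_hat <- (beta_hat[-1, , drop = FALSE] != 0) * 1

# Per-task support recovery counts
true_pos  <- colSums(Z_hat == 1 & Z_true == 1)  # correctly selected covariates
false_pos <- colSums(Z_hat == 1 & Z_true == 0)  # selected but truly zero
false_neg <- colSums(Z_hat == 0 & Z_true == 1)  # missed covariates
```

In the Examples, the same counts could be computed by substituting Z for Z_true and mod$beta for beta_hat, provided the intercept assumption holds for the fitted object.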
    

[Package sMTL version 0.1.0 Index]