R: smtl: make model-fitting function

smtl {sMTL}

R Documentation

smtl: make model-fitting function

Description

smtl: make model-fitting function

Usage

smtl(
  y,
  X,
  study = NA,
  s,
  commonSupp = FALSE,
  warmStart = TRUE,
  lambda_1 = 0,
  lambda_2 = 0,
  lambda_z = 0,
  scale = TRUE,
  maxIter = 10000,
  LocSrch_maxIter = 50,
  messageInd = TRUE,
  model = TRUE,
  independent.regs = FALSE
)

Arguments

`y`	A numeric outcome vector (for multi-task/domain generalization problems) or a numeric outcome matrix (for multi-label problems)
`X`	A matrix of covariates
`study`	A vector of integers specifying task (or study/domain) ID. This should be set to NA for Multi-Label problems, but is required for Multi-Task and Domain Generalization problems.
`s`	An integer specifying the sparsity level
`commonSupp`	A boolean specifying whether to constrain solutions to have a common support
`warmStart`	A boolean specifying whether a warm start model is fit internally before the final model. A warm start improves solution quality but increases run time.
`lambda_1`	A numeric vector of ridge penalty hyperparameter values
`lambda_2`	A numeric vector of betaBar penalty hyperparameter values (to borrow strength across coefficient values)
`lambda_z`	A numeric vector of zBar penalty hyperparameter values (to borrow strength across coefficient supports)
`scale`	A boolean specifying whether to center and scale covariates before model fitting. In either case, coefficient estimates are returned on the original (uncentered, unscaled) covariate scale.
`maxIter`	An integer specifying the maximum number of coordinate descent iterations
`LocSrch_maxIter`	An integer specifying the maximum number of local search iterations
`messageInd`	A boolean specifying whether to include messages (verbose)
`model`	A boolean indicating whether to return design matrix and outcome vector
`independent.regs`	A boolean specifying whether to fit independent regressions (instead of multi-task). This ensures there is NO information sharing via active sets or penalties
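
The expected shapes of `y`, `X`, and `study` differ between problem types. The sketch below builds both layouts in base R from simulated data. The variable names (`X_mt`, `y_ml`, and so on) are illustrative, the multi-label outcome matrix is assumed to have one column per label, and the `smtl()` calls are left commented out because they require a configured Julia installation.

```r
set.seed(1)
K <- 3      # number of tasks
n_k <- 20   # observations per task
p <- 10     # number of covariates

# Multi-task / domain generalization: rows from all tasks are stacked,
# y is a vector, and study gives the task ID of each row.
X_mt <- matrix(rnorm(K * n_k * p), nrow = K * n_k, ncol = p)
y_mt <- rnorm(K * n_k)
study <- rep(1:K, each = n_k)
stopifnot(length(y_mt) == nrow(X_mt), length(study) == nrow(X_mt))

# Multi-label: one shared design matrix, y is a matrix
# (assumed: one column per label), and study stays NA.
X_ml <- matrix(rnorm(n_k * p), nrow = n_k, ncol = p)
y_ml <- matrix(rnorm(n_k * K), nrow = n_k, ncol = K)
stopifnot(nrow(y_ml) == nrow(X_ml))

# Hypothetical calls (not run here; they need smtl_setup() and Julia):
# mod_mt <- sMTL::smtl(y = y_mt, X = X_mt, study = study, s = 3)
# mod_ml <- sMTL::smtl(y = y_ml, X = X_ml, study = NA, s = 3)
```

Checking that `length(study)` equals `nrow(X)` before fitting is a cheap way to catch mismatched inputs in the multi-task case.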

Value

A list (object of S3 class).

`beta`	Matrix of coefficient estimates, where column j contains the estimates for task j.
`reg_type`	String specifying the model type: `"multiStudy"` (a separate design matrix for each task), `"multiLabel"` (the same design matrix across tasks), or `"L0"` (a single-task regression).
`K`	Integer that indicates number of tasks.
`s`	An integer that indicates sparsity level.
`commonSupp`	Boolean indicating whether supports are common across tasks.
`warmStart`	A Boolean indicating whether to fit an MTL model as a warm start.
`grid`	A dataframe including grid of hyperparameters that model is fit on.
`maxIter`	An integer specifying the maximum number of iterations of block CD.
`LocSrch_maxIter`	An integer specifying the maximum number of iterations of local search.
`independent.regs`	A boolean indicating whether to make each task independent of each other (no shared active sets).
`AS_multiplier`	An integer specifying the active set multiplier.
`X_train`	A Matrix: the design matrix (row concatenated across tasks).
`y_train`	The outcome vector or matrix.
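
To illustrate how the columns of `beta` relate to task-specific predictions, here is a minimal base R sketch using a made-up coefficient matrix. It assumes that the first row of `beta` holds the intercepts, as in the simulated `B` of the Examples section. That layout is an assumption, not something this page states, and the package's own `predict` function should be preferred in practice.

```r
set.seed(1)
p <- 4; K <- 2; n <- 5

# Mock coefficient matrix: row 1 = intercepts (assumed), rows 2..(p+1) = slopes
beta <- matrix(0, nrow = p + 1, ncol = K)
beta[1, ] <- c(0.5, -0.5)   # intercepts for tasks 1 and 2
beta[2, ] <- c(1, 2)        # slope of covariate 1 in each task
beta[4, 2] <- -1            # covariate 3 is active only in task 2

X_new <- matrix(rnorm(n * p), nrow = n, ncol = p)

# One column of predictions per task
preds <- cbind(1, X_new) %*% beta
stopifnot(all(dim(preds) == c(n, K)))

# Support (non-zero slopes) of each task, excluding the intercept row
supp <- beta[-1, , drop = FALSE] != 0
colSums(supp)
```

With `commonSupp = TRUE` the columns of `supp` would be expected to coincide; in this mock example they differ, which corresponds to heterogeneous supports.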

Examples


## Not run: 

if (identical(Sys.getenv("AUTO_JULIA_INSTALL"), "true")) { ## The examples are quite time consuming
## Initialize Julia and install it automatically if necessary

# load package
library(sMTL)
smtl_setup()

#####################################################################################
##### simulate data
#####################################################################################
set.seed(1) # fix the seed to get a reproducible result
K <- 4 # number of datasets 
p <- 100 # covariate dimension
s <- 5 # support size
q <- 7 # size of subset of covariates that can be non-zero for any task
n_k <- 50 # task sample size
N <- n_k * K # full dataset sample size (n_k observations for each of the K tasks)
X <- matrix( rnorm(N * p), nrow = N, ncol=p) # full design matrix
B <- matrix(1 + rnorm(K * (p+1) ), nrow = p + 1, ncol = K) # betas before making sparse
Z <- matrix(0, nrow = p, ncol = K) # matrix of supports
y <- vector(length = N) # outcome vector

# randomly sample support to make betas sparse
for(j in 1:K)     Z[1:q, j] <- sample( c( rep(1,s), rep(0, q - s) ), q, replace = FALSE )
B[-1,] <- B[-1,] * Z # make betas sparse and ensure all models have an intercept

task <- rep(1:K, each = n_k) # vector of task labels (indices)

# iterate through and make each task specific dataset
for(j in 1:K){
    indx <- which(task == j) # indices of task
    e <- rnorm(n_k)
    y[indx] <- B[1, j] + X[indx,] %*% B[-1,j] + e
}
    colnames(B) <- paste0("beta_", 1:K)
    rownames(B) <- paste0("X_", 1:(p+1))
    
    print("Betas")
    print(round(B[1:8,],2))
    
#####################################################################################
##### fit Multi-Task Learning Model for Heterogeneous Support
#####################################################################################
  
    mod <- sMTL::smtl(y = y, 
                      X = X, 
                      study = task, 
                      s = 5, 
                      commonSupp = FALSE,
                      lambda_1 = 0.001,
                      lambda_2 = 0,
                      lambda_z = 0.25)
    
    print(round(mod$beta[1:8,],2))
    
    # make predictions
    preds <- sMTL::predict(model = mod, X = X[1:5,])
    
#####################################################################################
##### fit Multi-Task Learning Model for Common Support
#####################################################################################
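    # Optional: re-run setup with an explicit path to the Julia binary
    # (the path below is machine-specific; adjust it for your installation)
    sMTL::smtl_setup(path = "/Applications/Julia-1.5.app/Contents/Resources/julia/bin")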
    mod <- sMTL::smtl(y = y, 
                      X = X, 
                      study = task, 
                      s = 5, 
                      commonSupp = TRUE,
                      lambda_1 = 0.001,
                      lambda_2 = 0.5)
    
    print(round(mod$beta[1:8,],2))
    }
    
## End(Not run)
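
Beyond comparing the printed coefficients by eye, one might check how many truly non-zero coefficients each task's estimated support recovers. The helper `support_overlap()` below is hypothetical (it is not part of sMTL) and assumes that both matrices store intercepts in the first row. It runs on small made-up matrices so that it does not depend on a fitted model.

```r
# Hypothetical helper: per-task count of true non-zero slopes
# that are also non-zero in the estimate
support_overlap <- function(B_true, B_hat) {
  stopifnot(all(dim(B_true) == dim(B_hat)))
  z_true <- B_true[-1, , drop = FALSE] != 0  # drop intercept row (assumed row 1)
  z_hat  <- B_hat[-1, , drop = FALSE] != 0
  colSums(z_true & z_hat)
}

# Made-up true and estimated coefficients for two tasks
B_true <- rbind(c(1, 1), c(2, 0), c(0, 3), c(4, 5))
B_hat  <- rbind(c(0.9, 1.1), c(1.8, 0), c(0, 0), c(4.2, 4.9))

overlap <- support_overlap(B_true, B_hat)
print(overlap)
```

If the objects from the example above have matching dimensions, a call such as `support_overlap(B, mod$beta)` would give the same summary for the fitted model.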

[Package sMTL version 0.1.0 Index]