cmss {studyStrap}        R Documentation

Covariate-Matched Study Strap for Multi-Study Learning: Fits accept/reject algorithm based on covariate similarity measure



cmss(formula = Y ~ ., data, target.study, sim.fn = NA,
  converge.lim = 50000, bag.size = length(unique(data$Study)),
  max.straps = 150, paths = 5, stack = "standard", sim.covs = NA,
  ssl.method = list("lm"), ssl.tuneGrid = list(c()), sim.mets = TRUE,
  model = FALSE, meanSampling = FALSE, customFNs = list(),
  stack.standardize = FALSE)



formula: Model formula.


data: A data frame containing all the studies, with columns in this order: "Study", "Y", "V1", ..., "Vp".
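
As a rough illustration of the column layout described above, the following sketch builds a toy data frame and checks its column order. The helper `check_layout` is hypothetical and is not part of studyStrap; it only encodes the ordering stated in this argument description.

```r
# Toy multi-study data frame in the documented column order:
# "Study", "Y", then the covariates.
set.seed(1)
toy <- data.frame(
  Study = rep(1:3, each = 4),  # study label for each row
  Y     = rnorm(12),           # outcome
  V1    = rnorm(12),           # covariate 1
  V2    = rnorm(12)            # covariate 2
)

# Hypothetical helper (not a studyStrap function): TRUE when the first two
# columns are "Study" and "Y" and at least one covariate column follows.
check_layout <- function(df) {
  ncol(df) >= 3 && identical(names(df)[1:2], c("Study", "Y"))
}

layout_ok <- check_layout(toy)
```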

target.study: Data frame of the design matrix (covariates only) of the study one aims to make predictions on.


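sim.fn: Optional function to be used as the similarity measure for the accept/reject step. Default: $|\mathrm{cor}(\bar{x}^{(r)}, \bar{x}_{\mathrm{target}})|$, the absolute correlation between the covariate mean vector of a study strap and that of the target study.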
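
A minimal sketch of a similarity function of this kind, assuming the bars in the default measure denote column means of the two design matrices. The name `abs_mean_cor` is hypothetical; the package's own default may differ in detail.

```r
# Hypothetical helper: absolute Pearson correlation between the covariate
# mean vectors of two design matrices (an assumed reading of the default).
abs_mean_cor <- function(x1, x2) {
  abs(cor(colMeans(x1), colMeans(x2)))
}

# Four covariates are used here because the correlation of two length-2
# vectors is always +1 or -1, which would make the measure uninformative.
set.seed(1)
strap  <- matrix(rnorm(400, mean = 1), ncol = 4)  # stand-in study strap
target <- matrix(rnorm(200, mean = 3), ncol = 4)  # stand-in target study

sim <- abs_mean_cor(strap, target)
```

A function with the same two-argument shape, returning a single number, is what the examples on this page pass as `sim.fn`.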


converge.lim: Integer indicating the number of consecutive rejected study straps required to reach the convergence criterion.


bag.size: Integer indicating the bag size tuning parameter.
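
To make the role of the bag size concrete, here is a minimal sketch of drawing one study strap. It assumes that a bag of `bag.size` study labels is drawn with replacement and that each study then contributes a share of its rows proportional to how often it appears in the bag. `draw_study_strap` is a hypothetical helper, not the package's internal code.

```r
# Hypothetical sketch of drawing a single study strap (pseudo-study).
draw_study_strap <- function(data, bag.size) {
  studies <- unique(data$Study)
  # Draw the bag: bag.size study labels, sampled with replacement.
  bag <- studies[sample.int(length(studies), bag.size, replace = TRUE)]
  rows <- unlist(lapply(studies, function(s) {
    idx <- which(data$Study == s)
    # Share of this study's rows, proportional to its count in the bag.
    n_take <- round(length(idx) * sum(bag == s) / bag.size)
    idx[sample.int(length(idx), n_take)]
  }))
  data[rows, , drop = FALSE]
}

set.seed(1)
toy <- data.frame(Study = rep(1:4, each = 25), Y = rnorm(100), V1 = rnorm(100))
strap <- draw_study_strap(toy, bag.size = 4)
```

Under this assumption, a small bag size tends to produce straps dominated by a few studies, while a large bag size produces straps that resemble the pooled data.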


max.straps: Integer indicating the maximum number of accepted straps that can be fit across all paths before the algorithm stops accepting new study straps.


paths: Integer indicating the number of paths (an accept/reject path is the set of all models accepted before a convergence criterion is reached).
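
The interplay of the convergence limit, the strap cap, and a path can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: it assumes a candidate is accepted only when its similarity exceeds that of the last accepted strap, it applies the cap within a single path rather than across all paths, and it replaces "draw a study strap and score it against the target" with a random number. `ar_path` and `draw_similarity` are hypothetical names.

```r
# Hypothetical sketch of one accept/reject path.
ar_path <- function(draw_similarity, converge.lim, max.straps) {
  accepted <- numeric(0)  # similarity of each accepted strap
  best <- -Inf            # similarity of the last accepted strap
  rejects <- 0            # consecutive rejections so far
  while (rejects < converge.lim && length(accepted) < max.straps) {
    s <- draw_similarity()
    if (s > best) {       # assumed rule: accept only improvements
      accepted <- c(accepted, s)
      best <- s
      rejects <- 0        # an acceptance resets the rejection counter
    } else {
      rejects <- rejects + 1
    }
  }
  accepted
}

set.seed(1)
path <- ar_path(function() runif(1), converge.lim = 10, max.straps = 50)
```

Running several such paths and pooling the accepted straps corresponds to setting the number of paths above 1.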


stack: String determining whether the stacking matrix is built on the training studies ("standard") or on the accepted study straps ("ss"). Default: "standard".


sim.covs: Vector of covariate names or covariate column numbers to be used for the similarity measure. Default is to use all covariates.


ssl.method: A list of strings indicating which modeling methods to use.


ssl.tuneGrid: A list of tuning parameters in the format of the caret package. Each element must be a data frame (as required by caret). If a method requires no tuning parameters, use NA for its element.
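
The pairing of methods and tuning grids can be sketched with a small check. It assumes the two lists are matched element by element, as in the examples on this page, and it also accepts NULL entries because the default grid is `list(c())`. `check_ssl_args` is a hypothetical helper, not a studyStrap function.

```r
# Hypothetical helper: TRUE when every method has a matching grid entry and
# each entry is a data frame, NA, or NULL.
check_ssl_args <- function(ssl.method, ssl.tuneGrid) {
  same_length <- length(ssl.method) == length(ssl.tuneGrid)
  grids_ok <- all(vapply(ssl.tuneGrid, function(g) {
    is.data.frame(g) || is.null(g) || (length(g) == 1 && is.na(g))
  }, logical(1)))
  same_length && grids_ok
}

ssl_methods <- list("lm", "pcr")
ssl_grids   <- list(NA, data.frame(ncomp = 2))  # "lm" needs no tuning
args_ok <- check_ssl_args(ssl_methods, ssl_grids)
```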


sim.mets: Boolean indicating whether to calculate the default covariate profile similarity measures.


model: Boolean indicating whether to attach the training data to the model object.


meanSampling: Boolean determining whether to use mean covariates for the similarity measure, which can be much quicker. Default: FALSE.


customFNs: Optional list of functions that can be used to add custom covariate profile similarity measures.


stack.standardize: Boolean determining whether stacking weights are standardized to sum to 1. Default: FALSE.
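
What "standardized to sum to 1" means can be shown with a small sketch. The weights below are made-up numbers, and `standardize_weights` is a hypothetical helper; how the package actually estimates stacking weights is not shown here.

```r
# Hypothetical helper: rescale a weight vector so that it sums to 1.
standardize_weights <- function(w) w / sum(w)

w     <- c(0.8, 0.6, 0.2)         # illustrative raw stacking weights
w_std <- standardize_weights(w)   # 0.5, 0.375, 0.125

# Effect on a weighted combination of three models' predictions for one point.
preds_one_point <- c(10, 12, 20)
raw_pred <- sum(w * preds_one_point)      # 19.2, weights sum to 1.6
std_pred <- sum(w_std * preds_one_point)  # 12, a weighted average
```

With standardized weights the combined prediction is a weighted average of the individual predictions; with raw weights it can fall outside their range.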


A model object of studyStrap class "ss" that can be used to make predictions.


##### Simulate Data ######

# create half of training dataset from 1 distribution
X1 <- matrix(rnorm(2000), ncol = 2) # design matrix - 2 covariates
B1 <- c(5, 10, 15) # true beta coefficients
y1 <- cbind(1, X1) %*% B1

# create 2nd half of training dataset from another distribution
X2 <- matrix(rnorm(2000, 1,2), ncol = 2) # design matrix - 2 covariates
B2 <- c(10, 5, 0) # true beta coefficients
y2 <- cbind(1, X2) %*% B2

X <- rbind(X1, X2)
y <- c(y1, y2)

study <- sample.int(10, 2000, replace = TRUE) # 10 studies
data <- data.frame( Study = study, Y = y, V1 = X[,1], V2 = X[,2] )

# create target study design matrix for covariate profile similarity weighting and
# accept/reject algorithm (covariate-matched study strap)
target <- matrix(rnorm(1000, 3, 5), ncol = 2) # design matrix
colnames(target) <- c("V1", "V2")

##### Model Fitting #####

# Fit model with 1 Single-Study Learner (SSL): PCA Regression
arMod1 <- cmss(formula = Y ~ .,
               data = data,
               target.study = target,
               converge.lim = 10,
               bag.size = length(unique(data$Study)),
               max.straps = 50,
               paths = 2,
               ssl.method = list("pcr"),
               ssl.tuneGrid = list(data.frame("ncomp" = 2)))

# Fit model with 2 SSLs: Linear Regression and PCA Regression
arMod2 <- cmss(formula = Y ~ .,
               data = data,
               target.study = target,
               converge.lim = 20,
               bag.size = length(unique(data$Study)),
               max.straps = 50,
               paths = 2,
               ssl.method = list("lm", "pcr"),
               ssl.tuneGrid = list(NA, data.frame("ncomp" = 2)))

# Fit model with a custom similarity function for the
# accept/reject step and 2 custom functions for Covariate
# Profile Similarity weights

# custom function for CPS

fn1 <- function(x1, x2) {
  return(abs(cor(colMeans(x1), colMeans(x2))))
}

fn2 <- function(x1, x2) {
  return(sum((colMeans(x1) - colMeans(x2))^2))
}

arMod3 <- cmss(formula = Y ~ .,
               data = data,
               target.study = target,
               sim.fn = fn1,
               customFNs = list(fn1, fn2),
               converge.lim = 50,
               bag.size = length(unique(data$Study)),
               max.straps = 50,
               paths = 2,
               ssl.method = list("lm", "pcr"),
               ssl.tuneGrid = list(NA, data.frame("ncomp" = 2)))

#####  Predictions ######

preds <- studyStrap.predict(arMod1, target)

[Package studyStrap version 1.0.0 Index]