cmss {studyStrap} | R Documentation |
Covariate-Matched Study Strap for Multi-Study Learning: Fits an accept/reject algorithm based on a covariate similarity measure
Description
Covariate-Matched Study Strap for Multi-Study Learning: fits an accept/reject algorithm that accepts or rejects study straps based on a covariate similarity measure between each study strap and the target study.
Usage
cmss(formula = Y ~ ., data, target.study, sim.fn = NA,
     converge.lim = 50000, bag.size = length(unique(data$Study)),
     max.straps = 150, paths = 5, stack = "standard", sim.covs = NA,
     ssl.method = list("lm"), ssl.tuneGrid = list(c()), sim.mets = TRUE,
     model = FALSE, meanSampling = FALSE, customFNs = list(),
     stack.standardize = FALSE)
Arguments
formula |
Model formula |
data |
A data frame containing all the studies, with the following columns in this order: "Study", "Y", "V1", ..., "Vp" |
target.study |
Data frame containing the design matrix (covariates only) of the study on which one aims to make predictions |
sim.fn |
Optional function to be used as the similarity measure for the accept/reject step. The default function is $\left|\operatorname{cor}\left(\bar{x}^{(r)}, \bar{x}_{\text{target}}\right)\right|$. |
converge.lim |
Integer indicating the number of consecutive rejected study straps required to reach the convergence criterion. |
bag.size |
Integer indicating the bag size tuning parameter. |
max.straps |
Integer indicating the maximum number of accepted straps that can be fit across all paths before the algorithm stops accepting new study straps. |
paths |
Integer indicating the number of paths (an accept/reject path consists of all the models accepted before the convergence criterion is reached once). |
stack |
String determining whether the stacking matrix is built on the training studies ("standard") or on the accepted study straps ("ss"). Default: "standard". |
sim.covs |
Vector of covariate names or covariate column numbers to be used for the similarity measure. Default is to use all covariates. |
ssl.method |
A list of strings indicating which modeling methods to use. |
ssl.tuneGrid |
A list of tuning parameters in the format of the caret package. Each element must be a data frame (as required by caret). Use NA for a method that requires no tuning parameters. |
sim.mets |
Boolean indicating whether to calculate default covariate profile similarity measures. |
model |
Boolean indicating whether to attach the training data to the model object. |
meanSampling |
Boolean determining whether to use mean covariates for the similarity measure. This can be much quicker. Default is FALSE. |
customFNs |
Optional list of functions that can be used to add custom covariate profile similarity measures. |
stack.standardize |
Boolean determining whether stacking weights are standardized to sum to 1. Default is FALSE. |
Value
A model object of studyStrap class "ss" that can be used to make predictions.
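The way bag.size, converge.lim, and max.straps interact may be easier to follow in code. The following is a minimal base-R sketch of a single accept/reject path, written only from the argument descriptions above; it is not the package's implementation. The helper names (sim_default, draw_strap, accept_reject), the toy data, the row-sampling scheme, and the acceptance rule (a strap is accepted when its similarity is at least that of the last accepted strap) are all assumptions made for illustration.

```r
set.seed(1)

# Hypothetical toy data: 5 studies, 4 covariates (all names are illustrative)
p <- 4
X <- matrix(rnorm(500 * p), ncol = p, dimnames = list(NULL, paste0("V", 1:p)))
dat <- data.frame(Study = sample.int(5, 500, replace = TRUE), X)
target <- matrix(rnorm(200 * p, mean = 1), ncol = p,
                 dimnames = list(NULL, paste0("V", 1:p)))

# Similarity in the spirit of the documented default: absolute correlation
# between the covariate mean vectors of a study strap and of the target
sim_default <- function(x1, x2) abs(cor(colMeans(x1), colMeans(x2)))

# Assumed resampling scheme: draw bag.size studies with replacement, then take
# from each drawn study a share of its rows proportional to how often it was drawn
draw_strap <- function(dat, bag.size) {
  bag <- table(sample(unique(dat$Study), bag.size, replace = TRUE))
  ids <- as.integer(names(bag))
  share <- as.vector(bag) / bag.size
  rows <- unlist(lapply(seq_along(ids), function(k) {
    idx <- which(dat$Study == ids[k])
    idx[sample.int(length(idx), max(1, round(share[k] * length(idx))))]
  }))
  dat[rows, , drop = FALSE]
}

# One accept/reject path (assumed rule): stop after converge.lim consecutive
# rejections or once max.straps straps have been accepted
accept_reject <- function(dat, target, bag.size, converge.lim, max.straps) {
  covs <- colnames(target)
  best <- -Inf       # similarity of the most recently accepted strap
  rejected <- 0      # number of consecutive rejections
  accepted <- list()
  while (rejected < converge.lim && length(accepted) < max.straps) {
    strap <- draw_strap(dat, bag.size)
    s <- sim_default(strap[, covs], target)
    if (s >= best) {
      accepted[[length(accepted) + 1]] <- strap
      best <- s
      rejected <- 0
    } else {
      rejected <- rejected + 1
    }
  }
  list(straps = accepted, similarity = best)
}

path <- accept_reject(dat, target, bag.size = 5,
                      converge.lim = 20, max.straps = 50)
length(path$straps)  # number of accepted study straps on this path
```

In this sketch a larger converge.lim makes the path search longer before stopping, while max.straps caps the number of accepted straps regardless of convergence. In cmss itself, max.straps is documented as a cap across all paths rather than per path.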
Examples
##########################
##### Simulate Data ######
##########################
set.seed(1)
# create half of training dataset from 1 distribution
X1 <- matrix(rnorm(2000), ncol = 2) # design matrix - 2 covariates
B1 <- c(5, 10, 15) # true beta coefficients
y1 <- cbind(1, X1) %*% B1
# create 2nd half of training dataset from another distribution
X2 <- matrix(rnorm(2000, 1,2), ncol = 2) # design matrix - 2 covariates
B2 <- c(10, 5, 0) # true beta coefficients
y2 <- cbind(1, X2) %*% B2
X <- rbind(X1, X2)
y <- c(y1, y2)
study <- sample.int(10, 2000, replace = TRUE) # 10 studies
data <- data.frame( Study = study, Y = y, V1 = X[,1], V2 = X[,2] )
# create target study design matrix for covariate profile similarity weighting and
# accept/reject algorithm (covariate-matched study strap)
target <- matrix(rnorm(1000, 3, 5), ncol = 2) # design matrix
colnames(target) <- c("V1", "V2")
##########################
##### Model Fitting #####
##########################
# Fit model with 1 Single-Study Learner (SSL): PCA Regression
arMod1 <- cmss(formula = Y ~ .,
               data = data,
               target.study = target,
               converge.lim = 10,
               bag.size = length(unique(data$Study)),
               max.straps = 50,
               paths = 2,
               ssl.method = list("pcr"),
               ssl.tuneGrid = list(data.frame("ncomp" = 2))
               )
# Fit model with 2 SSLs: Linear Regression and PCA Regression
arMod2 <- cmss(formula = Y ~ .,
               data = data,
               target.study = target,
               converge.lim = 20,
               bag.size = length(unique(data$Study)),
               max.straps = 50,
               paths = 2,
               ssl.method = list("lm", "pcr"),
               ssl.tuneGrid = list(NA, data.frame("ncomp" = 2))
               )
# Fit model with a custom similarity function for the
# accept/reject step and 2 custom functions for Covariate
# Profile Similarity (CPS) weights
# custom functions for CPS
fn1 <- function(x1, x2){
  return( abs( cor( colMeans(x1), colMeans(x2) ) ) )
}
fn2 <- function(x1, x2){
  return( sum( ( colMeans(x1) - colMeans(x2) )^2 ) )
}
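# Note (an observation about this toy setup, not about cmss itself): assuming
# x1 and x2 hold only the two covariates V1 and V2, colMeans() returns
# length-2 vectors, and the correlation of two length-2 vectors with distinct
# entries is always +1 or -1. In that case fn1 evaluates to 1 and cannot
# distinguish one study strap from another; with three or more covariates it
# can take intermediate values. The values below are hypothetical covariate
# means chosen only to illustrate the point.
m1 <- c(V1 = 0.1, V2 = 0.7)   # hypothetical covariate means of a study strap
m2 <- c(V1 = 2.0, V2 = 5.0)   # hypothetical covariate means of the target
abs(cor(m1, m2))              # equals 1 up to floating-point rounding
sum((m1 - m2)^2)              # the squared distance used in fn2 still varies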
arMod3 <- cmss(formula = Y ~ .,
               data = data,
               target.study = target,
               sim.fn = fn1,
               customFNs = list(fn1, fn2),
               converge.lim = 50,
               bag.size = length(unique(data$Study)),
               max.straps = 50,
               paths = 2,
               ssl.method = list("lm", "pcr"),
               ssl.tuneGrid = list(NA, data.frame("ncomp" = 2))
               )
#########################
##### Predictions ######
#########################
preds <- studyStrap.predict(arMod1, target)