metafuse {metafuse} | R Documentation |
Fit a GLM with fusion penalty for data integration
Description
Fit a GLM with a fusion penalty on the coefficients within each covariate across datasets, and generate the solution path and fusograms for visualizing the model selection.
Usage
metafuse(X = X, y = y, sid = sid, fuse.which = c(0:ncol(X)),
family = "gaussian", intercept = TRUE, alpha = 0, criterion = "EBIC",
verbose = TRUE, plots = FALSE, loglambda = TRUE)
Arguments
X |
a matrix (or vector) of predictor(s), with dimensions of |
y |
a vector of response, with length |
sid |
data source ID of length |
fuse.which |
a vector of integers from 0 to |
family |
response vector type, |
intercept |
if |
alpha |
the ratio of the sparsity penalty to the fusion penalty; the default is 0 (i.e., no variable selection, only fusion) |
criterion |
|
verbose |
if |
plots |
if |
loglambda |
if |
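As a rough illustration of how these inputs fit together, the base-R sketch below builds a small stacked data set by hand. The variable names and simulated values are assumptions made for illustration only. The reading of fuse.which (0 for the intercept, j for the j-th column of X) is inferred from the Usage default and the Examples section, not stated in the argument table.

```r
set.seed(1)
K <- 3                       # number of data sources (hypothetical)
n <- c(20, 30, 25)           # per-source sample sizes (hypothetical)
N <- sum(n)

sid <- rep(seq_len(K), times = n)         # data source ID, one entry per row
X <- cbind(x1 = rnorm(N), x2 = rnorm(N))  # predictors stacked across sources
slope1 <- c(0, 0, 1)[sid]                 # slope of x1 differs in source 3
y <- slope1 * X[, "x1"] + 0.5 * X[, "x2"] + rnorm(N)

fuse.which <- 0:ncol(X)      # Usage default: intercept (0) plus every column

# all three inputs must describe the same rows
stopifnot(nrow(X) == length(y), length(y) == length(sid))
print(table(sid))
```

Restricting fuse.which to a subset, for example c(1), mirrors the calls in the Examples section that fuse only the slope of one covariate.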
Details
An adaptive lasso penalty is used; see Zou (2006) for details.
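To make the idea of adaptive weights concrete, here is a minimal base-R sketch. It is not the package's internal code. It assumes that initial estimates come from separate per-dataset least-squares fits and that each fusion weight is the inverse absolute gap between adjacent ordered estimates, in the spirit of Zou (2006). The exponent r and all variable names are hypothetical.

```r
set.seed(42)
K <- 4; n <- 100
true_slope <- c(0, 0, 1, 1)            # two clusters of slopes
sid <- rep(seq_len(K), each = n)
x <- rnorm(K * n)
y <- true_slope[sid] * x + rnorm(K * n)

# initial estimates: one least-squares slope per dataset
init <- sapply(seq_len(K), function(k) {
  coef(lm(y[sid == k] ~ x[sid == k]))[2]
})
names(init) <- paste0("study", seq_len(K))

# order the estimates, then weight each adjacent gap by its inverse size
r <- 1                                  # hypothetical exponent
ord <- order(init)
gaps <- unname(diff(init[ord]))
w <- 1 / abs(gaps)^r

# small gaps get large weights, so those neighbours are penalized hardest
print(data.frame(
  pair   = paste(names(init)[ord][-K], names(init)[ord][-1], sep = "-"),
  gap    = gaps,
  weight = w
))
```

Under these assumptions, the gap separating the two clusters receives the smallest weight, so a fusion penalty built from such weights is least inclined to merge across the true cluster boundary.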
Value
A list containing the following items will be returned:
family |
the response/model type |
criterion |
model selection criterion used |
alpha |
the ratio of sparsity penalty to fusion penalty |
if.fuse |
whether each covariate is assumed to be heterogeneous (1) or homogeneous (0) |
betahat |
the estimated regression coefficients |
betainfo |
additional information about the fit, including degrees of freedom, optimal lambda value, maximum lambda value to fuse all coefficients, and estimated friction of fusion |
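The returned list can be stored and inspected by item name. The sketch below reuses the data setup and the metafuse call shown in the Examples section, with verbose and plots switched off, and is guarded so that it does nothing when the metafuse package is not installed. The object name fit is an assumption, and the exact layout of betahat and betainfo is not documented here, so the items are simply printed.

```r
if (requireNamespace("metafuse", quietly = TRUE)) {
  library(metafuse)

  # same heterogeneity pattern as in the Examples section (K = 10, p = 3)
  beta0 <- matrix(c(rep(0, 10),                      # intercept
                    rep(c(0, 1), each = 5),          # slope of X_1
                    rep(c(0, 0.5, 1), c(4, 3, 3))),  # slope of X_2
                  10, 3)
  data <- datagenerator(n = 200, beta0 = beta0, family = "gaussian", seed = 123)

  fit <- metafuse(X = data[, -c(1, ncol(data))], y = data$y, sid = data$group,
                  fuse.which = c(0, 1, 2), family = "gaussian",
                  intercept = TRUE, alpha = 0, criterion = "EBIC",
                  verbose = FALSE, plots = FALSE, loglambda = TRUE)

  print(names(fit))     # items listed in the Value section
  print(fit$betahat)    # estimated regression coefficients
  print(fit$betainfo)   # degrees of freedom, lambda values, friction
}
```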
References
Lu Tang and Peter X.K. Song. Fused Lasso Approach in Regression Coefficients Clustering - Learning Parameter Heterogeneity in Data Integration. Journal of Machine Learning Research, 17(113):1-23, 2016.
Fei Wang, Lu Wang, and Peter X.K. Song. Fused lasso with the adaptation of parameter ordering in combining multiple studies with repeated measurements. Biometrics, DOI:10.1111/biom.12496, 2016.
Examples
########### generate data ###########
n <- 200 # sample size in each dataset (can also be a K-element vector)
K <- 10 # number of datasets for data integration
p <- 3 # number of covariates in X (including the intercept)
# the coefficient matrix of dimension K * p, used to specify the heterogeneous pattern
beta0 <- matrix(c(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0, # beta_0 of intercept
0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0, # beta_1 of X_1
0.0,0.0,0.0,0.0,0.5,0.5,0.5,1.0,1.0,1.0), # beta_2 of X_2
K, p)
# generate a data set, family=c("gaussian", "binomial", "poisson", "cox")
data <- datagenerator(n=n, beta0=beta0, family="gaussian", seed=123)
# prepare the input for metafuse
y <- data$y
sid <- data$group
X <- data[,-c(1,ncol(data))]
########### run metafuse ###########
# fuse slopes of X1 (which is heterogeneous with 2 clusters)
metafuse(X=X, y=y, sid=sid, fuse.which=c(1), family="gaussian", intercept=TRUE, alpha=0,
criterion="EBIC", verbose=TRUE, plots=TRUE, loglambda=TRUE)
# fuse slopes of X2 (which is heterogeneous with 3 clusters)
metafuse(X=X, y=y, sid=sid, fuse.which=c(2), family="gaussian", intercept=TRUE, alpha=0,
criterion="EBIC", verbose=TRUE, plots=TRUE, loglambda=TRUE)
# fuse all three covariates
metafuse(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian", intercept=TRUE, alpha=0,
criterion="EBIC", verbose=TRUE, plots=TRUE, loglambda=TRUE)
# fuse all three covariates, with sparsity penalty
metafuse(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian", intercept=TRUE, alpha=1,
criterion="EBIC", verbose=TRUE, plots=TRUE, loglambda=TRUE)