R: Fit a GLM with fusion penalty for data integration, with a...

metafuse.l {metafuse}

R Documentation

Fit a GLM with fusion penalty for data integration, with a fixed lambda

Description

Fit a GLM with a fusion penalty on the coefficients within each covariate, at a given lambda.

Usage

metafuse.l(X = X, y = y, sid = sid, fuse.which = c(0:ncol(X)),
  family = "gaussian", intercept = TRUE, alpha = 0, lambda = lambda)

Arguments

`X`	a matrix (or vector) of predictor(s), with dimensions `N*(p-1)`, where `N` is the total sample size of the integrated dataset and `p` counts the covariates including the intercept
`y`	a response vector of length `N`; when `family="cox"`, `y` is a data frame with columns `time` and `status`
`sid`	data source ID of length `N`; must contain integers numbered from 1 to `K`, where `K` is the number of datasets
`fuse.which`	a vector of integers from 0 to `p-1`, indicating which covariates are considered for fusion; 0 corresponds to the intercept; coefficients of covariates not in this vector are homogeneously estimated across all datasets
`family`	response type: `"gaussian"` if `y` is a continuous vector, `"binomial"` if `y` is a binary vector, `"poisson"` if `y` is a count vector, `"cox"` if `y` is a data frame with columns `time` and `status`
`intercept`	if `TRUE`, an intercept is included; default is `TRUE`
`alpha`	the ratio of sparsity penalty to fusion penalty, default is 0 (i.e., no variable selection, only fusion)
`lambda`	tuning parameter for fusion penalty
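
The constraints on `X`, `y`, `sid`, and `fuse.which` listed above can be checked before fitting. The following is a minimal base-R sketch; `check_metafuse_inputs` is a hypothetical helper written for illustration and is not part of the metafuse package.

```r
# Hypothetical helper (NOT part of metafuse): sanity-check the inputs
# against the argument descriptions on this page.
check_metafuse_inputs <- function(X, y, sid, fuse.which) {
  X <- as.matrix(X)          # also handles a single predictor given as a vector
  N <- nrow(X)               # total sample size of the integrated dataset
  K <- max(sid)              # number of data sources implied by sid
  stopifnot(
    length(sid) == N,                      # one source ID per observation
    NROW(y) == N,                          # works for a vector or a data frame (cox)
    all(sid == as.integer(sid)),           # IDs must be whole numbers
    setequal(unique(sid), seq_len(K)),     # IDs must cover 1..K with no gaps
    all(fuse.which %in% 0:ncol(X))         # 0 = intercept, 1..(p-1) = columns of X
  )
  invisible(list(N = N, K = K, p = ncol(X) + 1))
}

# Toy usage: 3 data sources with 2 observations each and 2 predictors
set.seed(1)
X_toy   <- matrix(rnorm(12), nrow = 6, ncol = 2)
y_toy   <- rnorm(6)
sid_toy <- rep(1:3, each = 2)
info    <- check_metafuse_inputs(X_toy, y_toy, sid_toy, fuse.which = 0:2)
```

A gap in the IDs (for example, sources labelled 1, 2, and 4) makes the check fail, which is the situation the `sid` description rules out.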

Details

An adaptive lasso penalty is used; see Zou (2006) for details.
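
To make the roles of `lambda` and `alpha` concrete, the penalized objective can be sketched as below. This is an assumed form, pieced together from the argument descriptions on this page and the general adaptive-lasso idea of data-driven weights; the exact weights, scaling, and the pairs of coefficients that are fused are not stated here and may differ in the package. The symbols $\ell_k$, $\tilde\beta$, $w$, $v$, and $r$ are introduced for illustration only.

```latex
% Assumed sketch, not taken from the package source.
% \beta_{j,k}: coefficient of covariate j in dataset k
% \beta_{j,(k)}: k-th smallest coefficient of covariate j, ordered by initial estimates \tilde\beta
% \ell_k: log-likelihood contribution of dataset k
\hat{\beta} = \arg\min_{\beta}\;
  -\frac{1}{N}\sum_{k=1}^{K} \ell_k(\beta_{\cdot,k})
  + \lambda \sum_{j \in \texttt{fuse.which}} \sum_{k=2}^{K}
      w_{j,k}\,\bigl|\beta_{j,(k)} - \beta_{j,(k-1)}\bigr|
  + \alpha\lambda \sum_{j} \sum_{k=1}^{K} v_{j,k}\,\bigl|\beta_{j,k}\bigr|,
\qquad
w_{j,k} = \bigl|\tilde\beta_{j,(k)} - \tilde\beta_{j,(k-1)}\bigr|^{-r},
\quad
v_{j,k} = \bigl|\tilde\beta_{j,k}\bigr|^{-r}.
```

Under this reading, `alpha = 0` removes the last term, leaving only fusion, which matches the documented default of no variable selection.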

Value

A list containing the following items will be returned:

`family`	the response/model type
`alpha`	the ratio of sparsity penalty to fusion penalty
`if.fuse`	whether each covariate is assumed to be heterogeneous (1) or homogeneous (0)
`betahat`	the estimated regression coefficients
`betainfo`	additional information about the fit, including degrees of freedom, optimal lambda value, maximum lambda value to fuse all coefficients, and estimated friction of fusion
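
The returned list can be stored and its items read by name. The sketch below assumes the metafuse package is installed and reuses the `datagenerator()` and `metafuse.l()` calls from the Examples on this page; the internal layout of `betahat` and `betainfo` is not described here, so they are only printed, not indexed.

```r
# Runs only if metafuse is available; otherwise the block is skipped.
if (requireNamespace("metafuse", quietly = TRUE)) {
  library(metafuse)

  # Same heterogeneity pattern as in the Examples: K = 10 datasets, p = 3
  beta0 <- cbind(rep(0, 10),                        # intercept: homogeneous
                 rep(c(0, 1), each = 5),            # X_1: two clusters
                 rep(c(0, 0.5, 1), c(4, 3, 3)))     # X_2: three clusters
  data <- datagenerator(n = 200, beta0 = beta0, family = "gaussian", seed = 123)

  y   <- data$y
  sid <- data$group
  X   <- data[, -c(1, ncol(data))]

  # Keep the result instead of only printing it
  fit <- metafuse.l(X = X, y = y, sid = sid, fuse.which = c(0, 1, 2),
                    family = "gaussian", intercept = TRUE,
                    alpha = 1, lambda = 0.5)

  print(names(fit))     # items listed under Value
  print(fit$if.fuse)    # 1 = heterogeneous, 0 = homogeneous
  print(fit$betahat)    # estimated regression coefficients
  print(fit$betainfo)   # summary information about the fit
}
```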

References

Lu Tang and Peter X.K. Song. Fused Lasso Approach in Regression Coefficients Clustering - Learning Parameter Heterogeneity in Data Integration. Journal of Machine Learning Research, 17(113):1-23, 2016.

Fei Wang, Lu Wang, and Peter X.K. Song. Fused lasso with the adaptation of parameter ordering in combining multiple studies with repeated measurements. Biometrics, DOI:10.1111/biom.12496, 2016.

Examples

########### generate data ###########
n <- 200    # sample size in each dataset (can also be a K-element vector)
K <- 10     # number of datasets for data integration
p <- 3      # number of covariates in X (including the intercept)

# the coefficient matrix of dimension K * p, used to specify the heterogeneous pattern
beta0 <- matrix(c(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,   # beta_0 of intercept
                  0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,   # beta_1 of X_1
                  0.0,0.0,0.0,0.0,0.5,0.5,0.5,1.0,1.0,1.0),  # beta_2 of X_2
                K, p)

# generate a dataset; family is one of "gaussian", "binomial", "poisson", "cox"
data <- datagenerator(n=n, beta0=beta0, family="gaussian", seed=123)

# prepare the input for metafuse
y       <- data$y
sid     <- data$group
X       <- data[,-c(1,ncol(data))]

########### run metafuse ###########
# fit metafuse at a given lambda
metafuse.l(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian", intercept=TRUE,
          alpha=1, lambda=0.5)

[Package metafuse version 2.0-1 Index]