fusionbase {FusionLearn}    R Documentation
Fusion learning method for continuous responses
Description
fusionbase conducts group penalization on multiple linear models with a specified penalty value. fusionbase.fit can be used to search for the best candidate model over a sequence of penalty values, based on the pseudo Bayesian information criterion.
Usage
fusionbase(x, y, lambda, N, p, m, beta = 0.1, thresh = 0.05,
           maxiter = 30, methods = "scad", Complete = TRUE)
fusionbase.fit(x, y, lambda, N, p, m, beta = 0.1, thresh = 0.05,
               maxiter = 30, methods = "scad", Complete = TRUE, depen = "IND", a = 1)
Arguments
x: List. A list of predictor matrices, one for each platform.
y: List. A list of continuous response vectors from the different platforms, in the same order as in x.
lambda: Numeric or vector. For fusionbase, a single penalty value; for fusionbase.fit, a sequence of penalty values to search over.
N: Numeric or vector. If a single value is provided, the same sample size is assumed for each data set. If a vector is provided, its elements are the sample sizes of the individual platforms.
p: Numeric. The number of predictors.
m: Numeric. The number of platforms.
beta: Numeric or matrix. Initial values for the estimated parameters, with dimensions nvars x nplatforms. The default value is 0.1.
thresh: Numeric. The stopping criterion, compared with the difference in the estimates between successive updates. The default value is 0.05.
maxiter: Numeric. The maximum number of iterations. The default value is 30.
methods: Character. The penalty function, either "lass" (LASSO) or "scad" (SCAD). The default value is "scad".
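Complete: Logical. If TRUE, all p predictors are measured on every platform; if FALSE, some predictors are missing from some platforms.
depen: Character. Input only for function fusionbase.fit. The default value is "IND".
a: Numeric. Input only for function fusionbase.fit. The default value is 1.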
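To make the expected shapes of x, y, N, p and m concrete, the sketch below simulates inputs for m = 3 platforms that share p = 5 predictors with an equal sample size N = 100. The data, the true coefficients and the helper make_fusion_inputs are hypothetical and only illustrate the list structure; the fusionbase call (with an arbitrary lambda = 0.5) runs only if the FusionLearn package is installed.

```r
# Hypothetical helper (illustration only): simulate m platforms sharing p predictors
make_fusion_inputs <- function(N, p, m, beta_true, seed = 1) {
  set.seed(seed)
  x <- vector("list", m)
  y <- vector("list", m)
  for (k in seq_len(m)) {
    # N x p predictor matrix for platform k
    x[[k]] <- matrix(rnorm(N * p), nrow = N, ncol = p)
    # continuous response from a linear model plus Gaussian noise
    y[[k]] <- as.vector(x[[k]] %*% beta_true[, k] + rnorm(N))
  }
  list(x = x, y = y)
}

N <- 100; p <- 5; m <- 3
# The same two predictors are active on every platform, with platform-specific effects
beta_true <- matrix(0, nrow = p, ncol = m)
beta_true[1, ] <- c(1.0, 1.5, 0.8)
beta_true[2, ] <- c(-1.0, -0.5, -1.2)

inputs <- make_fusion_inputs(N, p, m, beta_true)
x <- inputs$x  # list of m predictor matrices, same platform order as y
y <- inputs$y  # list of m response vectors

# Only attempt the fit when the package is available; lambda = 0.5 is arbitrary
if (requireNamespace("FusionLearn", quietly = TRUE)) {
  fit <- FusionLearn::fusionbase(x, y, lambda = 0.5, N = N, p = p, m = m)
  print(fit$beta)
}
```

Because the sample sizes are equal here, N is passed as a single number; with unequal sizes it would be a vector of length m.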
Details
fusionbase is the basic fusion learning function for learning from multiple linear models with continuous responses. More details on the model assumptions and the algorithm can be found in FusionLearn.
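The sketch below illustrates one common reading of group penalization across platforms: the m coefficients of the same predictor form one group, and the penalty acts on the Euclidean norm of that group, so a predictor is kept or dropped on all platforms together. This grouping, the helper names and the SCAD shape constant a = 3.7 (the conventional choice of Fan and Li) are assumptions for illustration, not taken from the FusionLearn source; the a used here is not necessarily the a argument of fusionbase.fit.

```r
# SCAD penalty evaluated at |t| (vectorised); a = 3.7 is the conventional default
scad_penalty <- function(t, lambda, a = 3.7) {
  t <- abs(t)
  ifelse(t <= lambda,
         lambda * t,                                            # linear (LASSO-like) part
         ifelse(t <= a * lambda,
                (2 * a * lambda * t - t^2 - lambda^2) / (2 * (a - 1)),  # quadratic taper
                lambda^2 * (a + 1) / 2))                        # constant: no extra shrinkage
}

# LASSO penalty on the same argument
lasso_penalty <- function(t, lambda) lambda * abs(t)

# Hypothetical group penalty: beta is p x m (predictors x platforms).
# Each row (one predictor across all platforms) enters through its L2 norm.
group_penalty <- function(beta, lambda, method = c("scad", "lasso"), a = 3.7) {
  method <- match.arg(method)
  norms <- sqrt(rowSums(beta^2))
  pen <- if (method == "scad") scad_penalty(norms, lambda, a) else lasso_penalty(norms, lambda)
  sum(pen)
}

beta <- rbind(c(1.0, 1.5, 0.8),    # strong predictor on all platforms
              c(0.0, 0.0, 0.0),    # inactive predictor: contributes no penalty
              c(0.1, -0.1, 0.05))  # weak predictor
group_penalty(beta, lambda = 0.5, method = "lasso")
group_penalty(beta, lambda = 0.5, method = "scad")
```

With SCAD, a group whose norm exceeds a * lambda pays a constant penalty, so large effects are not shrunk further, whereas the LASSO penalty keeps growing linearly with the norm.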
Value
fusionbase returns a list with the following components:
beta: A matrix (nvars x nplatforms) containing the estimated coefficients of each linear model. If some data sets do not have the complete set of predictors, the corresponding coefficients are output as NA.
method: The penalty function used, LASSO or SCAD.
threshold: The difference in the estimates between successive updates upon convergence.
iteration: The number of iterations performed upon convergence.
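Once fusionbase has returned, its beta component can be summarized per predictor. The sketch below uses a small made-up matrix standing in for result$beta, and the helper selected_predictors is hypothetical. It assumes that coefficients of predictors not measured on a platform are reported as NA, and treats them as "not estimated" rather than as zero.

```r
# Hypothetical helper: predictors with a non-zero estimate on at least one platform.
# NA entries (assumed to mark predictors not measured on a platform) are ignored.
selected_predictors <- function(beta, tol = 0) {
  active <- abs(beta) > tol
  active[is.na(active)] <- FALSE
  which(rowSums(active) > 0)
}

# Made-up 4 x 3 coefficient matrix standing in for result$beta
beta_hat <- matrix(c( 0.9,  1.2,   NA,
                      0,    0,     0,
                     -0.4,  NA,   -0.3,
                      0,    0,     NA),
                   nrow = 4, byrow = TRUE,
                   dimnames = list(paste0("M", 1:4), c("P1", "P2", "P3")))

selected_predictors(beta_hat)  # named indices of the selected rows
```

Using rowSums over the whole row, rather than a single column, keeps the summary correct even when a predictor is missing from the first platform.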
fusionbase.fit returns the results in a table with the following columns:
lambda: The sequence of penalty values.
BIC: The pseudolikelihood Bayesian information criterion evaluated at each penalty value.
-2Loglkh: Minus twice the pseudo log-likelihood of the model fitted at each penalty value.
Est_df: The estimated degrees of freedom, quantifying the model complexity.
fusionbase.fit also produces a model selection plot of these results.
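Selecting the penalty from the fusionbase.fit table amounts to taking the row with the smallest BIC. The sketch below assumes, as in the Examples, that column 1 holds lambda and column 2 holds BIC, and that all columns are numeric. The helper pick_lambda and its grid-edge warning are hypothetical additions: a minimum at either end of the grid suggests that the range of penalty values may need to be widened.

```r
# Hypothetical helper: choose lambda with the smallest BIC (column 1 = lambda,
# column 2 = BIC, as in the Examples) and flag minima at the edge of the grid.
pick_lambda <- function(tab) {
  tab <- as.matrix(tab)
  i <- which.min(tab[, 2])
  on_edge <- i == 1 || i == nrow(tab)
  if (on_edge) {
    warning("BIC is minimised at the edge of the lambda grid; consider widening the range")
  }
  list(lambda = tab[i, 1], row = i, on_edge = on_edge)
}

# Made-up table for illustration only (not output from FusionLearn)
tab <- cbind(lambda = c(0.1, 0.5, 1, 2, 4),
             BIC    = c(310, 295, 288, 301, 330))
pick_lambda(tab)
```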
Note
The range of penalty values should be chosen carefully. For some penalty values, the resulting models may have a singular information matrix, or the glm fit may fail to converge.
Author(s)
Xin Gao, Yuan Zhong, and Raymond J. Carroll
References
Gao, X. and Carroll, R. J. (2017). Data integration with high dimensionality. Biometrika, 104(2), 251-272.
Examples
## Analysis of the stock index data
# Responses: the indices "VIX", "GSPC" and "DJI"
y <- list(stockindexVIX[, 1], stockindexGSPC[, 1], stockindexDJI[, 1])
# Predictors: 46 stocks (columns 2 to 47 of each data set)
x <- list(stockindexVIX[, 2:47], stockindexGSPC[, 2:47], stockindexDJI[, 2:47])
## Model selection based on the pseudolikelihood information criterion
## (N = 232 observations, p = 46 predictors, m = 3 platforms)
model <- fusionbase.fit(x, y, seq(0.03, 5, length.out = 10), 232, 46, 3, depen = "CORR")
# Choose the penalty value with the smallest BIC (column 2 of the table)
lambda <- model[which.min(model[, 2]), 1]
result <- fusionbase(x, y, lambda, 232, 46, 3)
## Identify the significant predictors for the three indices
# "+ 1" skips the response stored in column 1 of stockindexVIX
id <- which(result$beta[, 1] != 0) + 1
colnames(stockindexVIX)[id]