lmCombine {greybox} | R Documentation |
Combine regressions based on information criteria
Description
The function combines the parameters of linear regressions of the first variable on all the other variables in the provided data.
Usage
lmCombine(data, ic = c("AICc", "AIC", "BIC", "BICc"), bruteforce = FALSE,
silent = TRUE, formula = NULL, subset = NULL,
distribution = c("dnorm", "dlaplace", "ds", "dgnorm", "dlogis", "dt",
"dalaplace", "dlnorm", "dllaplace", "dls", "dlgnorm", "dbcnorm", "dinvgauss",
"dgamma", "dexp", "dfnorm", "drectnorm", "dpois", "dnbinom", "dbeta",
"dlogitnorm", "plogis", "pnorm"), parallel = FALSE, ...)
Arguments
data |
Data frame containing the dependent variable in the first column and the explanatory variables in the remaining columns. |
ic |
Information criterion to use. |
bruteforce |
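If TRUE, all the possible models are generated and combined. Otherwise, only the models close to the optimal one are combined. |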
silent |
If TRUE, the function prints nothing during the estimation. |
formula |
If provided, then the selection will be done from the listed variables in the formula after all the necessary transformations. |
subset |
An optional vector specifying a subset of observations to be used in the fitting process. |
distribution |
Distribution to pass to alm(). |
parallel |
If TRUE, the model fitting is done in parallel. |
... |
Other parameters passed to alm(). |
Details
The algorithm uses alm() to fit different models and then combines them based on the selected IC. If a parameter is not present in some of the models, it is assumed to be equal to zero in those models. Thus, there is a shrinkage effect in the combination.
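The combination described above can be illustrated with a minimal base R sketch. It is not the internal code of lmCombine(): it uses lm() and AIC() instead of alm(), and it assumes that the weights are proportional to exp(-0.5 * (IC - min(IC))), as in Burnham and Anderson (2002). All object names are hypothetical.

```r
# Minimal sketch of an IC-weighted combination (assumed weighting scheme,
# base R lm() instead of alm(); not the package's internal code)
set.seed(41)
n <- 100
x1 <- rnorm(n, 10, 3)
x2 <- rnorm(n, 50, 5)
y <- 100 + 0.5 * x1 - 0.75 * x2 + rnorm(n, 0, 3)
ourData <- data.frame(y = y, x1 = x1, x2 = x2)

# All the possible models for two explanatory variables
formulas <- list(y ~ 1, y ~ x1, y ~ x2, y ~ x1 + x2)
models <- lapply(formulas, lm, data = ourData)

# Information criteria and the weights based on them
ICs <- sapply(models, AIC)
ICWeights <- exp(-0.5 * (ICs - min(ICs)))
ICWeights <- ICWeights / sum(ICWeights)

# One row per model; a parameter absent from a model stays at zero
parameterNames <- c("(Intercept)", "x1", "x2")
parameters <- matrix(0, length(models), length(parameterNames),
                     dimnames = list(NULL, parameterNames))
for (i in seq_along(models)) {
    parameters[i, names(coef(models[[i]]))] <- coef(models[[i]])
}

# Combined parameters: weighted average across all the models
combinedParameters <- drop(ICWeights %*% parameters)
combinedParameters
```

Because the zeros enter the weighted average, each combined slope is no larger in absolute value than the largest estimate of that slope across the individual models, which is the shrinkage effect mentioned above.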
Some details and examples of application are also given in the vignette "Greybox": vignette("greybox","greybox")
Value
Function returns model, the final model of the class "greyboxC". It is a list containing the following variables:
coefficients - combined parameters of the model,
vcov - combined covariance matrix of the model,
fitted - the fitted values,
residuals - residuals of the model,
distribution - distribution used in the estimation,
logLik - combined log-likelihood of the model,
IC - the values of the combined information criterion,
ICType - the type of information criterion used,
df.residual - number of degrees of freedom of the residuals of the combined model,
df - number of degrees of freedom of the combined model,
importance - importance of the parameters,
combination - the table indicating which variables were used in each model and what the weight of each model was,
timeElapsed - the time elapsed for the estimation of the model.
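As an illustration of how the combination table and the importance values in the list above may relate, the sketch below sums the weights of the models that include each variable, which is the relative importance measure of Burnham and Anderson (2002). This is an assumption about what the function reports, not a statement about its implementation, and the table layout and numbers are hypothetical.

```r
# Hypothetical combination table: each row is a model, the first columns
# flag the included variables and the last column holds the model weight
combination <- cbind(x1 = c(0, 1, 0, 1),
                     x2 = c(0, 0, 1, 1),
                     weights = c(0.01, 0.04, 0.25, 0.70))

# Relative importance of a variable: the sum of weights of the models
# in which that variable is included
variablesIncluded <- combination[, c("x1", "x2")]
importance <- drop(combination[, "weights"] %*% variablesIncluded)
importance
```

In this made-up table x2 appears in the two models with the highest weights, so its importance is close to one, while x1 gets a lower value.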
Author(s)
Ivan Svetunkov, ivan@svetunkov.ru
References
Burnham, K. P. and Anderson, D. R. (2002). Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. Springer-Verlag New York. DOI: [10.1007/b97636](http://dx.doi.org/10.1007/b97636).
McQuarrie, A. D. (1999). A small-sample correction for the Schwarz SIC model selection criterion. Statistics & Probability Letters, 44(1), 79–86. DOI: [10.1016/S0167-7152(98)00294-6](https://doi.org/10.1016/S0167-7152(98)00294-6).
See Also
Examples
### Simple example
xreg <- cbind(rnorm(100,10,3),rnorm(100,50,5))
xreg <- cbind(100+0.5*xreg[,1]-0.75*xreg[,2]+rnorm(100,0,3),xreg,rnorm(100,300,10))
colnames(xreg) <- c("y","x1","x2","Noise")
inSample <- xreg[1:80,]
outSample <- xreg[-c(1:80),]
# Combine all the possible models
ourModel <- lmCombine(inSample,bruteforce=TRUE)
predict(ourModel,outSample)
plot(predict(ourModel,outSample))
### Fat regression example
xreg <- matrix(rnorm(5000,10,3),50,100)
xreg <- cbind(100+0.5*xreg[,1]-0.75*xreg[,2]+rnorm(50,0,3),xreg,rnorm(50,300,10))
colnames(xreg) <- c("y",paste0("x",c(1:100)),"Noise")
inSample <- xreg[1:40,]
outSample <- xreg[-c(1:40),]
# Combine only the models close to the optimal
ourModel <- lmCombine(inSample, ic="BICc",bruteforce=FALSE)
summary(ourModel)
plot(predict(ourModel, outSample))
# Combine in parallel - should increase speed in case of big data
## Not run: 
ourModel <- lmCombine(inSample, ic="BICc", bruteforce=TRUE, parallel=TRUE)
summary(ourModel)
plot(predict(ourModel, outSample))
## End(Not run)