ctr_model {sentometrics} | R Documentation |
Set up control for sentiment-based sparse regression modeling
Description
Sets up a control object for linear or nonlinear regression of a response variable on a large panel of
textual sentiment measures (and potentially other variables). See sento_model for details on the
estimation and calibration procedure.
Usage
ctr_model(
model = c("gaussian", "binomial", "multinomial"),
type = c("BIC", "AIC", "Cp", "cv"),
do.intercept = TRUE,
do.iter = FALSE,
h = 0,
oos = 0,
do.difference = FALSE,
alphas = seq(0, 1, by = 0.2),
lambdas = NULL,
nSample = NULL,
trainWindow = NULL,
testWindow = NULL,
start = 1,
do.shrinkage.x = FALSE,
do.progress = TRUE,
nCore = 1
)
Arguments
model |
a character vector with one of the following: "gaussian" (linear regression), "binomial"
(binomial logistic regression), or "multinomial" (multinomial logistic regression).
|
type |
a character vector indicating which model calibration approach to use. Supports "BIC",
"AIC" and "Cp" (Mallows's Cp) as information criteria adapted to sparse regression (Tibshirani and Taylor,
2012; Zou, Hastie and Tibshirani, 2007), and "cv" (cross-validation based on the train
function from the caret package). The adapted information criteria are only available for linear regression
(model = "gaussian").
|
do.intercept |
a logical; if TRUE (the default), an intercept is fitted.
|
do.iter |
a logical; if TRUE, models are estimated iteratively on samples of size nSample, and the
associated out-of-sample prediction exercise is performed through time.
|
h |
an integer value that shifts the time series to obtain the desired prediction setup; h = 0 means
no change to the input data (nowcasting, assuming the data is aligned properly), h > 0 shifts the dependent variable by
h periods (i.e., rows) further in time (forecasting), and h < 0 shifts the independent variables by abs(h)
periods.
|
oos |
a non-negative integer to indicate the number of periods to skip from the end of the training sample
up to the out-of-sample prediction(s). This is used either in the cross-validation based calibration approach
(if type = "cv"), or for the iterative out-of-sample prediction analysis (if do.iter = TRUE). For
instance, given a training sample ending at t, the (first) out-of-sample prediction is computed at t + oos + 1.
|
do.difference |
a logical; if TRUE, the target variable y supplied in the sento_model function is differenced,
using the absolute value of the h argument as lag. This requires abs(h) > 0.
For example, if h = 2, and assuming the y variable is properly aligned
date-wise with the explanatory variables denoted by X (the sentiment measures and the other variables in x), the regression
will be of y_{t + 2} - y_t on X_t. If h = -2, the regression fitted is y_{t + 2} - y_t on
X_{t + 2}. The argument is always kept at FALSE if the model argument is one of
c("binomial", "multinomial").
|
alphas |
a numeric vector of the alphas to test for during calibration, between 0 and 1. A value of
0 pertains to Ridge regression, a value of 1 to LASSO regression; values in between are pure elastic net.
|
lambdas |
a numeric vector of the lambdas to test for during calibration, all >= 0.
A value of zero means no regularization, which requires care when the data is fat (more variables than
observations). By default set to NULL, such that the lambdas sequence is generated by the glmnet function,
or set to 10^seq(2, -2, length.out = 100) in case of cross-validation.
|
nSample |
a positive integer as the size of the sample for model estimation at every iteration (ignored if
do.iter = FALSE).
|
trainWindow |
a positive integer as the size of the training sample for cross-validation (ignored if
type != "cv").
|
testWindow |
a positive integer as the size of the test sample for cross-validation (ignored if
type != "cv").
|
start |
a positive integer to indicate at which point the iteration has to start (ignored if
do.iter = FALSE). For example, given 100 possible iterations, start = 70 leads to model estimations
only for the last 31 samples.
|
do.shrinkage.x |
a logical vector to indicate which of the other regressors provided through the x
argument of the sento_model function should be subject to shrinkage (TRUE). If the argument is of
length one, it applies to all external regressors.
|
do.progress |
a logical; if TRUE, progress statements are displayed during model calibration.
|
nCore |
a positive integer to indicate the number of cores to use for a parallel iterative model
estimation (do.iter = TRUE). We use the %dopar% construct from the foreach package. By default,
nCore = 1, which implies no parallelization. No progress statements are displayed when nCore > 1.
For cross-validation models, parallelization can also be carried out for a single-shot model (do.iter = FALSE),
whenever a parallel backend is set up. See the examples in sento_model.
|
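The h and do.difference descriptions above can be sketched in base R for the h > 0 case. The helper align_for_h below is hypothetical and not part of sentometrics; it only mirrors the documented pairing of y_{t + h} (or y_{t + h} - y_t) with X_t, and is not necessarily how sento_model aligns the data internally.

```r
# Hypothetical helper (not a sentometrics function): pairs the target with
# the regressors as the documentation describes for h > 0.
align_for_h <- function(y, X, h, do.difference = FALSE) {
  n <- length(y)
  stopifnot(h > 0, h < n, nrow(X) == n)
  target <- y[(1 + h):n]                 # y_{t + h}
  if (do.difference) {
    target <- target - y[1:(n - h)]      # y_{t + h} - y_t
  }
  # keep only the rows of X that still have a matching target
  list(y = target, X = X[1:(n - h), , drop = FALSE])
}

y <- c(10, 12, 15, 19, 24, 30)           # toy response
X <- matrix(1:12, nrow = 6)              # toy regressors, one row per period
out <- align_for_h(y, X, h = 2, do.difference = TRUE)
out$y                                    # differences y_{t + 2} - y_t
nrow(out$X)                              # rows of X_t left after alignment
```

With h = 2, the last two rows of X have no matching future value of y, so the aligned sample is two rows shorter than the input.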
Value
A list encapsulating the control parameters.
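As a quick check of the returned value, the sketch below builds a control object and inspects its top level. It assumes the sentometrics package is installed and is skipped otherwise; the element names printed by str() are whatever the installed version returns and are not asserted here.

```r
# Runs only if sentometrics is available; otherwise nothing is created.
if (requireNamespace("sentometrics", quietly = TRUE)) {
  ctr <- sentometrics::ctr_model(model = "gaussian", type = "BIC",
                                 h = 1, alphas = c(0, 0.5, 1))
  is_list <- is.list(ctr)      # the documentation states a list is returned
  str(ctr, max.level = 1)      # top-level overview of the control parameters
} else {
  is_list <- NA                # package not installed, nothing to inspect
}
```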
Author(s)
Samuel Borms, Keven Bluteau
References
Tibshirani and Taylor (2012). Degrees of freedom in LASSO problems.
The Annals of Statistics 40, 1198-1232, doi: 10.1214/12-AOS1003.
Zou, Hastie and Tibshirani (2007). On the degrees of freedom of the LASSO.
The Annals of Statistics 35, 2173-2192, doi: 10.1214/009053607000000127.
See Also
sento_model
Examples
# information criterion based model control objects
ctrIC1 <- ctr_model(model = "gaussian", type = "BIC", do.iter = FALSE, h = 0,
alphas = seq(0, 1, by = 0.10))
ctrIC2 <- ctr_model(model = "gaussian", type = "AIC", do.iter = TRUE, h = 4, nSample = 100,
do.difference = TRUE, oos = 3)
# cross-validation based model control objects
ctrCV1 <- ctr_model(model = "gaussian", type = "cv", do.iter = FALSE, h = 0,
trainWindow = 250, testWindow = 4, oos = 0, do.progress = TRUE)
ctrCV2 <- ctr_model(model = "binomial", type = "cv", h = 0, trainWindow = 250,
testWindow = 4, oos = 0, do.progress = TRUE)
ctrCV3 <- ctr_model(model = "multinomial", type = "cv", h = 2, trainWindow = 250,
testWindow = 4, oos = 2, do.progress = TRUE)
ctrCV4 <- ctr_model(model = "gaussian", type = "cv", do.iter = TRUE, h = 0, trainWindow = 45,
testWindow = 4, oos = 0, nSample = 70, do.progress = TRUE)
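The index arithmetic behind the start and oos arguments can be checked with a few lines of base R. This is an illustration of the documented rules only; no sentometrics function is called, and the value of train_end is an assumed example (a training sample covering rows 1 to 100, in line with nSample = 100 and oos = 3 in ctrIC2).

```r
# Illustrative index arithmetic only; variable names are hypothetical.
n_iterations <- 100                  # number of possible iterations
start <- 70
kept <- seq(start, n_iterations)     # iterations actually estimated
length(kept)                         # 31, as stated for start = 70

train_end <- 100                     # assumed last row of the training sample (t)
oos <- 3
first_prediction <- train_end + oos + 1
first_prediction                     # row of the first out-of-sample prediction
```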
[Package sentometrics version 1.0.0]