R: Define Combinations for Search Process

get.combinations {ldt}

R Documentation

Define Combinations for Search Process

Description

This function defines the structure of a two-level nested loop used in a model search (or screening) process. The outer loop is defined by a vector of sizes, and all combinations of the variables with those sizes are generated automatically. The inner loop is defined by a list of predefined combinations of the variables. Depending on their usage, the variables of each loop can be either endogenous or exogenous.

Usage

get.combinations(
  sizes = c(1),
  partitions = NULL,
  numFixPartitions = 0,
  innerGroups = list(c(1)),
  numTargets = 1,
  stepsNumVariables = c(NA),
  stepsFixedNames = NULL,
  stepsSavePre = NULL
)

Arguments

`sizes`	A numeric vector or a list of numeric vectors that determines the sizes of outer loop combinations. For example, if the outer loop belongs to the endogenous variables, `c(1, 2)` means all models with 1 and 2 equations. If the outer loop belongs to the exogenous variables, `c(1, 2)` means all regressions with 1 and 2 exogenous variables. It can also be a list of numeric vectors for a step-wise search, in which each vector determines the sizes of the models in one step. In each subsequent step, a subset of potential variables is selected by using the `stepsNumVariables` argument.
`partitions`	A list of numeric vectors or character vectors that partitions the outer loop variables. No model is estimated with two variables from the same partition.
`numFixPartitions`	A single number that determines the number of partitions at the beginning of `partitions` to be included in all models.
`innerGroups`	A list of numeric vectors or character vectors that determines different combinations of the variables for the inner loop. For example, if the inner loop belongs to exogenous data, `list(c(1), c(1, 2))` means estimating all models with just the first exogenous variable and all models with both first and second exogenous variables.
`numTargets`	An integer for the number of target variables in the first columns of the data matrix. Results of a search process are specific to these variables. A model is not estimated if it does not contain a target variable.
`stepsNumVariables`	A numeric vector. If `sizes` is a list (i.e., a step-wise search), this vector must have the same length as `sizes` and determines the number of variables (with best performance) selected in each step.
`stepsFixedNames`	A character vector. If `sizes` is a list (i.e., a step-wise search), this vector determines the names of the variables to be included in all steps.
`stepsSavePre`	A name for saving and loading progress, if `sizes` is a list. Each step's result is saved in a file (name=`paste0(stepsSavePre,i)`) where `i` is the index of the step.
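
As a small illustration of the step-wise arguments, the following base R sketch shows how `sizes` and `stepsNumVariables` line up and which file names the `paste0(stepsSavePre, i)` rule produces. The prefix `"ldt_step_"` is a hypothetical value chosen for this example, and nothing is written to disk here.

```r
# Step-wise arguments: one entry of stepsNumVariables per element of sizes.
sizes <- list(c(1), c(1, 2))
stepsNumVariables <- c(NA, 1)
stopifnot(length(sizes) == length(stepsNumVariables))

# Hypothetical prefix, used only for illustration.
stepsSavePre <- "ldt_step_"

# File name for step i, following the documented rule paste0(stepsSavePre, i).
stepFiles <- paste0(stepsSavePre, seq_along(sizes))
print(stepFiles)
```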

Details

The get.combinations function in the ldt package uses a two-level nested loop to iterate over different combinations of endogenous and exogenous variables. This is similar to running the following code:

for (endo in list(c(1), c(1, 2))) {
  for (exo in list(c(1), c(1, 2))) {
    # Estimate a model using the variables indexed by endo and exo
  }
}

However, predefining both loops is not memory efficient. Therefore, ldt uses a running algorithm to define the outer loop. It asks for the desired size of endogenous or exogenous variables in the model (i.e., sizes) and creates the outer groups using all possible combinations of the variables. The partitions and numFixPartitions parameters can be used to restrict this set.
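
To make the effect of `sizes` and `partitions` concrete, here is a minimal base R sketch that enumerates the outer groups eagerly. It is an illustration only and not ldt's internal algorithm (which, as noted above, avoids building the full set in memory). The helper `outer_groups` is a hypothetical name, and the sketch assumes that `partitions`, when given, covers all `n` variables.

```r
# Base R illustration only; this is not ldt's internal algorithm.
# Enumerate all groups of the requested sizes, then drop any group that
# takes two variables from the same partition.
outer_groups <- function(n, sizes, partitions = NULL) {
  groups <- unlist(
    lapply(sizes, function(k) combn(seq_len(n), k, simplify = FALSE)),
    recursive = FALSE
  )
  if (is.null(partitions)) {
    return(groups)
  }
  # Map each variable index to the index of its partition.
  part_of <- integer(n)
  for (p in seq_along(partitions)) {
    part_of[partitions[[p]]] <- p
  }
  # Keep a group only if all of its variables come from distinct partitions.
  Filter(function(g) anyDuplicated(part_of[g]) == 0, groups)
}

all_groups <- outer_groups(4, c(1, 2))
restricted <- outer_groups(4, c(1, 2), partitions = list(c(1, 2), c(3, 4)))
length(all_groups)  # 4 singletons + 6 pairs = 10
length(restricted)  # pairs (1, 2) and (3, 4) are dropped, leaving 8
```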

For the inner loop, you must provide the desired combination of variables (endogenous or exogenous). Given m as the number of variables, you can generate all possible combinations using the following code:

m <- 4
combinations <- unlist(lapply(1:m, function(i) {
  combn(1:m, i, simplify = FALSE)
}), recursive = FALSE)

You can use this as the innerGroups argument. However, this might result in a large model set.
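
To see how quickly this set grows, note that the number of non-empty subsets of m variables is 2^m - 1. The following base R sketch checks this identity for a few values of m chosen purely for illustration.

```r
# Number of non-empty subsets of m variables: sum of choose(m, i) = 2^m - 1.
m <- c(4, 10, 20)
n_groups <- sapply(m, function(k) sum(choose(k, seq_len(k))))
stopifnot(all(n_groups == 2^m - 1))
print(data.frame(m = m, innerGroups = n_groups))
```

Since every inner group is combined with every outer group, restricting `innerGroups` to a handful of meaningful combinations keeps the model set manageable.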

Note that in ldt, if the data matrix does not have column names, the default names for the endogenous variables are Y1, Y2, ..., and the default names for the exogenous variables are X1, X2, .... See the get.data() function for more details.

Also note that ldt ensures that all possible models can be estimated with the given number of partitions and sizes. If this is not possible, it stops with an error message.
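
The exact check that ldt performs is not spelled out here. One necessary condition follows directly from the partition rule, though: because no model may contain two variables from the same partition, a group of size k needs at least k partitions. The sketch below expresses only this condition; `sizes_feasible` is a hypothetical helper and not part of ldt.

```r
# Necessary condition implied by the partition rule (not ldt's actual code):
# a group of size k needs k distinct partitions.
sizes_feasible <- function(sizes, partitions) {
  max(unlist(sizes)) <= length(partitions)
}

partitions <- list(c(1, 2), c(3, 4))
ok <- sizes_feasible(c(1, 2), partitions)          # sizes up to 2, 2 partitions
too_large <- sizes_feasible(c(1, 2, 3), partitions)  # size 3 needs 3 partitions
print(c(ok = ok, too_large = too_large))
```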

Value

A list suitable for use in ldt::search.? functions. The list contains:

`sizes`	The sizes of outer loop combinations.
`partitions`	The partitions of outer loop variables.
`numFixPartitions`	The number of fixed partitions at the beginning.
`innerGroups`	Different combinations of variables for inner loop.
`numTargets`	The number of target variables in the first columns.
`stepsNumVariables`	The number of variables in each step for step-wise search.
`stepsFixedNames`	The names of fixed variables in each step for step-wise search.
`stepsSavePre`	The name for saving and loading progress for step-wise search.

Examples

# Some basic examples are given in this section. However, more practical examples are available
# for the search.? functions.

# Example 1:
combinations1 <- get.combinations(sizes = c(1, 2))
# The function will generate all possible combinations of sizes 1 and 2.

# Example 2: Using partitions
combinations2 <- get.combinations(sizes = c(1, 2), partitions = list(c(1, 2), c(3, 4)))

# Here, we're specifying partitions for the variables.
# The function will generate combinations such that no model is estimated with two variables
# from the same partition.

# Example 3: Specifying inner groups
combinations3 <- get.combinations(sizes = c(1, 2), innerGroups = list(c(1), c(1, 2)))

# In this example, we're specifying different combinations of variables for the inner loop.
# For instance, list(c(1), c(1, 2)) means estimating all models with just the first
# variable and all models with both first and second variables.

# Example 4: Step-wise search
combinations4 <- get.combinations(sizes = list(c(1), c(1, 2)), stepsNumVariables = c(NA, 1))

# This example demonstrates a step-wise search. In the first step (sizes = c(1)), all
# models with one variable are estimated.
# In the next step (sizes = c(1, 2)), a subset of potential variables is selected based
# on their performance in the previous step, and all models with one and two variables
# from that subset are estimated.

[Package ldt version 0.5.3 Index]