get.data {ldt}    R Documentation

Transform and Prepare Data for Analysis

Description

This function prepares a data matrix for analysis. It applies a Box-Cox transformation to the endogenous variables, adds an intercept column, and optionally includes new rows with exogenous data.

Usage

get.data(
  data,
  endogenous = 1,
  equations = NULL,
  weights = NULL,
  lambdas = NULL,
  newData = NULL,
  addIntercept = TRUE,
  ...
)

Arguments

data

A data.frame or a numeric matrix that serves as the primary data source.

endogenous

A single number giving how many of the first columns hold endogenous variables, or a list (or character vector, as in the Examples) of names specifying the endogenous variables. All remaining variables are treated as exogenous.

equations

A formula or a list of formula objects that represent the equations to be used instead of endogenous. If provided, the final data will be a matrix where the response variables are in the first columns and the predictor variables are in the subsequent columns.

weights

A numeric vector or a column matrix representing weights of observations. Not all applications implement this parameter.

lambdas

The lambda parameter(s) for the Box-Cox transformation, given as NULL, NA, a single number, or a numeric vector. Use NULL for no transformation, NA to estimate a lambda for each variable, a single number to apply the same lambda to all variables, or a numeric vector to give a distinct lambda for each corresponding variable.

newData

A data.frame or a numeric matrix representing new data for exogenous variables. It should have a structure similar to data, excluding endogenous or response variables.

addIntercept

A logical value indicating whether to add an intercept column to the final matrix.

...

Additional parameters for the MASS::boxcox function.

Details

This function is designed to prepare a data matrix for model search (or screening) analysis. It performs several operations to transform and structure the data appropriately.

The function first checks if the input data is a matrix or a data frame. If new data is provided, it also checks its type. It then extracts the frequency of the first observation from the ldtf attribute of the data, if available.

If no equations are provided, the function assumes that the endogenous variables are in the first columns of the data. It checks if an intercept is already present and throws an error if one is found and addIntercept is set to TRUE. It then validates the number of endogenous variables and converts the data to a numeric matrix.
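
As a rough illustration of the intercept check, the sketch below flags a matrix that already contains a column of ones. The helper name has_intercept_column is hypothetical and not part of ldt, and the criterion (a column whose entries all equal 1) is an assumption, since the text does not say how get.data detects an existing intercept.

```r
# Hypothetical helper, not part of ldt: TRUE if any column consists only of ones.
has_intercept_column <- function(m) {
  m <- as.matrix(m)
  any(apply(m, 2, function(col) all(col == 1)))
}

x <- matrix(1:8, ncol = 2)
has_intercept_column(x)            # no all-ones column in x
has_intercept_column(cbind(1, x))  # the prepended column is all ones
```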

If column names are missing, they are added based on the number of endogenous and exogenous variables. If new data is provided, the function checks its structure and matches it with the exogenous part of the original data.

If equations are provided, they are used to transform the original data into a matrix where response variables are in the first columns and predictor variables in subsequent columns. The new data is also transformed accordingly.
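
The reordering described above can be pictured with base R alone. This is a minimal sketch, not the package's implementation: it assumes plain variable names on both sides of each formula (no transformations or interactions), and the predictor order it produces may differ from what get.data returns.

```r
df <- data.frame(matrix(1:24, ncol = 6))
colnames(df) <- c("X1", "X2", "Y2", "X3", "Y1", "X4")
eqs <- list(Y1 ~ X2 + X1, Y2 ~ X4 + X3)

# Left-hand side of each formula: the response variable name.
responses <- vapply(eqs, function(f) all.vars(f[[2]]), character(1))
# Right-hand sides: predictor names, in order of first appearance.
predictors <- unique(unlist(lapply(eqs, function(f) all.vars(f[[3]]))))

# Responses first, predictors after.
m <- as.matrix(df[, c(responses, predictors)])
colnames(m)
```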

The function then applies a Box-Cox transformation to the endogenous variables if lambda parameters are provided. Weights are added if provided, and an intercept column is added if addIntercept is set to TRUE.
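
For reference, the textbook Box-Cox transformation for a fixed lambda can be written in a few lines of base R. The function name box_cox is illustrative and not part of ldt, and whether get.data uses exactly this form internally is not confirmed here.

```r
# Standard Box-Cox transform for strictly positive y:
#   (y^lambda - 1) / lambda  when lambda != 0
#   log(y)                   when lambda == 0
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

y <- c(1, 2, 4, 8)
box_cox(y, 1)    # lambda = 1 only shifts the data: y - 1
box_cox(y, 0)    # lambda = 0 is the natural logarithm
box_cox(y, 0.5)
```

A lambda of 1 leaves the shape of the data unchanged, which makes it a convenient sanity check when comparing transformed and untransformed results.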

Finally, the function returns a list containing all relevant information for further analysis. This includes the final data matrix, number of endogenous and exogenous variables, number of observations in original and new data, lambda parameters used in Box-Cox transformation, and flags indicating whether an intercept or weights were added.

Value

A list suitable for use in ldt::search.? functions. The list contains:

data

The final data matrix. Endogenous variables are in the first columns, followed by weights (if provided), then the intercept (if added), and finally the exogenous variables.

numEndo

The number of endogenous variables in the data.

numExo

The number of exogenous variables in the data (including 'intercept' if it is added).

newX

The matrix of new observations for exogenous variables.

lambdas

The lambda parameters used in the Box-Cox transformation.

hasIntercept

Indicates whether an intercept column is added to the final matrix.

hasWeight

Indicates whether there is a weight column in the final matrix.

startFrequency

Frequency of the first observation, extracted from the ldtf attribute of data, if available. It is used in time-series analysis such as VARMA estimation.
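
The column order documented for the data element (endogenous variables, then weights, then the intercept, then exogenous variables) can be reproduced by hand with cbind. This is only a sketch of the layout: the weight values and the column names Weight and Intercept are made up for illustration and may not match the names get.data assigns.

```r
raw <- matrix(1:24, ncol = 6,
              dimnames = list(NULL, c("Y1", "Y2", "X1", "X2", "X3", "X4")))
num_endo <- 2
w <- c(1, 1, 2, 2)  # made-up observation weights, one per row

endo <- raw[, seq_len(num_endo), drop = FALSE]
exo  <- raw[, -seq_len(num_endo), drop = FALSE]

# Documented order: endogenous, weights, intercept, exogenous.
final <- cbind(endo, Weight = w, Intercept = 1, exo)
colnames(final)
```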

Examples

# Example 1: unnamed matrix; the first column is treated as endogenous
data <- matrix(1:24, ncol = 6)
result <- get.data(data, endogenous = 1)
print(result$data)

# Example 2: select the endogenous variables by column name
data <- matrix(1:24, ncol = 6,
               dimnames = list(NULL,c("V1", "V2", "V3", "V4", "V5", "V6")))
result <- get.data(data, endogenous = c("V6", "V1"))
print(result$data)

# Example 3: specify responses and predictors with formulas
data <- data.frame(matrix(1:24, ncol = 6))
colnames(data) <- c("X1", "X2", "Y2", "X3", "Y1", "X4")
equations <- list(
   Y1 ~ X2 + X1,
   Y2 ~ X4 + X3)
result <- get.data(data, equations = equations)
print(result$data)


[Package ldt version 0.5.2 Index]