get.data {ldt}

R Documentation

Transform and Prepare Data for Analysis

Description

This function prepares a data matrix for analysis. It can apply a Box-Cox transformation to the endogenous variables, add an intercept column, and attach new observations of the exogenous variables.

Usage

get.data(
  data,
  endogenous = 1,
  equations = NULL,
  weights = NULL,
  lambdas = NULL,
  newData = NULL,
  addIntercept = TRUE,
  ...
)

Arguments

`data`	A data.frame or a numeric matrix that serves as the primary data source.
`endogenous`	A single number indicating the number of endogenous variables in the first columns, or a list (or character vector, as in Example 2) of names specifying the endogenous variables. The remaining variables will be treated as exogenous.
`equations`	A formula or a list of formula objects that represent the equations to be used instead of `endogenous`. If provided, the final data will be a matrix where the response variables are in the first columns and the predictor variables are in the subsequent columns.
`weights`	A numeric vector or a column matrix representing weights of observations. Not all applications implement this parameter.
`lambdas`	A numeric vector, a single number, NA, or NULL indicating the lambda parameter(s) for the Box-Cox transformation. Use `NULL` for no transformation, `NA` for estimating the lambda parameter for each variable, a single number for an equal lambda parameter for all variables, and a numeric vector for distinct lambda parameters for corresponding variables.
`newData`	A data.frame or a numeric matrix representing new data for exogenous variables. It should have a structure similar to `data`, excluding endogenous or response variables.
`addIntercept`	A logical value indicating whether to add an intercept column to the final matrix.
`...`	Additional parameters for the `MASS::boxcox` function.

Details

This function is designed to prepare a data matrix for model search (or screening) analysis. It performs several operations to transform and structure the data appropriately.

The function first checks that `data` is a matrix or a data frame. If `newData` is provided, its type is checked as well. The function then extracts the frequency of the first observation from the `ldtf` attribute of `data`, if available.

If no equations are provided, the function assumes that the endogenous variables are in the first columns of the data. It checks whether an intercept is already present and throws an error if one is found while `addIntercept` is `TRUE`. It then validates the number of endogenous variables and converts the data to a numeric matrix.
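The intercept check can be pictured with a short base R sketch. It assumes that "an intercept is already present" means some column consists entirely of ones; `has_constant_one` is a hypothetical helper written for this illustration, not an ldt function, and the package's actual test may differ.

```r
# Hypothetical helper (not part of ldt): TRUE if any column is all ones.
has_constant_one <- function(m) {
  any(apply(m, 2, function(col) all(col == 1)))
}

m <- cbind(Y = c(2, 3, 5), X = c(1, 4, 9))
no_intercept   <- has_constant_one(m)                # no column of ones
with_intercept <- has_constant_one(cbind(m, C = 1))  # column C is all ones
```

Under this assumption, passing a matrix like the second one together with `addIntercept = TRUE` is the situation in which the error described above would be raised.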

If column names are missing, they are added based on the number of endogenous and exogenous variables. If `newData` is provided, the function checks its structure and matches its columns to the exogenous part of the original data.

If equations are provided, they are used to transform the original data into a matrix where response variables are in the first columns and predictor variables in subsequent columns. The new data is also transformed accordingly.
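As a rough illustration of the "responses first, predictors after" layout, the following base R sketch reorders the columns of a data frame from a list of formulas. It is not the package's implementation: it only handles plain variable names on both sides of each formula, and the column order that `get.data` actually produces may differ.

```r
# Sketch only: base R, simple formulas with plain variable names.
df <- data.frame(matrix(1:24, ncol = 6))
colnames(df) <- c("X1", "X2", "Y2", "X3", "Y1", "X4")
equations <- list(Y1 ~ X2 + X1, Y2 ~ X4 + X3)

# Left-hand sides give the responses; term labels give the predictors.
responses <- vapply(equations, function(f) all.vars(f[[2]]), character(1))
predictors <- unique(unlist(
  lapply(equations, function(f) attr(terms(f), "term.labels"))
))

# Responses in the first columns, predictors in the subsequent ones.
m <- as.matrix(df[, c(responses, predictors)])
colnames(m)
```

In this sketch the columns come out as Y1, Y2, X2, X1, X4, X3, following the order in which the names appear in the formulas.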

The function then applies a Box-Cox transformation to the endogenous variables if `lambdas` is provided. Weights are added if provided, and an intercept column is added if `addIntercept` is `TRUE`.
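For reference, the textbook Box-Cox transformation of a positive value y is (y^lambda - 1) / lambda for a nonzero lambda and log(y) for lambda equal to zero. The sketch below applies it to a single vector in base R. The helper `box_cox` is hypothetical and written for this illustration; how ldt handles values of lambda near zero or non-positive data is not covered here.

```r
# Hypothetical helper (not part of ldt): textbook Box-Cox for one lambda.
box_cox <- function(y, lambda) {
  stopifnot(is.numeric(y), all(y > 0), length(lambda) == 1)
  if (isTRUE(all.equal(lambda, 0))) {
    log(y)                      # limiting case as lambda approaches 0
  } else {
    (y^lambda - 1) / lambda
  }
}

y <- c(1, 2, 4, 8)
box_cox(y, 1)    # equals y - 1 (a shift only)
box_cox(y, 0)    # equals log(y)
box_cox(y, 0.5)  # equals 2 * (sqrt(y) - 1)
```

A single number for `lambdas` corresponds to calling such a function with the same lambda for every endogenous column, while a numeric vector corresponds to one lambda per column.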

Finally, the function returns a list containing all relevant information for further analysis. This includes the final data matrix, number of endogenous and exogenous variables, number of observations in original and new data, lambda parameters used in Box-Cox transformation, and flags indicating whether an intercept or weights were added.

Value

A list suitable for use in ldt::search.? functions. The list contains:

`data`	The final data matrix. Endogenous variables are in the first columns, followed by weights (if provided), then the intercept (if added), and finally the exogenous variables.
`numEndo`	The number of endogenous variables in the data.
`numExo`	The number of exogenous variables in the data (including 'intercept' if it is added).
`newX`	The matrix of new observations for exogenous variables.
`lambdas`	The lambda parameters used in the Box-Cox transformation.
`hasIntercept`	Indicates whether an intercept column is added to the final matrix.
`hasWeight`	Indicates whether there is a weight column in the final matrix.
`startFrequency`	Frequency of the first observation, extracted from `ldtf` attribute of `data`, if available. This will be used in time-series analysis such as VARMA estimation.
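The column order described for `data` can be mimicked with `cbind` in base R. This is only a sketch of the layout: the column names `Weights` and `Intercept` are assumptions made for the illustration, and the names used by the package may differ.

```r
# Sketch of the documented layout: endogenous, weights, intercept, exogenous.
endo <- matrix(c(5, 6, 7, 8), ncol = 1, dimnames = list(NULL, "Y1"))
exo  <- matrix(1:8, ncol = 2, dimnames = list(NULL, c("X1", "X2")))
w    <- c(1, 1, 2, 2)

# "Weights" and "Intercept" are hypothetical column names.
layout <- cbind(endo, Weights = w, Intercept = 1, exo)
colnames(layout)
```

With this layout, `numEndo` would be 1 and `numExo` would be 3, since the intercept is counted among the exogenous variables.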

Examples

# Example 1: a numeric matrix whose first column is the only endogenous variable
data <- matrix(1:24, ncol = 6)
result <- get.data(data, endogenous = 1)
print(result$data)

# Example 2: select the endogenous variables by column name
data <- matrix(1:24, ncol = 6,
               dimnames = list(NULL, c("V1", "V2", "V3", "V4", "V5", "V6")))
result <- get.data(data, endogenous = c("V6", "V1"))
print(result$data)

# Example 3: define response and predictor variables with formulas
data <- data.frame(matrix(1:24, ncol = 6))
colnames(data) <- c("X1", "X2", "Y2", "X3", "Y1", "X4")
equations <- list(
  Y1 ~ X2 + X1,
  Y2 ~ X4 + X3
)
result <- get.data(data, equations = equations)
print(result$data)

[Package ldt version 0.5.3 Index]