BKTRRegressor {BKTR}    R Documentation

R6 class encapsulating the BKTR regression elements

Description

A BKTRRegressor holds all the key elements to accomplish the MCMC sampling algorithm (Algorithm 1 of the paper).

Public fields

data_df

The dataframe containing all the covariates through time and space (including the response variable)

y

The response variable tensor

omega

The tensor indicating which response values are not missing

covariates

The tensor containing all the covariates

covariates_dim

The dimensions of the covariates tensor
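To show how y, omega, covariates and covariates_dim relate to a long-format data_df, here is a base R sketch. It assumes a spatial x temporal layout for y and omega, and a spatial x temporal x feature layout for covariates. The toy columns and the "Intercept" feature label are illustrative; the actual BKTR tensors are built internally and may differ in detail.

```r
# Toy long-format data: 3 locations x 2 time points (location varies fastest)
df <- expand.grid(location = c("s1", "s2", "s3"), time = c("t1", "t2"),
                  stringsAsFactors = FALSE)
df$nb_departure <- c(1.2, NaN, 0.7, 2.1, 1.5, 0.9)  # one missing response
df$mean_temp_c <- c(10, 11, 12, 13, 14, 15)

spatial_labels <- unique(df$location)
temporal_labels <- unique(df$time)
feature_labels <- c("Intercept", "mean_temp_c")  # hypothetical labels

# Integer position of each row along the spatial and temporal axes
s_idx <- match(df$location, spatial_labels)
t_idx <- match(df$time, temporal_labels)

# Response matrix (S x T) and observation mask omega (TRUE = observed)
y <- matrix(NA_real_, length(spatial_labels), length(temporal_labels),
            dimnames = list(spatial_labels, temporal_labels))
y[cbind(s_idx, t_idx)] <- df$nb_departure
omega <- !is.na(y)  # is.na() is also TRUE for NaN

# Covariate tensor (S x T x P) with a constant intercept feature
covariates <- array(NA_real_,
                    dim = c(length(spatial_labels), length(temporal_labels),
                            length(feature_labels)),
                    dimnames = list(spatial_labels, temporal_labels, feature_labels))
covariates[, , "Intercept"] <- 1
covariates[cbind(s_idx, t_idx, 2L)] <- df$mean_temp_c
covariates_dim <- dim(covariates)
```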

logged_params_tensor

The tensor containing all the sampled hyperparameters

tau

The precision hyperparameter

spatial_decomp

The spatial covariate decomposition

temporal_decomp

The temporal covariate decomposition

covs_decomp

The feature covariate decomposition
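The three decomposition fields are the factor matrices of a rank-R CP decomposition of the beta tensor. Each beta value is a sum over r of the product of one spatial, one temporal and one feature factor. The base R sketch below rebuilds a beta tensor from random factor matrices. It assumes the matrices are stored as (number of labels) x (rank) and is not the package's own reconstruction code.

```r
set.seed(1)
n_s <- 4; n_t <- 3; n_p <- 2; n_r <- 2   # locations, times, features, CP rank

U <- matrix(rnorm(n_s * n_r), n_s, n_r)  # spatial factors    (S x R)
V <- matrix(rnorm(n_t * n_r), n_t, n_r)  # temporal factors   (T x R)
W <- matrix(rnorm(n_p * n_r), n_p, n_r)  # feature factors    (P x R)

# beta[s, t, p] = sum_r U[s, r] * V[t, r] * W[p, r]
cp_reconstruct <- function(U, V, W) {
  beta <- array(0, dim = c(nrow(U), nrow(V), nrow(W)))
  for (r in seq_len(ncol(U))) {
    # Outer product of the r-th columns gives an S x T x P array
    beta <- beta + outer(outer(U[, r], V[, r]), W[, r])
  }
  beta
}

beta <- cp_reconstruct(U, V, W)
dim(beta)  # 4 3 2
```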

result_logger

The result logger instance used to store the results of the MCMC sampling

has_completed_sampling

Boolean showing whether the MCMC sampling has been completed

spatial_kernel

The spatial kernel used

temporal_kernel

The temporal kernel used

spatial_positions_df

The dataframe containing the spatial positions

temporal_positions_df

The dataframe containing the temporal positions

spatial_params_sampler

The spatial kernel hyperparameter sampler

temporal_params_sampler

The temporal kernel hyperparameter sampler

tau_sampler

The tau hyperparameter sampler

precision_matrix_sampler

The precision matrix sampler

spatial_ll_evaluator

The spatial likelihood evaluator

temporal_ll_evaluator

The temporal likelihood evaluator

rank_decomp

The rank of the CP decomposition

burn_in_iter

The number of burn in iterations

sampling_iter

The number of sampling iterations

max_iter

The total number of iterations (burn_in_iter + sampling_iter)

a_0

The initial value for the shape of the gamma distribution from which tau is sampled

b_0

The initial value for the rate of the gamma distribution from which tau is sampled

formula

The formula used to specify the relation between the response variable and the covariates

spatial_labels

The spatial labels

temporal_labels

The temporal labels

feature_labels

The feature labels

geo_coords_projector

The geographic coordinates projector

Active bindings

summary

A summary of the BKTRRegressor instance

beta_covariates_summary

A dataframe containing the summary of the beta covariates

y_estimates

A dataframe containing the y estimates

imputed_y_estimates

A dataframe containing the imputed y estimates

beta_estimates

A dataframe containing the beta estimates

hyperparameters_per_iter_df

A dataframe containing the sampled hyperparameters per iteration

decomposition_tensors

List of all used decomposition tensors

Methods

Public methods


Method new()

Create a new BKTRRegressor object.

Usage
BKTRRegressor$new(
  data_df,
  spatial_positions_df,
  temporal_positions_df,
  rank_decomp = 10,
  burn_in_iter = 500,
  sampling_iter = 500,
  formula = NULL,
  spatial_kernel = KernelMatern$new(smoothness_factor = 3),
  temporal_kernel = KernelSE$new(),
  sigma_r = 0.01,
  a_0 = 1e-06,
  b_0 = 1e-06,
  has_geo_coords = TRUE,
  geo_coords_scale = 10
)
Arguments
data_df

data.table: A dataframe containing all the covariates through time and space. The dataframe must have two index columns named 'location' and 'time'. It must also contain every possible combination of 'location' and 'time' (i.e., rows with a missing response must still be present, with the response filled with NaN). So if the dataframe has 10 locations and 5 time points, it should have 50 rows (10 x 5). If formula is NULL, the dataframe should contain the response variable 'Y' as the first column. Note that the covariate columns cannot contain NaN values, but the response variable can.
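A base R sketch of building such a complete grid follows. Every location x time pair appears once, the covariates are fully observed, and unobserved responses are set to NaN. The column names mirror the BIXI example below, but the values are made up. The result is a plain data.frame. It can be converted with data.table::as.data.table() before being passed as data_df.

```r
locations <- c("s1", "s2")
times <- c("t1", "t2", "t3")

# Complete grid of every location x time pair
grid <- expand.grid(location = locations, time = times, stringsAsFactors = FALSE)

# Covariates are known for every pair (no missing values allowed)
covs <- data.frame(location = rep(locations, times = 3),
                   time = rep(times, each = 2),
                   mean_temp_c = c(18.2, 18.9, 21.0, 20.4, 16.7, 17.3))

# The response was only observed for three of the six pairs
obs <- data.frame(location = c("s1", "s2", "s1"),
                  time = c("t1", "t2", "t3"),
                  nb_departure = c(5, 6, 7))

full <- merge(grid, obs, by = c("location", "time"), all.x = TRUE)
full <- merge(full, covs, by = c("location", "time"))
full$nb_departure[is.na(full$nb_departure)] <- NaN  # mark missing responses
full <- full[order(full$location, full$time),
             c("location", "time", "nb_departure", "mean_temp_c")]
stopifnot(!anyNA(full$mean_temp_c))  # covariates must be complete
```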

spatial_positions_df

data.table: Spatial positions used as input to the spatial kernel to compute the distances between locations. It should contain one row per location.

temporal_positions_df

data.table: Temporal positions used as input to the temporal kernel to compute the distances between time points. It should contain one row per time point.

rank_decomp

Integer: Rank of the CP decomposition (Paper – R). Defaults to 10.

burn_in_iter

Integer: Number of burn-in iterations run before sampling (Paper – K_1). Defaults to 500.

sampling_iter

Integer: Number of sampling iterations (Paper – K_2). Defaults to 500.
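Burn-in draws are discarded, and estimates are formed from the sampling iterations only. The sketch below shows this split on a synthetic trace of one scalar parameter. The trace and the use of the mean as the point estimate are illustrative assumptions, not output from BKTR.

```r
burn_in_iter <- 500
sampling_iter <- 500
max_iter <- burn_in_iter + sampling_iter

# Hypothetical trace: a transient that decays towards 0 plus sampling noise
set.seed(42)
iters <- seq_len(max_iter)
chain <- 5 * exp(-iters / 50) + rnorm(max_iter, sd = 0.1)

# Discard the burn-in draws and keep only the sampling iterations
kept <- chain[(burn_in_iter + 1):max_iter]
posterior_mean <- mean(kept)
```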

formula

A Wilkinson R formula to specify the relation between the response variable 'Y' and the covariates. If NULL, the first column of the dataframe will be used as the response variable and all the other columns will be used as the covariates. Defaults to NULL.
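The formula follows standard R semantics. The left-hand side names the response, and the right-hand side lists the covariates, with 1 adding an intercept. The base R sketch below shows which columns the example formula selects. It illustrates the formula semantics only and is not BKTR's internal parsing. na.pass is used so that rows with a missing response are kept, since BKTR allows a NaN response.

```r
df <- data.frame(nb_departure = c(3, NaN, 2),
                 mean_temp_c = c(20.1, 22.4, 18.9),
                 area_park = c(0.10, 0.00, 0.30))
f <- nb_departure ~ 1 + mean_temp_c + area_park

# na.pass keeps the row whose response is missing
mf <- model.frame(f, data = df, na.action = na.pass)
X <- model.matrix(f, data = mf)   # covariates, including the intercept column
y <- model.response(mf)           # response vector

colnames(X)  # "(Intercept)" "mean_temp_c" "area_park"
```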

spatial_kernel

Kernel: Spatial kernel used. Defaults to KernelMatern$new(smoothness_factor = 3).

temporal_kernel

Kernel: Temporal kernel used. Defaults to KernelSE$new().
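Each kernel maps pairwise distances to a covariance matrix. The sketch below writes the standard squared-exponential and Matérn 3/2 formulas in base R and applies them to temporal and spatial positions. It assumes that smoothness_factor = 3 corresponds to a Matérn smoothness of 3/2 and that both kernels are parameterized by a lengthscale and a variance. The functions kernel_se and kernel_matern32 are hypothetical helpers, not the KernelSE or KernelMatern classes.

```r
# Squared exponential: k(d) = variance * exp(-d^2 / (2 * lengthscale^2))
kernel_se <- function(d, lengthscale = 1, variance = 1) {
  variance * exp(-d^2 / (2 * lengthscale^2))
}

# Matern 3/2: k(d) = variance * (1 + sqrt(3) d / l) * exp(-sqrt(3) d / l)
kernel_matern32 <- function(d, lengthscale = 1, variance = 1) {
  a <- sqrt(3) * d / lengthscale
  variance * (1 + a) * exp(-a)
}

# Temporal covariance from 1D time positions
t_pos <- 1:4
K_t <- kernel_se(as.matrix(dist(t_pos)))

# Spatial covariance from 2D (projected) coordinates
coords <- cbind(x = c(0, 1, 0), y = c(0, 0, 1))
K_s <- kernel_matern32(as.matrix(dist(coords)))
```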

sigma_r

Numeric: Variance of the white noise process (\tau^{-1}). Defaults to 1E-2.

a_0

Numeric: Initial value for the shape (\alpha) of the gamma distribution from which tau is sampled. Defaults to 1E-6.

b_0

Numeric: Initial value for the rate (\beta) of the gamma distribution from which tau is sampled. Defaults to 1E-6.
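Because sigma_r is the noise variance \tau^{-1}, the default corresponds to a starting precision of 100. With the default shape and rate, the Gamma(a_0, b_0) prior has mean a_0 / b_0 = 1 and variance a_0 / b_0^2 = 10^6, so it is very vague. The short base R computation below checks these numbers using the standard shape/rate moments of the gamma distribution.

```r
sigma_r <- 0.01
tau_init <- 1 / sigma_r        # precision is the inverse of the noise variance

a_0 <- 1e-6                    # shape
b_0 <- 1e-6                    # rate
prior_mean <- a_0 / b_0        # E[tau]   = shape / rate
prior_var <- a_0 / b_0^2       # Var[tau] = shape / rate^2

c(tau_init = tau_init, prior_mean = prior_mean, prior_var = prior_var)
```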

has_geo_coords

Boolean: Whether spatial_positions_df uses geographic coordinates (latitude, longitude). Defaults to TRUE.

geo_coords_scale

Numeric: Scale factor used to convert geographic coordinates to a Euclidean 2D space via a Mercator projection, with x and y domains of [-scale/2, +scale/2]. Only used if has_geo_coords is TRUE. Defaults to 10.
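The sketch below applies the standard spherical Mercator formulas and then rescales the result into [-scale/2, +scale/2]. It assumes a single scaling factor shared by both axes, so the aspect ratio is preserved and the wider axis spans the full domain. BKTR's projector may rescale differently. project_mercator is a hypothetical helper, and the coordinates are made-up points.

```r
project_mercator <- function(lat, lon, scale = 10) {
  x <- lon * pi / 180                    # longitude in radians
  y <- log(tan(pi / 4 + lat * pi / 360)) # Mercator y: ln(tan(pi/4 + phi/2))
  # Center both axes on 0
  x <- x - (min(x) + max(x)) / 2
  y <- y - (min(y) + max(y)) / 2
  # Shared factor so the wider axis spans exactly [-scale/2, scale/2]
  factor <- scale / max(diff(range(x)), diff(range(y)))
  cbind(x = x * factor, y = y * factor)
}

lat <- c(45.50, 45.52, 45.55)
lon <- c(-73.57, -73.60, -73.55)
xy <- project_mercator(lat, lon, scale = 10)
```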

Returns

A new BKTRRegressor object.


Method mcmc_sampling()

Launch the MCMC sampling process.
For a predefined number of iterations:

  1. Sample spatial kernel hyperparameters

  2. Sample temporal kernel hyperparameters

  3. Sample the precision matrix from a Wishart distribution

  4. Sample a new spatial covariate decomposition

  5. Sample a new feature covariate decomposition

  6. Sample a new temporal covariate decomposition

  7. Calculate respective errors for the iterations

  8. Sample a new tau value

  9. Collect all the important data for the iteration
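Step 8 can be sketched with the standard conjugate update for a Gaussian likelihood with precision tau and a Gamma(a_0, b_0) prior. The shape grows by half the number of observed responses, and the rate grows by half the squared error on those observed cells. This sketch assumes BKTR's tau sampler follows this textbook update. sample_tau is a hypothetical helper, and y_hat stands in for the current model estimate.

```r
# Draw tau from its (assumed) conjugate Gamma full conditional
sample_tau <- function(y, y_hat, omega, a_0 = 1e-6, b_0 = 1e-6) {
  n_obs <- sum(omega)                   # number of observed responses
  sse <- sum((y - y_hat)[omega]^2)      # squared error on observed cells only
  rgamma(1, shape = a_0 + n_obs / 2, rate = b_0 + sse / 2)
}

set.seed(123)
y_hat <- matrix(rnorm(20), nrow = 4, ncol = 5)                # current fit
y <- y_hat + matrix(rnorm(20, sd = 0.1), nrow = 4, ncol = 5)  # noise sd 0.1, so tau = 100
omega <- matrix(TRUE, nrow = 4, ncol = 5)
y[1, 1] <- NaN                                                # one missing response
omega[1, 1] <- FALSE

tau_draws <- replicate(2000, sample_tau(y, y_hat, omega))
mean(tau_draws)
```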

Usage
BKTRRegressor$mcmc_sampling()
Returns

NULL. Results are stored in the instance and can be accessed via summary().


Method predict()

Use interpolation to predict betas and response values for new data.

Usage
BKTRRegressor$predict(
  new_data_df,
  new_spatial_positions_df = NULL,
  new_temporal_positions_df = NULL,
  jitter = 1e-05
)
Arguments
new_data_df

data.table: New covariates. Must have the same columns as the covariates used to fit the model. The index should contain the combination of all old spatial coordinates with all new temporal coordinates, the combination of all new spatial coordinates with all old temporal coordinates, and the combination of all new spatial coordinates with all new temporal coordinates.
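The rows required in new_data_df are every location x time pair except the ones seen during fitting (old location with old time). The base R sketch below enumerates them with made-up labels. This mirrors the masking used in the prediction example.

```r
old_s <- c("s1", "s2", "s3"); new_s <- "s4"
old_t <- c("t1", "t2");       new_t <- "t3"

all_pairs <- expand.grid(location = c(old_s, new_s), time = c(old_t, new_t),
                         stringsAsFactors = FALSE)

# Drop the (old location, old time) pairs used for fitting
is_old_old <- all_pairs$location %in% old_s & all_pairs$time %in% old_t
required <- all_pairs[!is_old_old, ]

nrow(required)  # 3 (old x new) + 2 (new x old) + 1 (new x new) = 6
```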

new_spatial_positions_df

data.table or NULL: A data frame containing the new spatial positions. Defaults to NULL.

new_temporal_positions_df

data.table or NULL: A data frame containing the new temporal positions. Defaults to NULL.

jitter

Numeric or NULL: A small value added to the diagonal of the precision matrix for numerical stability. Defaults to 1e-05.
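Adding a small jitter to a matrix diagonal is a common way to make a nearly singular kernel matrix positive definite before factorizing it. The base R sketch below shows this on a squared-exponential kernel matrix with two identical positions. It illustrates the general technique on a covariance matrix and does not reproduce where exactly BKTR applies jitter. se_kernel_matrix is a hypothetical helper.

```r
se_kernel_matrix <- function(x, lengthscale = 1) {
  d <- as.matrix(dist(x))
  exp(-d^2 / (2 * lengthscale^2))
}

x <- c(0, 0, 1)          # two identical positions make K singular
K <- se_kernel_matrix(x)

jitter <- 1e-5
K_jittered <- K + diag(jitter, nrow(K))
L <- chol(K_jittered)    # upper-triangular factor; K_jittered is positive definite
```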

Returns

List: A list of two dataframes. The first contains the forecasted betas for all new spatial locations or temporal points. The second contains the forecasted response for all new spatial locations or temporal points.


Method get_iterations_betas()

Return all sampled betas through sampling iterations for a given set of spatial, temporal and feature labels. Useful for plotting the distribution of sampled beta values.

Usage
BKTRRegressor$get_iterations_betas(
  spatial_label,
  temporal_label,
  feature_label
)
Arguments
spatial_label

String: The spatial label for which we want to get the betas

temporal_label

String: The temporal label for which we want to get the betas

feature_label

String: The feature label for which we want to get the betas

Returns

A list containing the betas sampled across iterations for the given labels


Method get_beta_summary_df()

Get a summary of estimated beta values. If no labels are given, then the summary is for all the betas. If labels are given, then the summary is for the given labels.

Usage
BKTRRegressor$get_beta_summary_df(
  spatial_labels = NULL,
  temporal_labels = NULL,
  feature_labels = NULL
)
Arguments
spatial_labels

vector: The spatial labels used in summary. If NULL, then all spatial labels are used. Defaults to NULL.

temporal_labels

vector: The temporal labels used in summary. If NULL, then all temporal labels are used. Defaults to NULL.

feature_labels

vector: The feature labels used in summary. If NULL, then all feature labels are used. Defaults to NULL.

Returns

A new data.table with the beta summary for the given labels.
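A beta summary condenses the draws of one beta value, such as those returned by get_iterations_betas(), into a few statistics. The sketch below computes typical posterior statistics in base R. Its statistic names are assumptions and are not the column names of get_beta_summary_df(). summarize_betas is a hypothetical helper, and the draws are synthetic.

```r
summarize_betas <- function(samples) {
  c(Mean = mean(samples),
    Median = median(samples),
    SD = sd(samples),
    quantile(samples, c(0.025, 0.975)))  # 95% credible interval bounds
}

set.seed(7)
draws <- rnorm(1000, mean = 0.5, sd = 0.2)  # synthetic beta draws
res <- summarize_betas(draws)
res
```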


Method clone()

The objects of this class are cloneable with this method.

Usage
BKTRRegressor$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

Examples


# Create a BIXI data collection instance containing multiple dataframes
bixi_data <- BixiData$new(is_light = TRUE) # Use light version for example

# Create a BKTRRegressor instance
bktr_regressor <- BKTRRegressor$new(
  formula = nb_departure ~ 1 + mean_temp_c + area_park,
  data_df = bixi_data$data_df,
  spatial_positions_df = bixi_data$spatial_positions_df,
  temporal_positions_df = bixi_data$temporal_positions_df,
  burn_in_iter = 5, sampling_iter = 10) # For example only (too few iterations)

# Launch the MCMC sampling
bktr_regressor$mcmc_sampling()

# Get the summary of the bktr regressor
summary(bktr_regressor)

# Get estimated response variables for missing values
bktr_regressor$imputed_y_estimates

# Get the list of sampled betas for given spatial, temporal and feature labels
bktr_regressor$get_iterations_betas(
  spatial_label = bixi_data$spatial_positions_df$location[1],
  temporal_label = bixi_data$temporal_positions_df$time[1],
  feature_label = 'mean_temp_c')

# Get the summary of all betas for the 'mean_temp_c' feature
bktr_regressor$get_beta_summary_df(feature_labels = 'mean_temp_c')


## PREDICTION EXAMPLE ##
# Create a light version of the BIXI data collection instance
bixi_data <- BixiData$new(is_light = TRUE)
# Simplify variable names
data_df <- bixi_data$data_df
spa_pos_df <- bixi_data$spatial_positions_df
temp_pos_df <- bixi_data$temporal_positions_df

# Keep some data aside for prediction
new_spa_pos_df <- spa_pos_df[1:2, ]
new_temp_pos_df <- temp_pos_df[1:5, ]
reg_spa_pos_df <- spa_pos_df[-(1:2), ]
reg_temp_pos_df <- temp_pos_df[-(1:5), ]
reg_data_df_mask <- data_df$location %in% reg_spa_pos_df$location &
  data_df$time %in% reg_temp_pos_df$time
reg_data_df <- data_df[reg_data_df_mask, ]
new_data_df <- data_df[!reg_data_df_mask, ]

# Launch mcmc sampling on regression data
bktr_regressor <- BKTRRegressor$new(
  formula = nb_departure ~ 1 + mean_temp_c + area_park,
  data_df = reg_data_df,
  spatial_positions_df = reg_spa_pos_df,
  temporal_positions_df = reg_temp_pos_df,
  burn_in_iter = 5, sampling_iter = 10) # For example only (too few iterations)
bktr_regressor$mcmc_sampling()

# Predict response values for new data
bktr_regressor$predict(
  new_data_df = new_data_df,
  new_spatial_positions_df = new_spa_pos_df,
  new_temporal_positions_df = new_temp_pos_df)


[Package BKTR version 0.1.1 Index]