estimate_cerf_gp {GPCERF}    R Documentation

Estimate the conditional exposure response function using Gaussian process

Description

Estimates the conditional exposure response function (CERF) using a Gaussian process (GP). The function tunes the hyperparameters by selecting, from the provided set, the combination that gives the best match (the lowest covariate balance measure).

Usage

estimate_cerf_gp(
  data,
  w,
  gps_m,
  params,
  outcome_col,
  treatment_col,
  covariates_col,
  nthread = 1,
  kernel_fn = function(x) exp(-x^2)
)

Arguments

data

A data.frame of observational data.

w

A vector of exposure levels at which to compute the CERF (see also the Note).

gps_m

An S3 gps object that includes:

  • gps: A data.frame of GPS vectors.

    • Column 1: GPS.

    • Column 2: Prediction of exposure for covariate of each data sample (e_gps_pred).

    • Column 3: Standard deviation of e_gps (e_gps_std).

  • used_params:

    • dnorm_log: TRUE or FALSE.

params

A list of parameters that are required to run the process. These parameters include:

  • alpha: A scaling factor for the GPS value.

  • beta: A scaling factor for the exposure value.

  • g_sigma: A scaling factor for kernel function (gamma/sigma).

  • tune_app: A tuning approach. Available approaches:

    • all: try all combinations of the hyperparameters. alpha, beta, and g_sigma can each be a vector of candidate values.
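
As a rough illustration of what trying all combinations implies, the sketch below enumerates every combination of candidate values with base R's expand.grid. The candidate values are invented for illustration, and this is not the package's internal tuning code; it only shows how many combinations a given params list would produce.

```r
# Hypothetical candidate values; each hyperparameter may be a scalar or a vector.
params <- list(alpha = c(0.1, 1, 10),
               beta = c(0.1, 0.2),
               g_sigma = 1,
               tune_app = "all")

# Enumerate every (alpha, beta, g_sigma) combination, one per row.
param_grid <- expand.grid(alpha = params$alpha,
                          beta = params$beta,
                          g_sigma = params$g_sigma)

# Number of combinations: 3 * 2 * 1
n_comb <- nrow(param_grid)
print(param_grid)
```

The number of combinations grows multiplicatively with the length of each vector, so short candidate vectors keep the tuning cost manageable.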

outcome_col

An outcome column name in data.

treatment_col

A treatment column name in data.

covariates_col

Names of the covariate columns in data.

nthread

An integer value that represents the number of threads to be used by internal packages.

kernel_fn

A kernel function. The default is a Gaussian kernel.
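
A minimal sketch of how a kernel might be written, assuming kernel_fn is a vectorized function of a single numeric argument, as the default in Usage suggests. The exponential kernel below is a hypothetical alternative for illustration only; it is not supplied by the package, and whether it suits a given analysis is not established here.

```r
# Default kernel as shown in the Usage section.
gaussian_kernel <- function(x) exp(-x^2)

# Hypothetical alternative (not part of GPCERF): decays more slowly for large x.
exponential_kernel <- function(x) exp(-abs(x))

# Compare the two kernels on a few example distances.
d <- c(0, 0.5, 1, 2)
k_gauss <- gaussian_kernel(d)
k_exp <- exponential_kernel(d)
print(rbind(d, k_gauss, k_exp))
```

Under that assumption, such a function could be supplied as kernel_fn = exponential_kernel in the call to estimate_cerf_gp.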

Value

A cerf_gp object that includes the following values:

Note

Please note that w is a vector representing a grid of exposure levels at which the CERF is to be estimated. This grid can include both observed and hypothetical values of the exposure variable. The purpose of defining this grid is to provide a structured set of points across the exposure spectrum for estimating the CERF. This approach is essential in nonparametric models like Gaussian Processes (GPs), where the CERF is evaluated at specific points to understand the relationship between the exposure and outcome variables across a continuum. It facilitates a comprehensive analysis by allowing practitioners to examine the effect of varying exposure levels, including those not directly observed in the dataset.
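
The grid described above can be built with base R's seq. The sketch below uses simulated exposure values as a stand-in for the treatment column (the treat vector here is hypothetical) and constructs one grid restricted to the observed range and one that also covers unobserved levels.

```r
# Hypothetical observed exposures; in practice use data[[treatment_col]].
set.seed(1)
treat <- runif(100, min = 2, max = 8)

# Evenly spaced grid restricted to the observed exposure range.
w_obs <- seq(min(treat), max(treat), length.out = 20)

# Grid that also includes hypothetical levels outside the observed range.
w_ext <- seq(0, 10, by = 0.5)

print(range(w_obs))
print(range(w_ext))
```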

Examples


set.seed(129)
data <- generate_synthetic_data(sample_size = 100, gps_spec = 3)


# Estimate GPS function
gps_m <- estimate_gps(cov_mt = data[,-(1:2)],
                      w_all = data$treat,
                      sl_lib = c("SL.xgboost"),
                      dnorm_log = FALSE)

# exposure values
w_all <- seq(0,10,1)


cerf_gp_obj <- estimate_cerf_gp(data,
                                w_all,
                                gps_m,
                                params = list(alpha = c(0.1),
                                              beta = 0.2,
                                              g_sigma = 1,
                                              tune_app = "all"),
                                outcome_col = "Y",
                                treatment_col = "treat",
                                covariates_col = paste0("cf", seq(1,6)),
                                nthread = 1)



[Package GPCERF version 0.2.4 Index]