tempGP {DSWE}

R Documentation

temporal Gaussian process

Description

A Gaussian process-based power curve model that explicitly models the temporal aspect of the power curve. The model consists of two parts: f(x), a function of the input variables, and g(t), a function of time.
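To make the two-part structure concrete, here is a minimal base-R sketch of y = f(x) + g(t) + noise. f is estimated by Gaussian process regression on the inputs, and g by a second Gaussian process fitted to the residuals and indexed by time, echoing how the returned object keeps the residuals for predicting g(t). This is not DSWE's implementation: the helpers `se_kernel` and `gp_mean` are hypothetical, the squared exponential kernels use fixed, hand-picked hyperparameters instead of estimated ones, and no thinning or approximation is applied.

```r
# Hypothetical illustration of the additive structure y = f(x) + g(t) + noise.
# Assumption: squared exponential kernels with fixed, hand-picked
# hyperparameters; DSWE estimates its hyperparameters from the data.
se_kernel <- function(a, b, lengthscale, variance) {
  # a, b: numeric matrices with one row per point
  d2 <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * a %*% t(b)
  variance * exp(-pmax(d2, 0) / (2 * lengthscale^2))
}

# Posterior mean of a zero-mean GP with Gaussian noise variance `noise`
gp_mean <- function(trainIn, trainOut, testIn, lengthscale, variance, noise) {
  K <- se_kernel(trainIn, trainIn, lengthscale, variance) +
    diag(noise, nrow(trainIn))
  weighted <- solve(K, trainOut)  # K^{-1} y
  drop(se_kernel(testIn, trainIn, lengthscale, variance) %*% weighted)
}

set.seed(1)
n <- 60
x <- matrix(runif(n, 0, 10), ncol = 1)
timeIdx <- seq_len(n)  # natural numbers 1..n, like the trainT default
y <- sin(x[, 1]) + 0.5 * cos(timeIdx / 8) + rnorm(n, sd = 0.05)

# Part 1: estimate f(x) from the inputs only
fHat <- gp_mean(x, y, x, lengthscale = 1, variance = 1, noise = 0.1)

# Part 2: estimate g(t) from the residuals, indexed by time
res <- y - fHat
tMat <- matrix(timeIdx, ncol = 1)
gHat <- gp_mean(tMat, res, tMat, lengthscale = 5, variance = 0.25, noise = 0.05)

# Combined fit at the training points
yHat <- fHat + gHat
```

Comparing `mean((y - yHat)^2)` with `mean((y - fHat)^2)` shows how much of the leftover variation the time component absorbs.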

Usage

tempGP(
  trainX,
  trainY,
  trainT = NULL,
  fast_computation = TRUE,
  limit_memory = 5000L,
  max_thinning_number = 20L,
  vecchia = TRUE,
  optim_control = list(batch_size = 100L, learn_rate = 0.05, max_iter = 5000L,
    tol = 1e-06, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-08, logfile = NULL)
)

Arguments

`trainX`	A matrix with each column corresponding to one input variable.
`trainY`	A vector with each element corresponding to the output at the corresponding row of `trainX`.
`trainT`	A vector of time indices for the data points. By default (`NULL`), the function assigns the natural numbers 1, 2, ..., n as the time indices, where n is the number of rows of `trainX`.
`fast_computation`	A Boolean that specifies whether to do exact inference or a fast approximation: `TRUE` uses the fast approximation and `FALSE` uses exact inference. Default is `TRUE`.
`limit_memory`	An integer or `NULL`. The integer is used to sample training points during prediction, limiting the total memory requirement. Setting the value to `NULL` disables sampling, so the full training data are used for prediction. Default is `5000`.
`max_thinning_number`	An integer specifying the maximum lag used to compute the thinning number. If the partial autocorrelation function (PACF) does not become insignificant by lag `max_thinning_number`, then `max_thinning_number` is used for thinning. Default is `20`.
`vecchia`	A Boolean that specifies whether to do exact inference or a Vecchia approximation: `TRUE` uses the Vecchia approximation and `FALSE` uses exact inference. Default is `TRUE`.
`optim_control`	A list of parameters passed to the Adam optimizer when `fast_computation` is set to `TRUE`. The default values have been tested rigorously and tend to strike a balance between accuracy and speed. The list elements are:
- `batch_size`: Number of training points sampled at each iteration of Adam.
- `learn_rate`: The step size for the Adam optimizer.
- `max_iter`: The maximum number of iterations performed by Adam.
- `tol`: Gradient tolerance.
- `beta1`: Decay rate for the first moment of the gradient.
- `beta2`: Decay rate for the second moment of the gradient.
- `epsilon`: A small number to avoid division by zero.
- `logfile`: A string specifying a file name in which to store the hyperparameter values at each iteration.
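The thinning rule described for `max_thinning_number` can be sketched with `stats::pacf`. This is a hedged illustration, not DSWE's code: the helper `thinning_number_sketch` is hypothetical, and it assumes that the PACF is computed for each input column, that "insignificant" means falling inside the approximate 95% band of +/- 2/sqrt(n), and that the largest value over the columns is used.

```r
# Hypothetical sketch of choosing a thinning number from the PACF.
# Assumptions: PACF per input column, +/- 2/sqrt(n) significance band,
# maximum taken over columns.
thinning_number_sketch <- function(trainX, max_thinning_number = 20L) {
  trainX <- as.matrix(trainX)
  bound <- 2 / sqrt(nrow(trainX))
  perColumn <- apply(trainX, 2, function(col) {
    # Sample PACF at lags 1..max_thinning_number
    p <- stats::pacf(col, lag.max = max_thinning_number, plot = FALSE)$acf[, 1, 1]
    insignificant <- which(abs(p) < bound)
    # First insignificant lag, or the cap if the PACF never becomes insignificant
    if (length(insignificant) == 0) max_thinning_number else insignificant[1]
  })
  max(perColumn)
}

# Strongly autocorrelated AR(1) input: lag 1 is clearly significant
set.seed(10)
arSeries <- as.numeric(arima.sim(model = list(ar = 0.9), n = 2000))
thinStrong <- thinning_number_sketch(arSeries)
thinCapped <- thinning_number_sketch(arSeries, max_thinning_number = 1L)
```

With `max_thinning_number = 1L` the PACF never becomes insignificant within the allowed lags, so the cap itself is returned.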
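The difference between exact inference and a Vecchia approximation (see Katzfuss & Guinness, 2021, and Guinness, 2018, in the References) can be illustrated on the Gaussian log-likelihood. The exact log-likelihood factorizes by the chain rule into univariate conditional densities; the Vecchia approximation conditions each point on only a small set of earlier points. The helpers below are hypothetical and make the simplest possible choices: the data order as given, and the `m` immediately preceding points as the conditioning set. Practical implementations use better orderings and nearest-neighbour sets, and this is not DSWE's internal code.

```r
# Hypothetical sketch: exact vs. Vecchia-approximated zero-mean Gaussian
# log-likelihood for a covariance matrix K.
exact_loglik <- function(y, K) {
  R <- chol(K)                                  # K = t(R) %*% R
  z <- backsolve(R, y, transpose = TRUE)        # solves t(R) %*% z = y
  -0.5 * sum(z^2) - sum(log(diag(R))) - 0.5 * length(y) * log(2 * pi)
}

vecchia_loglik <- function(y, K, m) {
  ll <- 0
  for (i in seq_along(y)) {
    # Conditioning set: the (at most) m points immediately preceding i
    nb <- utils::tail(seq_len(i - 1), m)
    if (length(nb) == 0) {
      mu <- 0
      s2 <- K[i, i]
    } else {
      w <- solve(K[nb, nb, drop = FALSE], K[nb, i])
      mu <- sum(w * y[nb])                      # conditional mean
      s2 <- K[i, i] - sum(w * K[nb, i])         # conditional variance
    }
    ll <- ll + dnorm(y[i], mean = mu, sd = sqrt(s2), log = TRUE)
  }
  ll
}

set.seed(7)
pts <- sort(runif(30, 0, 5))
K <- exp(-outer(pts, pts, "-")^2 / 2) + diag(0.1, 30)
y <- drop(t(chol(K)) %*% rnorm(30))

llExact <- exact_loglik(y, K)
llAll <- vecchia_loglik(y, K, m = 29)  # all predecessors: equals the exact value
llV5 <- vecchia_loglik(y, K, m = 5)    # cheaper approximation
```

With `m` much smaller than n, each term needs only an m x m solve, which is where the computational savings come from.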
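The `optim_control` entries map onto the standard Adam update. The sketch below is a generic Adam loop in base R, not DSWE's optimizer: the function `adam_sketch` and its `gradFn(theta, idx)` interface are hypothetical, stopping when the Euclidean norm of the gradient drops below `tol` is an assumption about how the gradient tolerance is applied, and `logfile` is omitted.

```r
# Hypothetical generic Adam loop using the optim_control parameter names.
# gradFn(theta, idx) must return the gradient computed on the rows in idx.
adam_sketch <- function(gradFn, theta, n, batch_size = 100L, learn_rate = 0.05,
                        max_iter = 5000L, tol = 1e-6, beta1 = 0.9,
                        beta2 = 0.999, epsilon = 1e-8) {
  m <- numeric(length(theta))  # running mean of gradients (first moment)
  v <- numeric(length(theta))  # running mean of squared gradients (second moment)
  for (iter in seq_len(max_iter)) {
    # Mini-batch of training rows; use all rows if batch_size >= n
    idx <- if (batch_size >= n) seq_len(n) else sample.int(n, batch_size)
    g <- gradFn(theta, idx)
    if (sqrt(sum(g^2)) < tol) break  # assumed gradient-tolerance rule
    m <- beta1 * m + (1 - beta1) * g
    v <- beta2 * v + (1 - beta2) * g^2
    mHat <- m / (1 - beta1^iter)     # bias-corrected moments
    vHat <- v / (1 - beta2^iter)
    theta <- theta - learn_rate * mHat / (sqrt(vHat) + epsilon)
  }
  list(par = theta, iterations = iter)
}

# Toy problem: minimise mean((obs - theta)^2); the minimiser is mean(obs) = 3
obs <- c(1, 2, 3, 4, 5)
gradMSE <- function(theta, idx) -2 * mean(obs[idx] - theta)
fit <- adam_sketch(gradMSE, theta = 0, n = length(obs))
```

Here the toy data set is smaller than `batch_size`, so every iteration uses the full gradient; with more rows than `batch_size`, each step uses a random mini-batch and the stopping rule may never trigger before `max_iter`.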

Value

An object of class tempGP with the following attributes:

trainX - Same as the input matrix trainX.
trainY - Same as the input vector trainY.
thinningNumber - The thinning number computed by the algorithm.
modelF - A list containing the details of the model for predicting function f(x):
- X - The input variable matrix used to compute the cross-covariance for predictions; same as trainX unless the model is updated. See the updateData.tempGP method for details on updating the model.
- y - The response vector; again, same as trainY unless the model is updated.
- weightedY - The weighted response, that is, the response left-multiplied by the inverse of the covariance matrix.
modelG - A list containing the details of the model for predicting function g(t):
- residuals - The residuals after subtracting function f(x) from the response, used to predict g(t). See the updateData.tempGP method for updating the residuals.
- time_index - The time indices of the residuals; same as trainT.
estimatedParams - Estimated hyperparameters for function f(x).
llval - The log-likelihood value attained by the hyperparameter optimization for f(x).
gradval - The gradient vector at the optimal log-likelihood value.
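The `weightedY` and `llval` entries correspond to standard Gaussian process quantities. As a hedged sketch, assuming a zero-mean GP with a squared exponential covariance, fixed hyperparameters, and a small nugget (DSWE's covariance function and estimated hyperparameters may differ), the weighted response K^{-1} y can be obtained with two triangular solves from one Cholesky factor. The predictive mean at new inputs is then the cross-covariance times `weightedY`, and the log-likelihood reuses the same factor. The helper `se_cov` is hypothetical.

```r
# Hypothetical sketch of weightedY = K^{-1} y, the GP predictive mean,
# and the Gaussian log-likelihood, all from one Cholesky factorisation.
se_cov <- function(A, B, lengthscale = 1, variance = 1) {
  d2 <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * A %*% t(B)
  variance * exp(-pmax(d2, 0) / (2 * lengthscale^2))
}

set.seed(3)
X <- matrix(runif(40, 0, 10), ncol = 1)
y <- sin(X[, 1]) + rnorm(40, sd = 0.1)

K <- se_cov(X, X) + diag(0.01, nrow(X))       # covariance plus nugget
R <- chol(K)                                  # K = t(R) %*% R
z <- backsolve(R, y, transpose = TRUE)        # solves t(R) %*% z = y
weightedY <- backsolve(R, z)                  # solves R %*% w = z, so w = K^{-1} y

# Predictive mean at new inputs: cross-covariance times weightedY
Xnew <- matrix(c(2.5, 7.5), ncol = 1)
predMean <- drop(se_cov(Xnew, X) %*% weightedY)

# Log-likelihood: -y'K^{-1}y/2 - log|K|/2 - (n/2) log(2*pi)
llval <- -0.5 * sum(y * weightedY) - sum(log(diag(R))) -
  0.5 * length(y) * log(2 * pi)
```

Storing `weightedY` once means each new prediction costs only a cross-covariance product rather than a fresh linear solve.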

References

Prakash, A., Tuo, R., & Ding, Y. (2022). "The temporal overfitting problem with applications in wind power curve modeling." Technometrics. doi:10.1080/00401706.2022.2069158.

Katzfuss, M., & Guinness, J. (2021). "A General Framework for Vecchia Approximations of Gaussian Processes." Statistical Science. doi:10.1214/19-STS755.

Guinness, J. (2018). "Permutation and Grouping Methods for Sharpening Gaussian Process Approximations." Technometrics. doi:10.1080/00401706.2018.1437476.

Examples


    library(DSWE)
    data = DSWE::data1
    trainindex = 1:50 # use the first 50 data points to train the model
    traindata = data[trainindex, ]
    xCol = 2 # input variable column
    yCol = 7 # response column
    trainX = as.matrix(traindata[, xCol])
    trainY = as.numeric(traindata[, yCol])
    tempGPObject = tempGP(trainX, trainY)

[Package DSWE version 1.8.2 Index]