RFGLS_estimate_timeseries {RandomForestsGLS} | R Documentation |
Function for estimation in time-series data with RF-GLS
Description
The function RFGLS_estimate_timeseries fits univariate non-linear regression
models for time-series data using RF-GLS, as described in Saha et al. (2020).
RFGLS_estimate_timeseries uses the sparse Cholesky representation
corresponding to an AR(q) process. The fitted Random Forest (RF) model is
used later for prediction via RFGLS_predict.
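To make the sparsity concrete, the following base R sketch works through the AR(1) case (q = 1) only. It builds a lower-bidiagonal whitening matrix B and checks that t(B) %*% B matches the inverse of the stationary AR(1) covariance matrix. The names n, rho, sigma_sq, Sigma, B and Q are illustrative and not part of RandomForestsGLS, and the package's internal representation may differ.

```r
# AR(1) errors: e_t = rho * e_{t-1} + w_t, with w_t ~ N(0, sigma_sq).
# All values below are illustrative assumptions.
n <- 6
rho <- 0.5
sigma_sq <- 1

# Stationary AR(1) covariance: Sigma[i, j] = sigma_sq * rho^|i - j| / (1 - rho^2)
Sigma <- sigma_sq * rho^abs(outer(1:n, 1:n, "-")) / (1 - rho^2)

# Whitening matrix B: diagonal plus a single band below the diagonal
B <- diag(n)
B[1, 1] <- sqrt(1 - rho^2)          # first observation uses the stationary variance
B[cbind(2:n, 1:(n - 1))] <- -rho    # row t computes e_t - rho * e_{t-1}
B <- B / sqrt(sigma_sq)

# t(B) %*% B should reproduce the precision matrix solve(Sigma)
Q <- crossprod(B)
max_abs_err <- max(abs(Q - solve(Sigma)))
n_nonzero <- sum(B != 0)
```

B has only 2n - 1 nonzero entries, so multiplying a vector by it costs O(n) operations rather than O(n^2).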
Some code blocks are borrowed from the R packages: spNNGP: Spatial
Regression Models for Large Datasets using Nearest Neighbor Gaussian
Processes https://CRAN.R-project.org/package=spNNGP and randomForest:
Breiman and Cutler's Random Forests for Classification and Regression
https://CRAN.R-project.org/package=randomForest.
Usage
RFGLS_estimate_timeseries(y, X, Xtest = NULL, nrnodes = NULL,
                          nthsize = 20, mtry = 1,
                          pinv_choice = 1, n_omp = 1,
                          ntree = 50, h = 1, lag_params = 0.5,
                          variance = 1,
                          param_estimate = FALSE,
                          verbose = FALSE)
Arguments
y |
an |
X |
an |
Xtest |
an |
nrnodes |
the maximum number of nodes a tree can have. Default choice leads to the deepest tree contingent on |
nthsize |
minimum size of leaf nodes. We recommend not setting this value too small, as that will lead to very deep trees that take a long time to build and can produce unstable estimates. Default value is 20. |
mtry |
number of variables randomly sampled at each partition as a candidate split direction. We recommend using
the value p/3 where p is the number of variables in |
pinv_choice |
dictates the choice of method for obtaining the pseudoinverse involved in the cost function and node
representative evaluation. If pinv_choice = 0, SVD is used (slower but more stable); if pinv_choice = 1,
orthogonal decomposition is used (faster, may produce unstable results if |
n_omp |
number of threads to be used, value can be more than 1 if source code is compiled with OpenMP support. Default is 1. |
ntree |
number of trees to be grown. This value should not be too small. Default value is 50. |
h |
number of cores to be used in a parallel computing setup for bootstrap samples. If h = 1, there is no parallelization. Default value is 1. |
lag_params |
|
variance |
variance of the white noise in temporal error. The function estimate is not affected by this. Default value is 1. |
param_estimate |
if |
verbose |
if |
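As a rough illustration of the p/3 recommendation for mtry and of choosing h, the base R sketch below derives both from a hypothetical covariate matrix. The matrix X, the names mtry_choice and h_choice, rounding p/3 down with a floor of 1, and leaving one core free are all assumptions of this sketch, not package rules.

```r
# Hypothetical covariate matrix with p = 7 columns (illustrative only)
set.seed(1)
X <- matrix(rnorm(200 * 7), nrow = 200, ncol = 7)

p <- ncol(X)
# p / 3 is generally not an integer; rounding down and never going
# below 1 are assumptions made for this sketch
mtry_choice <- max(1, floor(p / 3))

# Leave one core free for the rest of the system; detectCores() can
# return NA on some platforms, in which case fall back to 1
cores <- parallel::detectCores()
h_choice <- if (is.na(cores)) 1 else max(1, cores - 1)
```

These values could then be supplied as mtry = mtry_choice and h = h_choice in the call shown under Usage.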
Value
A list comprising:
P_matrix |
an |
predicted_matrix |
an |
predicted |
predicted values at the |
X |
the matrix |
y |
the vector |
RFGLS_Object |
object required for prediction. |
Author(s)
Arkajyoti Saha arkajyotisaha93@gmail.com,
Sumanta Basu sumbose@cornell.edu,
Abhirup Datta abhidatta@jhu.edu
References
Saha, A., Basu, S., & Datta, A. (2020). Random Forests for dependent data. arXiv preprint arXiv:2007.15421.
Saha, A., & Datta, A. (2018). BRISC: bootstrap for rapid inference on spatial covariances. Stat, e184, DOI: 10.1002/sta4.184.
Andy Liaw, and Matthew Wiener (2015). randomForest: Breiman and Cutler's Random
Forests for Classification and Regression. R package version 4.6-14.
https://CRAN.R-project.org/package=randomForest
Andrew Finley, Abhirup Datta and Sudipto Banerjee (2017). spNNGP: Spatial Regression Models for Large Datasets using Nearest Neighbor Gaussian Processes. R package version 0.1.1. https://CRAN.R-project.org/package=spNNGP
Examples
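# Helper that draws n samples from a multivariate normal N(mu, V)
# (defined here but not used in the example below)
rmvn <- function(n, mu = 0, V = matrix(1)){
  p <- length(mu)
  if(any(is.na(match(dim(V),p))))
    stop("Dimension not right!")
  D <- chol(V)
  t(matrix(rnorm(n*p), ncol=p)%*%D + rep(mu,rep(n,p)))
}

# Simulate a single covariate at n time points
set.seed(2)
n <- 200
x <- as.matrix(rnorm(n),n,1)

# AR(1) error process with coefficient rho and white-noise variance sigma.sq
sigma.sq <- 1
rho <- 0.5
set.seed(3)
b <- rho
s <- sqrt(sigma.sq)
eps <- arima.sim(list(order = c(1,0,0), ar = b),
                 n = n, rand.gen = rnorm, sd = s)

# Non-linear mean function plus temporally correlated noise
y <- eps + 10*sin(pi * x)

# Fit RF-GLS with a small number of trees to keep the example fast
estimation_result <- RFGLS_estimate_timeseries(y, x, ntree = 10)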
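The example above simulates its errors from an AR(1) process with coefficient 0.5 and unit white-noise variance. As a sanity check that does not depend on RandomForestsGLS, the sketch below regenerates the same kind of error series and recovers the coefficient with stats::arima. The names fit, rho_hat and sigma_sq_hat are illustrative, and this is not a description of how the package estimates its own parameters.

```r
# Regenerate AR(1) errors as in the example (base R only)
set.seed(3)
n <- 200
rho <- 0.5
eps <- arima.sim(list(order = c(1, 0, 0), ar = rho),
                 n = n, rand.gen = rnorm, sd = 1)

# Fit a zero-mean AR(1) model to the simulated errors
fit <- arima(eps, order = c(1, 0, 0), include.mean = FALSE)

rho_hat <- unname(coef(fit)["ar1"])   # estimated AR coefficient
sigma_sq_hat <- fit$sigma2            # estimated white-noise variance
```

With only 200 observations the estimate will not equal 0.5 exactly, but it should land reasonably close to the value used in the simulation.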