rgasp {RobustGaSP}    R Documentation

Setting up the robust GaSP model

Description

Sets up the robust GaSP model and estimates its parameters (when the parameters are not given).

Usage

  rgasp(design, response,trend=matrix(1,length(response),1),zero.mean="No",nugget=0,
    nugget.est=F,range.par=NA,method='post_mode',prior_choice='ref_approx',a=0.2,
    b=1/(length(response))^{1/dim(as.matrix(design))[2]}*(a+dim(as.matrix(design))[2]),
    kernel_type='matern_5_2',isotropic=F,R0=NA, 
    optimization='lbfgs', alpha=rep(1.9,dim(as.matrix(design))[2]),
    lower_bound=T,max_eval=max(30,20+5*dim(design)[2]),
    initial_values=NA,num_initial_values=2)
 

Arguments

design

a matrix of inputs.

response

a matrix of outputs.

trend

the basis matrix of the mean/trend, evaluated at the inputs. The default value is a column of ones (constant mean).

zero.mean

whether the process has zero mean. The default value is "No", meaning the mean is not zero; "Yes" means the mean is zero.

nugget

numerical value of the nugget variance ratio. If nugget is 0, there is either no nugget or the nugget is estimated (see nugget.est); a nonzero value fixes the nugget at that value. The default value is 0.

nugget.est

boolean value. T means the nugget should be estimated and F means the nugget is fixed or not estimated. The default value is F.

range.par

either NA or a vector. If it is NA, the range parameters are estimated; otherwise they are fixed at the given values. The default value is NA.

method

method of parameter estimation. post_mode uses the marginal posterior mode, mle uses maximum likelihood estimation, and mmle uses maximum marginal likelihood estimation. post_mode is the default method.

prior_choice

the choice of prior for the range parameters and the noise-variance parameter. ref_xi and ref_gamma use the reference prior with the log-inverse-range parameterization ξ or the range parameterization γ, respectively. ref_approx uses the jointly robust prior to approximate the reference prior. The default choice is ref_approx.

a

prior parameter in the jointly robust prior. The default value is 0.2.

b

prior parameter in the jointly robust prior. The default value is n^{-1/p}(a+p), where n is the number of runs and p is the dimension of the input vector.

kernel_type

a vector specifying the type of kernel for each coordinate of the input. matern_3_2 and matern_5_2 are Matern correlations with roughness parameters 3/2 and 5/2, respectively. pow_exp is the power exponential correlation with roughness parameter alpha; if pow_exp is used, its roughness parameter alpha needs to be specified. periodic_gauss means the Gaussian kernel with the periodic folding method will be used, and periodic_exp means the exponential kernel with the periodic folding method will be used. The default choice is matern_5_2 (see the sketch after this argument list for a call that uses pow_exp).

isotropic

a boolean value. T means the isotropic kernel will be used and F means the separable kernel will be used. The default choice is the separable kernel.

R0

the distance between inputs. If the value is NA, it will be calculated during fitting. It can also be specified by the user as either a matrix or a list. The default value is NA.

optimization

the method for numerical optimization of the kernel parameters. Currently three methods are implemented. lbfgs is the low-storage version of the Broyden-Fletcher-Goldfarb-Shanno method. nelder-mead is the Nelder-Mead method. brent is the Brent method for one-dimensional problems.

alpha

roughness parameters in the pow_exp correlation functions. The default choice is a vector with each entry being 1.9.

lower_bound

boolean value. T means the default lower bounds of the inverse range parameters are used to constrain the optimization, and F means the optimization is unconstrained. The default value is T; we also suggest trying F in various scenarios.

max_eval

the maximum number of steps to estimate the range and nugget parameters.

initial_values

a matrix of initial values of the kernel parameters to be optimized numerically, where each row of the matrix contains a set of the log inverse range parameters and the log nugget parameter.

num_initial_values

the number of initial values of the kernel parameters in optimization.
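
A brief, illustrative sketch of how several of the arguments above can be combined on toy data (the design and response here are assumptions for illustration only; see the Examples section below for complete workflows):

  library(RobustGaSP)
  set.seed(1)
  x <- matrix(runif(40), 20, 2)                    # 20 runs, 2 inputs
  y <- matrix(sin(2*pi*x[,1]) + x[,2]^2, 20, 1)    # toy response
  fit <- rgasp(design = x, response = y,
               kernel_type = 'pow_exp',            # power exponential kernel
               alpha = rep(1.9, 2),                # its roughness parameters
               nugget.est = TRUE,                  # estimate the nugget
               optimization = 'nelder-mead')       # Nelder-Mead optimizer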

Value

rgasp returns an S4 object of class rgasp (see rgasp-class).

Author(s)

Mengyang Gu [aut, cre], Jesus Palomo [aut], James Berger [aut]

Maintainer: Mengyang Gu <mengyang@pstat.ucsb.edu>

References

M. Gu, X. Wang and J.O. Berger (2018), Robust Gaussian stochastic process emulation, Annals of Statistics, 46(6A), 3038-3066.

M. Gu (2018), Jointly robust prior for Gaussian stochastic process in emulation, calibration and variable selection, arXiv:1804.09329.

M. Gu. (2016). Robust uncertainty quantification and scalable computation for computer models with massive output. Ph.D. thesis. Duke University.

M. Gu. and J.O. Berger (2016). Parallel partial Gaussian process emulation for computer models with massive output. Annals of Applied Statistics, 10(3), 1317-1347.

E.T. Spiller, M.J. Bayarri, J.O. Berger, E.S. Calder, A.K. Patra, E.B. Pitman and R.L. Wolpert (2014), Automating emulator construction for geophysical hazard maps. SIAM/ASA Journal on Uncertainty Quantification, 2(1), 126-152.

J. Nocedal (1980), Updating quasi-Newton matrices with limited storage, Math. Comput., 35, 773-782.

D. C. Liu and J. Nocedal (1989), On the limited memory BFGS method for large scale optimization, Math. Programming, 45, 503-528.

Brent, R. (1973), Algorithms for Minimization without Derivatives. Englewood Cliffs N.J.: Prentice-Hall.

Examples

  library(RobustGaSP)
  #------------------------
  # a 3 dimensional example
  #------------------------
  # dimensional of the inputs
  dim_inputs <- 3    
  # number of the inputs
  num_obs <- 50       
  # uniform samples of design
  input <- matrix(runif(num_obs*dim_inputs), num_obs,dim_inputs) 
  
  # The following lines use a maximin Latin hypercube design, which is typically better than uniform sampling
  # library(lhs)
  # input <- maximinLHS(n=num_obs, k=dim_inputs)  ##maximin lhd sample
  
  ####
  # outputs from the 3 dim dettepepel.3.data function
  
  output = matrix(0,num_obs,1)
  for(i in 1:num_obs){
    output[i]<-dettepepel.3.data (input[i,])
  }
  
  # use constant mean basis, with no constraint on optimization
  # and marginal posterior mode estimation
  m1<- rgasp(design = input, response = output, lower_bound=FALSE)
  
  # you can specify the estimation method as maximum likelihood estimation (MLE)
  m2<- rgasp(design = input, response = output, method='mle',lower_bound=FALSE)
  
  ##let's do some comparison on prediction
  n_testing=1000
  testing_input=matrix(runif(n_testing*dim_inputs),n_testing,dim_inputs)
  
  m1_pred=predict(m1,testing_input=testing_input)
  m2_pred=predict(m2,testing_input=testing_input)
  
  
  ##root of mean square error and interval
  test_output = matrix(0,n_testing,1)
  for(i in 1:n_testing){
    test_output[i]<-dettepepel.3.data (testing_input[i,])
  }
  
  ##root of mean square error
  sqrt(mean( (m1_pred$mean-test_output)^2))
  sqrt(mean( (m2_pred$mean-test_output)^2))
  sd(test_output)
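  
  ## a short sketch of interval coverage, assuming the prediction list also
  ## contains lower95 and upper95, as in the 1-dimensional example below
  prop_m1 <- mean(m1_pred$lower95<=test_output & m1_pred$upper95>=test_output)
  prop_m2 <- mean(m2_pred$lower95<=test_output & m2_pred$upper95>=test_output)
  length_m1 <- mean(m1_pred$upper95-m1_pred$lower95)
  length_m2 <- mean(m2_pred$upper95-m2_pred$lower95)
  prop_m1; prop_m2; length_m1; length_m2      # empirical coverage and average length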
  #---------------------------------------
  # a 1 dimensional example with zero mean
  #---------------------------------------


  input=10*seq(0,1,1/14)
  output<-higdon.1.data(input)
  # the following code fits a GaSP with zero mean by setting zero.mean="Yes"
  model<- rgasp(design = input, response = output, zero.mean="Yes")
  model
  
  testing_input = as.matrix(seq(0,10,1/100))
  model.predict<-predict(model,testing_input)
  names(model.predict)
  
  #########plot predictive distribution
  testing_output=higdon.1.data(testing_input)
  plot(testing_input,model.predict$mean,type='l',col='blue',
       xlab='input',ylab='output')
  polygon( c(testing_input,rev(testing_input)),c(model.predict$lower95,
        rev(model.predict$upper95)),col =  "grey80", border = FALSE)
  lines(testing_input, testing_output)
  lines(testing_input,model.predict$mean,type='l',col='blue')
  lines(input, output,type='p')
  
  ## mean squared error
  mean((model.predict$mean-testing_output)^2)
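  
  ## a brief, illustrative sketch: refit the same data with the Matern 3/2 kernel
  ## (kernel_type='matern_3_2') and compare the mean squared error
  model_m32 <- rgasp(design = input, response = output,
                     zero.mean="Yes", kernel_type='matern_3_2')
  model_m32.predict <- predict(model_m32, testing_input)
  mean((model_m32.predict$mean-testing_output)^2)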


  #-----------------------------------
  # a 2 dimensional example with trend
  #-----------------------------------
  # dimensional of the inputs
  dim_inputs <- 2    
  # number of the inputs
  num_obs <- 20       
  
  # uniform samples of design
  input <-matrix(runif(num_obs*dim_inputs), num_obs,dim_inputs) 
  # The following lines use a maximin Latin hypercube design, which is typically better than uniform sampling
  # library(lhs)
  # input <- maximinLHS(n=num_obs, k=dim_inputs)  # maximin lhd sample
  
  # outputs from a 2 dim function
  
  output <- matrix(0,num_obs,1)
  for(i in 1:num_obs){
    output[i]<-limetal.2.data (input[i,])
  }
  
  ####trend or mean basis
  X<-cbind(rep(1,num_obs), input )
  
  
  # use a mean basis with intercept and linear trend, with no constraint on optimization
  m2<- rgasp(design = input, response = output,trend =X,  lower_bound=FALSE)

  show(m2)      # show this rgasp object 
  
  m2@beta_hat       # estimated inverse range parameters
  m2@theta_hat      # estimated trend parameters
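
  ## a short, illustrative sketch of prediction when a trend is used; this assumes
  ## predict() accepts a testing_trend argument matching the basis used in fitting
  testing_input <- matrix(runif(200*dim_inputs), 200, dim_inputs)
  testing_trend <- cbind(rep(1,200), testing_input)      # same basis as X
  m2_pred_trend <- predict(m2, testing_input=testing_input, testing_trend=testing_trend)
  test_output <- matrix(0,200,1)
  for(i in 1:200){
    test_output[i] <- limetal.2.data(testing_input[i,])
  }
  sqrt(mean((m2_pred_trend$mean-test_output)^2))         # root mean squared error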

  #--------------------------------------------------------------------------------------
  # an 8 dimensional example using only a subset inputs and a noise with unknown variance
  #--------------------------------------------------------------------------------------
  set.seed(1)
  # dimensional of the inputs
  dim_inputs <- 8    
  # number of the inputs
  num_obs <- 50       
  
  # uniform samples of design
  input <-matrix(runif(num_obs*dim_inputs), num_obs,dim_inputs) 
  # The following lines use a maximin Latin hypercube design, which is typically better than uniform sampling
  # library(lhs)
  # input <- maximinLHS(n=num_obs, k=dim_inputs)  # maximin lhd sample
  
  # rescale the design to the domain
  input[,1]<-0.05+(0.15-0.05)*input[,1];
  input[,2]<-100+(50000-100)*input[,2];
  input[,3]<-63070+(115600-63070)*input[,3];
  input[,4]<-990+(1110-990)*input[,4];
  input[,5]<-63.1+(116-63.1)*input[,5];
  input[,6]<-700+(820-700)*input[,6];
  input[,7]<-1120+(1680-1120)*input[,7];
  input[,8]<-9855+(12045-9855)*input[,8];
  
  # outputs from the 8 dim Borehole function
  
  output=matrix(0,num_obs,1)
  for(i in 1:num_obs){
    output[i]=borehole(input[i,])
  }
  
  # use constant mean basis, estimate the nugget, with no constraint on optimization
  m3<- rgasp(design = input[,c(1,4,6,7,8)], response = output, 
            nugget.est=TRUE, lower_bound=FALSE)

  m3@beta_hat       # estimated inverse range parameters
  m3@nugget     
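
  ## a brief, illustrative sketch: compare with a model that uses all 8 inputs;
  ## inert inputs typically receive very small estimated inverse range parameters
  m4 <- rgasp(design = input, response = output,
              nugget.est=TRUE, lower_bound=FALSE)
  m4@beta_hat       # estimated inverse range parameters for all 8 inputs
  m4@nugget         # estimated nugget with all 8 inputs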


