conformal.multidim.msplit {conformalInference.multi}

R Documentation

Multi Split Conformal Prediction Regions with Multivariate Response

Description

Compute prediction intervals using Multi Split conformal inference with multivariate response.

Usage

conformal.multidim.msplit(
  x,
  y,
  x0,
  train.fun,
  predict.fun,
  alpha = 0.1,
  split = NULL,
  seed = FALSE,
  randomized = FALSE,
  seed.rand = FALSE,
  verbose = FALSE,
  rho = NULL,
  score = "max",
  s.type = "st-dev",
  B = 100,
  lambda = 0,
  tau = 0.1,
  mad.train.fun = NULL,
  mad.predict.fun = NULL
)

Arguments

`x`	The feature variables, a matrix of dimension nxp.
`y`	The multivariate responses, a matrix of dimension nxq.
`x0`	The new points to evaluate, a matrix of dimension n0xp.
`train.fun`	A function to perform model training, i.e., to produce an estimator of E(Y\|X), the conditional expectation of the response variable Y given features X. Its input arguments should be x: matrix of features, and y: matrix of responses.
`predict.fun`	A function to perform prediction for the (mean of the) responses at new feature values. Its input arguments should be out: output produced by train.fun, and newx: feature values at which we want to make predictions.
`alpha`	Miscoverage level for the prediction intervals, i.e., intervals with coverage 1-alpha are formed. Default for alpha is 0.1.
`split`	Indices that define the data-split to be used (i.e., the indices define the first half of the data-split, on which the model is trained). Default is NULL, in which case the split is chosen randomly.
`seed`	Integer to be passed to set.seed before defining the random data-split to be used. Default is FALSE, which effectively sets no seed. If both split and seed are passed, the former takes priority and the latter is ignored.
`randomized`	Should the randomized approach be used? Default is FALSE.
`seed.rand`	The seed for the randomized version. Default is FALSE.
`verbose`	Should intermediate progress be printed out? Default is FALSE.
`rho`	Split proportion between training and calibration set. Default is NULL, in which case a proportion of 0.5 is used.
`score`	The chosen score for the split conformal function. Default is "max".
`s.type`	The type of modulation function. Currently there are 3 options: "identity", "st-dev", "alpha-max". Default is "st-dev".
`B`	Number of repetitions. Default is 100.
`lambda`	Smoothing parameter. Default is 0.
`tau`	Smoothing parameter: tau = 1-1/B gives the Bonferroni intersection method, and tau = 0 gives the unadjusted intersection. The signature default is 0.1; a value of 1-(B+1)/(2*B) may also be supplied.
`mad.train.fun`	A function to perform training on the absolute residuals i.e., to produce an estimator of E(R\|X) where R is the absolute residual R = \|Y - m(X)\|, and m denotes the estimator produced by train.fun. This is used to scale the conformal score, to produce a prediction interval with varying local width. The input arguments to mad.train.fun should be x: matrix of features, y: vector of absolute residuals, and out: the output produced by a previous call to mad.train.fun, at the same features x. The function mad.train.fun may (optionally) leverage this returned output for efficiency purposes. See details below. The default for mad.train.fun is NULL, which means that no training is done on the absolute residuals, and the usual (unscaled) conformal score is used. Note that if mad.train.fun is non-NULL, then so must be mad.predict.fun (next).
`mad.predict.fun`	A function to perform prediction for the (mean of the) absolute residuals at new feature values. Its input arguments should be out: output produced by mad.train.fun, and newx: feature values at which we want to make predictions. The default for mad.predict.fun is NULL, which means that no local scaling is done for the conformal score, i.e., the usual (unscaled) conformal score is used.
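
As a minimal sketch of the train.fun and predict.fun contract described above, the pair below fits all response columns by least squares with base R's lm.fit. The names my.train.fun and my.predict.fun are hypothetical, and the sketch assumes that the two-argument signatures documented above are all that is required; the lm_multi() helper used in the Examples remains the packaged alternative.

```r
# Hypothetical helpers; not part of conformalInference.multi.
# Fit an intercept plus linear terms jointly for all q responses.
my.train.fun <- function(x, y) {
  fit <- lm.fit(cbind(1, x), y)        # y may be an n x q matrix
  list(coef = fit$coefficients)        # (p + 1) x q coefficient matrix
}

# Predict the q responses at the rows of newx (an n0 x p matrix).
my.predict.fun <- function(out, newx) {
  cbind(1, newx) %*% out$coef
}

# Quick check on noiseless data: predictions should reproduce y
set.seed(1)
x <- matrix(rnorm(40), nrow = 10)      # n = 10, p = 4
beta <- matrix(1:10, nrow = 5)         # (p + 1) x q with q = 2
y <- cbind(1, x) %*% beta
out <- my.train.fun(x, y)
pred <- my.predict.fun(out, x[1:3, , drop = FALSE])
dim(pred)
```

Such a pair could then be supplied as train.fun = my.train.fun and predict.fun = my.predict.fun.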

Details

This function extends the univariate Multi Split conformal inference approach to a multivariate setting, exploiting the concept of depth measure.

This function relies on the package future.apply for parallelization.
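
With future.apply, the execution backend is normally the one selected through future::plan(). Whether this function sets a plan of its own is not stated here, so the following is only a sketch of the general pattern, with a stand-in task in place of the B repetitions; one.repetition is a hypothetical placeholder.

```r
library(future)
library(future.apply)

# Select a parallel backend; plan() returns the previous plan
old.plan <- plan(multisession, workers = 2)

# Stand-in for one of the B repetitions (hypothetical placeholder)
one.repetition <- function(b) b^2

B <- 4
res <- future_lapply(seq_len(B), one.repetition)

# Restore the previous backend
plan(old.plan)
```

Under this assumption, a call to conformal.multidim.msplit would take the place of the future_lapply line, between selecting and restoring the plan.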

Value

A list of length n0, giving the lower and upper bounds for each new observation in x0.

References

"Multi Split Conformal Prediction" by Solari, Djordjilovic (2021) <arXiv:2103.00627> is the baseline for the univariate case.

Examples

  library(conformalInference.multi)

  set.seed(12345)

  # Simulate n observations with p features and q responses
  n <- 200
  p <- 4
  q <- 2
  mu <- rep(0, p)
  x <- mvtnorm::rmvnorm(n, mu)
  beta <- sapply(1:q, function(k) c(mvtnorm::rmvnorm(1, mu)))
  y <- x %*% beta + t(mvtnorm::rmvnorm(q, 1:n))

  # Use the last observation as the new point to evaluate
  x0 <- matrix(x[n, ], nrow = 1)
  y0 <- matrix(y[n, ], nrow = 1)
  n0 <- nrow(y0)
  q <- ncol(y)
  B <- 100

  # Training and prediction functions provided by lm_multi()
  funs <- lm_multi()

  sol <- conformal.multidim.msplit(x, y, x0,
                                   train.fun = funs$train.fun,
                                   predict.fun = funs$predict.fun,
                                   alpha = 0.05, split = NULL, seed = FALSE,
                                   randomized = FALSE, seed.rand = FALSE,
                                   verbose = FALSE, rho = NULL, score = "max",
                                   s.type = "st-dev", B = B, lambda = 0,
                                   tau = 0.1, mad.train.fun = NULL,
                                   mad.predict.fun = NULL)

  sol

[Package conformalInference.multi version 1.1.1 Index]