dindexm {isodistrreg}

R Documentation

Distributional index model (DIM)

Description

Fits a distributional index model (DIM) with a user-specified index function to a training dataset. See the examples at the bottom to learn how to specify a distributional single index model.

Usage

dindexm(
  formula,
  indexfit,
  data,
  response,
  pars = osqpSettings(verbose = FALSE, eps_abs = 1e-05, eps_rel = 1e-05,
    max_iter = 10000L),
  progress = TRUE,
  ...
)

Arguments

`formula`	object of class `formula` that describes the index model
`indexfit`	function that fits the index model to training data. It should accept the arguments `formula` and `data` and return an object that has a `predict` method. Further arguments in `...` are passed to `indexfit`. See examples.
`data`	`data.frame` containing the covariates of the index model and the response variable.
`response`	name of the response variable in `data`.
`pars`	parameters for quadratic programming optimization (only relevant for multivariate index functions), set using `osqpSettings`.
`progress`	logical; should a progress bar be displayed while fitting IDR?
`...`	further arguments passed to `indexfit`.
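
The requirements on `indexfit` described above can be illustrated with a small wrapper. The following is only a sketch under stated assumptions: `loess_index` and its class name are hypothetical and not part of isodistrreg, and the sketch assumes that the fitted object is later passed to `predict` together with a data frame of covariates. It uses base R only.

```r
## Hypothetical index fitter (not part of isodistrreg): wraps loess() so that
## it accepts `formula` and `data` and returns an object with a predict method.
loess_index <- function(formula, data, ...) {
  fit <- loess(formula, data = data, ...)
  structure(list(fit = fit), class = "loess_index")
}

## S3 predict method: returns one numeric index value per row of `newdata`.
predict.loess_index <- function(object, newdata, ...) {
  as.numeric(predict(object$fit, newdata = newdata, ...))
}

## Quick check on simulated data
set.seed(42)
train <- data.frame(x = runif(200, -2, 2))
train$y <- rnorm(200, mean = sin(train$x))
index_fit <- loess_index(y ~ x, data = train)
index_values <- predict(index_fit, train)
```

Under these assumptions, such a function could be supplied as `indexfit = loess_index`, with extra arguments such as `span` forwarded through `...`. Note that `loess` returns `NA` for covariate values outside the training range unless it is configured otherwise.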

Details

This function fits a distributional index model (DIM) to training data. The DIM assumes that the response is more likely to attain higher values when the value of the index function increases. The index function can be estimated by parametric methods such as lm or glm, or nonparametrically.

The formal mathematical assumption of the DIM is that, at each fixed threshold z, the conditional CDF F_{y | g(X) = g(x)}(z) decreases (weakly) as g(x) increases. Here y denotes the response, X denotes the covariates in data, x is a fixed value of the covariates, and g is the index function estimated by indexfit.
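
Written out as an inequality, the monotonicity assumption reads as follows. This is only a restatement of the sentence above in LaTeX, using the same symbols; x and x' stand for two arbitrary covariate values (x' is notation introduced here for illustration), and the inequality is understood in the weak sense.

```latex
\[
  g(x) \le g(x') \quad \Longrightarrow \quad
  F_{y \mid g(X) = g(x)}(z) \;\ge\; F_{y \mid g(X) = g(x')}(z)
  \qquad \text{for all thresholds } z .
\]
```

Equivalently, the conditional distribution of y given g(X) = g(x') is stochastically at least as large as the one given g(X) = g(x).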

Estimation is performed in two steps. First, indexfit is applied to data to estimate the function g. Second, idr is applied with the estimated index values g(x) as pseudo-covariates and y as the response.
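
The two steps can be mimicked in base R for a single threshold. This is a minimal sketch, not the package's implementation: it assumes a univariate index, in which case the estimated conditional CDF at a threshold z is the antitonic (decreasing) least-squares regression of the indicator 1{y <= z} on the index values, computed here with `isoreg`. All variable names and the simulated data are hypothetical.

```r
## Simulated training data (hypothetical example)
set.seed(1)
n <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rnorm(n, mean = 1 - dat$x1 + 0.5 * dat$x2)

## Step 1: estimate the index function g, here by linear regression
index_model <- lm(y ~ x1 + x2, data = dat)
g_hat <- predict(index_model, newdata = dat)

## Step 2: antitonic regression of 1{y <= z} on the index at one threshold z.
## isoreg() fits an increasing function, so the indicator is negated before
## fitting and the fitted values are negated again afterwards.
z <- 1
ord <- order(g_hat)
indicator <- as.numeric(dat$y[ord] <= z)
cdf_at_z <- -isoreg(g_hat[ord], -indicator)$yf

## cdf_at_z[i] estimates P(y <= z | index = sort(g_hat)[i])
```

Repeating the second step over a grid of thresholds would give a full conditional CDF for each index value, which is what idr provides in a single call.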

Value

Object of class dindexm: A list containing the index model (first component) and the IDR fit on the pseudo-data with the index as covariate (second component).

References

Henzi, A., Kleger, G. R., & Ziegel, J. F. (2020). Distributional (Single) Index Models. arXiv preprint arXiv:2006.09219.

Examples

## simulate training data: the response is Gaussian with a mean that
## depends on all three covariates
n <- 1000
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- rnorm(n, 1 - X[, 1] + X[, 2]^2 / 3 - (1 - X[, 3]) * (1 + X[, 3]) / 2)
data <- cbind(y = y, X)

## data for out-of-sample prediction
newX <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))

## DIM with the index estimated by linear regression (quadratic terms)
model <- dindexm(
  formula = y ~ poly(x1, degree = 2) + poly(x2, degree = 2) + 
    poly(x3, degree = 2),
  indexfit = lm,
  response = "y",
  data = data
)
pred <- predict(model, data = newX)

## plot the predictive CDF for the first new observation and overlay
## the true CDF (red line)
plot(pred, 1, main = "LM based DIM")
grd <- pred[[1]]$points
trueCdf <- pnorm(
  grd,
  1 - newX[1, 1] + newX[1, 2]^2 / 3 - (1 - newX[1, 3]) * (1 + newX[1, 3]) / 2
)
points(grd, trueCdf, type = "l", col = 2)

Package isodistrreg version 0.1.0