dindexm {isodistrreg}    R Documentation

Distributional index model (DIM)

Description

Fits a distributional index model with a user-specified index function to a training dataset. See the Examples section below for how to specify a distributional single index model.

Usage

dindexm(
  formula,
  indexfit,
  data,
  response,
  pars = osqpSettings(verbose = FALSE, eps_abs = 1e-05, eps_rel = 1e-05,
    max_iter = 10000L),
  progress = TRUE,
  ...
)

Arguments

formula

object of class formula that describes the index model

indexfit

function that fits the index model to the training data. It should accept the arguments formula and data, and the object it returns must have a predict method. Further arguments in ... are passed to indexfit. See Examples.

data

data.frame containing the covariates of the index model and the response variable.

response

name of the response variable in data.

pars

parameters for the quadratic programming solver (only relevant for multivariate index functions), set using osqpSettings from the osqp package.

progress

display a progress bar while fitting idr?

...

further arguments passed to indexfit.
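As a sketch of the indexfit contract, the code below defines a hypothetical indexfit, ls_index, that computes an ordinary least-squares index with stats::lm.fit and returns an object of a hypothetical class "lsindex" together with its own predict method. Neither name is part of isodistrreg. This page does not state how dindexm calls predict, so the method assumes, as predict.lm does, that new covariates arrive through a newdata argument and that a call without newdata returns the in-sample index values.

```r
## Hypothetical indexfit: least-squares index via stats::lm.fit.
## Accepts `formula` and `data` like lm(); extra arguments are ignored.
ls_index <- function(formula, data, ...) {
  mf <- model.frame(formula, data = data)
  tt <- terms(mf)
  X <- model.matrix(tt, mf)
  fit <- lm.fit(X, model.response(mf))
  structure(
    list(coefficients = fit$coefficients, terms = tt,
         fitted = fit$fitted.values),
    class = "lsindex"
  )
}

## predict method for the hypothetical "lsindex" class.
## Assumption: new covariates are supplied as `newdata`; without it,
## the in-sample index values are returned.
predict.lsindex <- function(object, newdata, ...) {
  if (missing(newdata)) return(object$fitted)
  tt <- delete.response(object$terms)
  X <- model.matrix(tt, model.frame(tt, data = newdata))
  drop(X %*% object$coefficients)
}

## Usage on simulated data
set.seed(1)
train <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
train$y <- 2 * train$x1 - train$x2 + rnorm(50)
ls_fit <- ls_index(y ~ x1 + x2, data = train)
new_cov <- data.frame(x1 = c(0, 1), x2 = c(0, 1))
idx_new <- predict(ls_fit, newdata = new_cov)
```

Under that assumption, such a function would be passed as indexfit = ls_index in the same way lm is passed in the Examples.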

Details

This function fits a distributional index model (DIM) to training data. The DIM assumes that the response is more likely to attain higher values as the value of the index function increases. The index function can be estimated parametrically, for example with lm or glm, or nonparametrically.
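To illustrate a nonparametric index, the sketch below wraps stats::loess in a hypothetical function, loess_index, that could serve as indexfit, with extra arguments such as span forwarded through .... One design choice matters for out-of-sample use: with the default surface = "interpolate", predict.loess returns NA for covariates outside the box enclosing the training data, so the wrapper sets surface = "direct" through loess.control. Note that loess accepts at most four numeric predictors.

```r
## Hypothetical nonparametric indexfit built on stats::loess.
## `surface = "direct"` makes predict() compute the local fit at every new
## point, so covariates outside the training range do not give NA.
loess_index <- function(formula, data, ...) {
  stats::loess(formula, data = data,
               control = stats::loess.control(surface = "direct"), ...)
}

set.seed(1)
n <- 200
train <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
train$y <- rnorm(n, 1 - train$x1 + train$x2^2 / 3)

## `span` is forwarded to loess() through `...`
idx_fit <- loess_index(y ~ x1 + x2, data = train, span = 0.75)

## Index values for new covariates; the second row lies far outside
## the range of the training covariates
new_cov <- data.frame(x1 = c(0, 5), x2 = c(0, -5))
idx_new <- predict(idx_fit, newdata = new_cov)
```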

The formal mathematical assumption of the DIM is that, at each fixed threshold z, the conditional CDF F_{y | g(X) = g(x)}(z) decreases as g(x) increases. Here y denotes the response, x and X denote the covariates in data, and g is the index function estimated by indexfit.
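The simulated model in the Examples satisfies this assumption exactly. There, y given X is normal with unit variance and mean g(x) as below (a quadratic function of the covariates, which the lm index in the Examples can represent). The conditional CDF is therefore a shifted standard normal CDF, the same quantity the Examples compute with pnorm as trueCdf, and it is decreasing in g(x) at every threshold z because the standard normal CDF is increasing:

```latex
g(x) = 1 - x_1 + \frac{x_2^2}{3} - \frac{(1 - x_3)(1 + x_3)}{2},
\qquad
F_{y \mid g(X) = g(x)}(z) = \Phi\bigl(z - g(x)\bigr),
\\[4pt]
g(x) \le g(x') \;\Longrightarrow\; \Phi\bigl(z - g(x')\bigr) \le \Phi\bigl(z - g(x)\bigr) \quad \text{for all } z .
```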

Estimation is performed in two steps. First, indexfit is applied to data to estimate the index function g. Second, idr is applied to the pseudo-covariates g(x), that is, the fitted index values, and the response y.
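Both steps can be imitated in base R when the index is one-dimensional. The sketch below is a stand-in, not the isodistrreg code: step 1 uses lm as in the Examples, and step 2 replaces idr by a separate antitonic regression of the indicators 1{y <= z} on the fitted index at each threshold z, computed with stats::isoreg. For a single index, IDR solves these same threshold-wise monotone regressions in-sample, but out-of-sample prediction and the handling of ties may differ from idr. The names cdf_fits and cdf_at are hypothetical.

```r
## Base-R sketch of the two estimation steps for a single index.
## Not the isodistrreg implementation: idr() is replaced by a separate
## antitonic regression of 1{y <= z} on the fitted index at each threshold z.
set.seed(1)
n <- 300
train <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
train$y <- rnorm(n, 1 - train$x1 + train$x2^2 / 3)

## Step 1: estimate the index function g (here by linear regression)
idx_fit <- lm(y ~ x1 + poly(x2, degree = 2), data = train)
g_hat <- fitted(idx_fit)

## Step 2: the conditional CDF at each threshold must be nonincreasing in
## the index; isoreg() fits nondecreasing functions, so regress on -g_hat.
thresholds <- sort(unique(train$y))
cdf_fits <- lapply(thresholds, function(z) {
  as.stepfun(isoreg(-g_hat, as.numeric(train$y <= z)))
})

## Hypothetical helper: predicted CDF on the threshold grid for one new
## index value, evaluating the fitted step function at each threshold
cdf_at <- function(g_new) {
  vapply(cdf_fits, function(f) f(-g_new), numeric(1))
}

new_index <- predict(idx_fit, newdata = data.frame(x1 = 0.5, x2 = -1))
cdf_new <- cdf_at(new_index)
```

Fitting every distinct response value separately keeps the sketch short, but the cost grows quadratically with the sample size.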

Value

Object of class dindexfit: a list containing the index model (first component) and the IDR fit on the pseudo-data with the index as covariate (second component).

References

Henzi, A., Kleger, G. R., & Ziegel, J. F. (2020). Distributional (Single) Index Models. arXiv preprint arXiv:2006.09219.

See Also

idr for more information on IDR, predict.dindexfit for (out-of-sample) predictions based on a model fitted with dindexm.

Examples

set.seed(123)  ## for reproducibility
n <- 1000
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
## response: normal with unit variance and a mean that depends on X
y <- rnorm(n, 1 - X[, 1] + X[, 2]^2 / 3 - (1 - X[, 3]) * (1 + X[, 3]) / 2)
data <- cbind(y = y, X)

## data for out-of-sample prediction
newX <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))

## linear regression model for index
model <- dindexm(
  formula = y ~ poly(x1, degree = 2) + poly(x2, degree = 2) + 
    poly(x3, degree = 2),
  indexfit = lm,
  response = "y",
  data = data
)
pred <- predict(model, data = newX)

## plot the predicted CDF for the first new observation
plot(pred, 1, main = "LM based DIM")
grd <- pred[[1]]$points
## overlay the true conditional CDF of the first new observation
trueCdf <- pnorm(
  grd,
  1 - newX[1, 1] + newX[1, 2]^2 / 3 - (1 - newX[1, 3]) * (1 + newX[1, 3]) / 2
)
lines(grd, trueCdf, col = 2)

[Package isodistrreg version 0.1.0 Index]