ISS {ISS}    R Documentation

ISS

Description

ISS() combines the calculation of p-values with familywise error rate (FWER) control through the directed acyclic graph (DAG) testing procedures described in Müller et al. (2023).
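To illustrate the logical structure these procedures exploit, write H_x for the (assumed) null hypothesis that eta(x) < tau. If the regression function eta is increasing and x <= x' componentwise, then H_x' being true implies H_x is true, so selecting x also certifies x'. The sketch below records these relationships among the rows of a covariate matrix. It assumes distinct points, and dominance_matrix() is a hypothetical helper that is not part of the package; the DAG actually used by ISS() may differ, for example by keeping only the direct edges.

```r
# Hypothetical helper, not part of ISS: A[i, j] is TRUE when the point
# X[i, ] is componentwise <= X[j, ] (i != j). For an increasing eta this
# means: if H_{x_j} (eta(x_j) < tau) is true, then H_{x_i} is true as well.
dominance_matrix <- function(X) {
  m <- nrow(X)
  A <- matrix(FALSE, nrow = m, ncol = m)
  for (i in seq_len(m)) {
    for (j in seq_len(m)) {
      A[i, j] <- i != j && all(X[i, ] <= X[j, ])
    }
  }
  A
}

X_small <- rbind(c(0.1, 0.2), c(0.5, 0.6), c(0.7, 0.1))
A <- dominance_matrix(X_small)
which(A, arr.ind = TRUE)  # only the pair (1, 2): X_small[1, ] <= X_small[2, ]
```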

Usage

ISS(
  X,
  y,
  tau,
  alpha = 0.05,
  m = nrow(X),
  p_value = c("sub-Gaussian-normalmixture", "sub-Gaussian", "Gaussian", "classification",
    "quantile"),
  sigma2,
  rho = 1/2,
  FWER_control = c("ISS", "Holm", "MG all", "MG any", "split", "split oracle"),
  minimal = FALSE,
  split_proportion = 1/2,
  eta = NA,
  theta = 1/2
)

Arguments

X

a numeric matrix specifying the covariates.

y

a numeric vector with length(y) == nrow(X) specifying the responses.

tau

a single numeric value specifying the threshold of interest.

alpha

a numeric value in (0, 1] specifying the level at which the familywise error rate is controlled.

m

an integer value between 1 and nrow(X) specifying the size of the subsample of X at which the hypotheses should be tested.

p_value

one of c("sub-Gaussian", "sub-Gaussian-normalmixture", "Gaussian", "classification", "quantile") specifying which p-value construction is used. These correspond, respectively, to Definitions 1, 18, 19 and 21 and Lemma 24 of Müller et al. (2023). For p_value == "quantile", the version with the p-value from Definition 19 is implemented.
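The constructions above are specific to Müller et al. (2023). As a simpler, generic illustration of what such a p-value tests (this is not Definition 19, and z_test_p_value() is a hypothetical helper), consider H_x: eta(x) < tau with eta increasing and Gaussian noise of known variance sigma2. Under H_x every observation whose covariates are componentwise <= x has mean below tau, so a one-sided z-test on those responses is valid.

```r
# Generic one-sided z-test, NOT the package's Definition 19.
# Assumes an increasing eta and Gaussian noise with known variance sigma2.
z_test_p_value <- function(x, X, y, tau, sigma2) {
  # rows of X that are componentwise <= x
  dominated <- apply(X, MARGIN = 1, function(row) all(row <= x))
  k <- sum(dominated)
  if (k == 0) return(1)  # no dominated points: no evidence against H_x
  # Under H_x each dominated mean is < tau, so z is stochastically
  # smaller than a standard normal variable.
  z <- sum(y[dominated] - tau) / sqrt(k * sigma2)
  pnorm(z, lower.tail = FALSE)
}

X_toy <- rbind(c(0, 0), c(1, 1))
y_toy <- c(2, 2)
# only the first row is dominated by x = (0.5, 0.5), so z = (2 - 0) / 1 = 2
z_test_p_value(c(0.5, 0.5), X_toy, y_toy, tau = 0, sigma2 = 1)  # pnorm(-2), about 0.0228
```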

sigma2

a single positive numeric value specifying the variance parameter (only needed if p_value %in% c("sub-Gaussian", "sub-Gaussian-normalmixture")).

rho

a single positive numeric value serving as hyperparameter (only used if p_value == "sub-Gaussian-normalmixture").

FWER_control

one of c("ISS", "Holm", "MG all", "MG any", "split", "split oracle") specifying how the familywise error rate is controlled. "ISS" corresponds to Algorithm 1 of Müller et al. (2023), "Holm" to Holm's procedure, "MG all" and "MG any" to the procedures of Meijer and Goeman (2015) for one-way logical relationships, and "split" and "split oracle" to the sample-splitting techniques in Appendix B of Müller et al. (2023).
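Of these options, Holm's procedure is the simplest to state. The following is a textbook sketch in base R (holm_reject() is a hypothetical helper, not the package's internal code): sort the k p-values, compare the i-th smallest with alpha / (k - i + 1), and reject every hypothesis up to the first comparison that fails. The result agrees with rejecting where p.adjust(p, method = "holm") <= alpha.

```r
# Textbook Holm step-down procedure at level alpha (not the ISS internals).
holm_reject <- function(p, alpha = 0.05) {
  k <- length(p)
  ord <- order(p)                             # indices by increasing p-value
  thresholds <- alpha / (k - seq_len(k) + 1)  # alpha/k, alpha/(k-1), ..., alpha
  below <- p[ord] <= thresholds
  # number of rejections: everything before the first sorted p-value that fails
  n_rej <- if (all(below)) k else which(!below)[1] - 1
  rejected <- logical(k)
  rejected[ord[seq_len(n_rej)]] <- TRUE
  rejected
}

p <- c(0.01, 0.04, 0.03, 0.005)
holm_reject(p, alpha = 0.05)          # TRUE FALSE FALSE TRUE
p.adjust(p, method = "holm") <= 0.05  # TRUE FALSE FALSE TRUE
```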

minimal

a logical value determining whether the output should be reduced to the minimal number of points leading to the same selected set.
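Reading "upper hull" as the union of the upper orthants {z : z >= x componentwise} over the selected points x (the shaded rectangles in the example plot), the smallest subset with the same upper hull consists of the selected points that do not lie componentwise above another selected point. A sketch with a hypothetical helper minimal_points(), which may not match how ISS() computes it:

```r
# Hypothetical helper: keep only the minimal points of a selected set.
# The union of upper orthants above the rows is unchanged.
minimal_points <- function(X_sel) {
  X_sel <- unique(X_sel)  # drop exact duplicate rows first
  n_sel <- nrow(X_sel)
  keep <- rep(TRUE, n_sel)
  for (i in seq_len(n_sel)) {
    for (j in seq_len(n_sel)) {
      # X_sel[i, ] is redundant if another point lies componentwise below it
      if (i != j && all(X_sel[j, ] <= X_sel[i, ])) {
        keep[i] <- FALSE
        break
      }
    }
  }
  X_sel[keep, , drop = FALSE]
}

X_sel <- rbind(c(0.6, 0.7), c(0.8, 0.9), c(0.9, 0.5), c(0.6, 0.7))
minimal_points(X_sel)  # rows (0.6, 0.7) and (0.9, 0.5)
```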

split_proportion

a single numeric value giving the proportion of the data in the first split, used only when FWER_control %in% c("split", "split oracle"): the first split contains ceiling(split_proportion * nrow(X)) data points.
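For example, with the n = 1000 observations used in the Examples section, the first split has the following sizes (assuming the remaining points form the second split):

```r
n <- 1000
ceiling(1/2 * n)       # 500 with the default split_proportion = 1/2
ceiling(1/3 * n)       # 334, since 333.33... is rounded up
n - ceiling(1/3 * n)   # 666 points left for the second split
```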

eta

the true regression function, required when FWER_control == "split oracle". It should be a function that takes a vector of covariates as input and returns a single numeric value.
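In the Examples section the true regression function is the coordinatewise maximum, so a matching eta is function(x) max(x). The call below is a sketch based only on the signature documented above; it is wrapped in requireNamespace() so that it runs only when the ISS package is installed.

```r
# True regression function of the Examples section: eta(x) = max(x)
eta_true <- function(x) max(x)
eta_true(c(0.2, 0.7))  # 0.7

# Sketch of a "split oracle" call; runs only if ISS is installed
if (requireNamespace("ISS", quietly = TRUE)) {
  set.seed(1)
  n <- 1000
  X <- matrix(runif(n * 2), nrow = n)
  y <- apply(X, MARGIN = 1, max) + rnorm(n, sd = 1/4)
  X_rej <- ISS::ISS(X = X, y = y, tau = 0.5, m = 100, sigma2 = (1/4)^2,
                    FWER_control = "split oracle", eta = eta_true)
}
```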

theta

a single numeric value in (0, 1) specifying the quantile of interest when p_value == "quantile". Defaults to 1/2, i.e. the median.
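To show the role of theta, here is a generic sign-test sketch; it is not Lemma 24 of Müller et al. (2023), and quantile_p_value() is a hypothetical helper. If the conditional theta-quantile q_theta is increasing and q_theta(x) < tau, then each response whose covariates are componentwise <= x exceeds tau with probability at most 1 - theta. For conditionally independent responses, the number of such exceedances is then stochastically no larger than a Binomial(k, 1 - theta) variable.

```r
# Generic one-sided binomial (sign) test, NOT the package's Lemma 24.
quantile_p_value <- function(x, X, y, tau, theta = 1/2) {
  # rows of X that are componentwise <= x
  dominated <- apply(X, MARGIN = 1, function(row) all(row <= x))
  k <- sum(dominated)
  if (k == 0) return(1)  # no dominated points: no evidence against H_x
  s <- sum(y[dominated] > tau)  # exceedances of tau among dominated responses
  # P(Binomial(k, 1 - theta) >= s); small when many responses exceed tau
  pbinom(s - 1, size = k, prob = 1 - theta, lower.tail = FALSE)
}

X_toy <- rbind(c(0, 0), c(0.1, 0.1), c(1, 1))
y_toy <- c(1, 1, 1)
# two dominated points, both above tau = 0
quantile_p_value(c(0.5, 0.5), X_toy, y_toy, tau = 0)               # 0.5^2 = 0.25
quantile_p_value(c(0.5, 0.5), X_toy, y_toy, tau = 0, theta = 0.9)  # 0.1^2, about 0.01
```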

Value

A numeric matrix whose rows are the points of X selected as lying in the tau-superlevel set of the regression function; with probability at least 1 - alpha, every selected point does lie in this set. If minimal == TRUE, only a subset of these points with the same upper hull is returned.
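Assuming, as in the example, an increasing regression function, every point that is componentwise >= a selected point lies in the tau-superlevel set whenever the selected point does, so the output certifies the union of the upper orthants above its rows. A sketch of the corresponding membership check, with a hypothetical helper in_selected_region():

```r
# Hypothetical helper: is z componentwise >= at least one row of X_rej?
in_selected_region <- function(z, X_rej) {
  if (nrow(X_rej) == 0) return(FALSE)  # nothing selected
  any(apply(X_rej, MARGIN = 1, function(r) all(r <= z)))
}

X_rej <- rbind(c(0.6, 0.7), c(0.9, 0.5))
in_selected_region(c(0.95, 0.55), X_rej)  # TRUE: lies above the second row
in_selected_region(c(0.7, 0.6), X_rej)    # FALSE: lies above neither row
```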

References

Meijer RJ, Goeman JJ (2015). “A multiple testing method for hypotheses structured in a directed acyclic graph.” Biometrical Journal, 57(1), 123–143.

Müller MM, Reeve HWJ, Cannings TI, Samworth RJ (2023). “Isotonic subgroup selection.” arXiv preprint arXiv:2305.04852v2.

Examples

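set.seed(1)           # for reproducibility
d <- 2                # covariate dimension
n <- 1000             # sample size
m <- 100              # number of points at which hypotheses are tested
sigma2 <- (1 / 4)^2   # noise variance
tau <- 0.5            # threshold of interest
alpha <- 0.05         # familywise error rate level

# Covariates uniform on [0, 1]^d; increasing regression function eta(x) = max(x)
X <- matrix(runif(n * d), nrow = n)
eta_X <- apply(X, MARGIN = 1, max)
y <- eta_X + rnorm(n, sd = sqrt(sigma2))

# Points of X selected as lying in the tau-superlevel set of eta
X_rej <- ISS(X = X, y = y, tau = tau, alpha = alpha, m = m, sigma2 = sigma2)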

if (d == 2) {
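  # Empty canvas on the unit square
  plot(0, type = "n", xlim = c(0, 1), ylim = c(0, 1), xlab = NA, ylab = NA)
  # Shade the upper orthant above each selected point (no-op if none selected)
  for (i in seq_len(nrow(X_rej))) {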
    rect(
      xleft = X_rej[i, 1], xright = 1, ybottom = X_rej[i, 2], ytop = 1,
      border = NA, col = "indianred"
    )
  }

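  # All covariate points in gray; the m tested points in black
  points(X, pch = 16, cex = 0.5, col = "gray")
  points(X[1:m, ], pch = 16, cex = 0.5, col = "black")
  # Boundary of the true superlevel set {x : max(x) >= tau}
  lines(x = c(0, tau), y = c(tau, tau), lty = 2)
  lines(x = c(tau, tau), y = c(tau, 0), lty = 2)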

  legend(
    x = "bottomleft",
    legend = c(
      "superlevel set boundary",
      "untested covariate points",
      "tested covariate points",
      "selected set"
    ),
    col = c("black", "gray", "black", "indianred"),
    lty = c(2, NA, NA, NA),
    lwd = c(1, NA, NA, NA),
    pch = c(NA, 16, 16, NA),
    fill = c(NA, NA, NA, "indianred"),
    border = c(NA, NA, NA, "indianred")
  )
}


[Package ISS version 1.0.0 Index]