HPLB {HPLB} | R Documentation |
High Probability Lower Bounds (HPLB) for the Total Variation Distance (TV) Based on Finite Samples
Description
Implementations of different HPLBs for TV as described in (Michel et al., 2020).
Usage
HPLB(
t,
rho,
s = 0.5,
estimator.type = "adapt",
alpha = 0.05,
tv.seq = seq(from = 0, to = 1, by = 1/length(t)),
custom.bounding.seq = NULL,
direction = rep("left", length(s)),
cutoff = 0.5,
verbose.plot = FALSE,
seed = 0,
...
)
Arguments
t |
a numeric vector value corresponding to a natural ordering of the observations. For a two-sample test 0-1 numeric values values should be provided. |
rho |
a numeric vector value providing an ordering. This could be a binary classifier, a regressor, a witness function from a MMD kernel or anything else that would witness a distributional difference. |
s |
a numeric vector value giving split points on t. |
estimator.type |
a character value indicating which estimator to use. One option out of:
|
alpha |
a numeric value giving the overall type-I error control level. |
tv.seq |
a sequence of values between 0 and 1 used as the grid search for the total variation distance in case of tv-search. |
custom.bounding.seq |
a list of bounding functions respecting the order of tv.seq used in case of estimator.type "custom-tv-search". |
direction |
a character vector value made of "left" or "right" giving which distribution witness count to estimate (t<=s or t>s?). |
cutoff |
a numeric value. This is the cutoff used if bayes estimators are used. The theory suggests to use 1/2 but this can be changed. |
verbose.plot |
a boolean value for additional plots. |
seed |
an integer value. The seed for reproducibility. |
... |
additional parameters for the function |
Value
a list
containing the relevant lower bounds estimates. For the total variation distance the relevant entry is tvhat
.
Author(s)
Loris Michel, Jeffrey Naef
References
L. Michel, J. Naef and N. Meinshausen (2020). High-Probability Lower Bounds for the Total Variation Distance
Examples
## libs
library(HPLB)
library(ranger)
library(distrEx)
## reproducibility
set.seed(0)
## Example 1: TV lower bound based on two samples (bayes estimator), Gaussian mean-shift example
n <- 100
means <- rep(c(0,2), each = n / 2)
x <- stats::rnorm(n, mean = means)
t <- rep(c(0,1), each = n / 2)
bayesRate <- function(x) {
return(stats::dnorm(x, mean = 2) /
(stats::dnorm(x, mean = 2) + stats::dnorm(x, mean = 0)))
}
# estimated HPLB
tvhat <- HPLB(t = t, rho = bayesRate(x), estimator.type = "bayes")
# true TV
TotalVarDist(e1 = Norm(2,1), e2 = Norm(0,1))
## Example 2: optimal mixture detection (adapt estimator), Gaussian mean-shift example
n <- 100
mean.shift <- 2
t.train <- runif(n, 0 ,1)
x.train <- ifelse(t.train>0.5, stats::rnorm(n, mean.shift), stats::rnorm(n))
rf <- ranger::ranger(t~x, data.frame(t=t.train,x=x.train))
n <- 100
t.test <- runif(n, 0 ,1)
x.test <- ifelse(t.test>0.5, stats::rnorm(n, mean.shift), stats::rnorm(n))
rho <- predict(rf, data.frame(t=t.test,x=x.test))$predictions
## out-of-sample
tv.oos <- HPLB(t = t.test, rho = rho, s = seq(0.1,0.9,0.1), estimator.type = "adapt")
## total variation values
tv <- c()
for (s in seq(0.1,0.9,0.1)) {
if (s<=0.5) {
D.left <- Norm(0,1)
} else {
D.left <- UnivarMixingDistribution(Dlist = list(Norm(0,1),Norm(mean.shift,1)),
mixCoeff = c(ifelse(s<=0.5, 1, 0.5/s), ifelse(s<=0.5, 0, (s-0.5)/s)))
}
if (s < 0.5) {
D.right <- UnivarMixingDistribution(Dlist = list(Norm(0,1),Norm(mean.shift,1)),
mixCoeff = c(ifelse(s<=0.5, (0.5-s)/(1-s), 0), ifelse(s<=0.5, (0.5/(1-s)), 1)))
} else {
D.right <- Norm(mean.shift,1)
}
tv <- c(tv, TotalVarDist(e1 = D.left, e2 = D.right))
}
## plot
oldpar <- par(no.readonly =TRUE)
par(mfrow=c(2,1))
plot(t.test,x.test,pch=19,xlab="t",ylab="x")
plot(seq(0.1,0.9,0.1), tv.oos$tvhat,type="l",ylim=c(0,1),xlab="t", ylab="TV")
lines(seq(0.1,0.9,0.1), tv, col="red",type="l")
par(oldpar)