ci_mincost {OptHoldoutSize} | R Documentation |
Confidence interval for minimum total cost, when estimated using parametric method
Description
Compute confidence interval for cost at optimal holdout size given either a standard error covariance matrix or a set of m estimates of parameters.
This can be done either asymptotically, using a method analogous to the Fisher information matrix, or empirically (using bootstrap resampling)
If sigma (covariance matrix) is specified and method='bootstrap', a confidence interval is generated assuming a Gaussian distribution of (N,k1,theta). To estimate a confidence interval assuming a non-Gaussian distribution, simulate values under the requisite distribution and use then as parameters N,k1, theta, with sigma set to NULL.
Usage
ci_mincost(
N,
k1,
theta,
alpha = 0.05,
k2form = powerlaw,
grad_mincost = NULL,
sigma = NULL,
n_boot = 1000,
seed = NULL,
mode = "empirical",
...
)
Arguments
N |
Vector of estimates of total number of samples on which the predictive score will be used/fitted, or single estimate |
k1 |
Vector of estimates of cost value in the absence of a predictive score, or single number |
theta |
Matrix of estimates of parameters for function k2form(n) governing expected cost to an individual sample given a predictive score fitted to n samples. Can be a matrix of dimension n x n_par, where n_par is the number of parameters of k2. |
alpha |
Construct 1-alpha confidence interval. Defaults to 0.05 |
k2form |
Function governing expected cost to an individual sample given a predictive score fitted to n samples. Must take two arguments: n (number of samples) and theta (parameters). Defaults to a power-law form |
grad_mincost |
Function giving partial derivatives of minimum cost, taking three arguments: N, k1, and theta. Only used for asymptotic confidence intervals. F NULL, estimated empirically |
sigma |
Standard error covariance matrix for (N,k1,theta), in that order. If NULL, will derive as sample covariance matrix of parameters. Must be of the correct size and positive definite. |
n_boot |
Number of bootstrap resamples for empirical estimate. |
seed |
Random seed for bootstrap resamples. Defaults to NULL. |
mode |
One of 'asymptotic' or 'empirical'. Defaults to 'empirical' |
... |
Passed to function |
Value
A vector of length two containing lower and upper limits of confidence interval.
Examples
## Set seed
set.seed(574635)
## We will assume that our observations of N, k1, and theta=(a,b,c) are
## distributed with mean mu_par and variance sigma_par
mu_par=c(N=10000,k1=0.35,A=3,B=1.5,C=0.1)
sigma_par=cbind(
c(100^2, 1, 0, 0, 0),
c( 1, 0.07^2, 0, 0, 0),
c( 0, 0, 0.5^2, 0.05, -0.001),
c( 0, 0, 0.05, 0.4^2, -0.002),
c( 0, 0, -0.001, -0.002, 0.02^2)
)
# Firstly, we make 500 observations
par_obs=rmnorm(500,mean=mu_par,varcov=sigma_par)
# Minimum cost and asymptotic and empirical confidence intervals
mincost=optimal_holdout_size(N=mean(par_obs[,1]),k1=mean(par_obs[,2]),
theta=colMeans(par_obs[,3:5]))$cost
ci_a=ci_mincost(N=par_obs[,1],k1=par_obs[,2],theta=par_obs[,3:5],alpha=0.05,
seed=12345,mode="asymptotic")
ci_e=ci_mincost(N=par_obs[,1],k1=par_obs[,2],theta=par_obs[,3:5],alpha=0.05,
seed=12345,mode="empirical")
# Assess cover at various m
m_values=c(20,30,50,100,150,200,300,500,750,1000,1500)
ntrial=5000
alpha_trial=0.1 # use 90% confidence intervals
mincost_true=optimal_holdout_size(N=mu_par[1],k1=mu_par[2],
theta=mu_par[3:5])$cost
## The matrices indicating cover take are included in this package but take
## around 30 minutes to generate. They are generated using the code below
## (including random seeds).
data(ci_cover_cost_a_yn)
data(ci_cover_cost_e_yn)
if (!exists("ci_cover_cost_a_yn")) {
ci_cover_cost_a_yn=matrix(NA,length(m_values),ntrial) # Entry [i,j] is 1
# if ith asymptotic CI for jth value of m covers true mincost
ci_cover_cost_e_yn=matrix(NA,length(m_values),ntrial) # Entry [i,j] is 1
# if ith empirical CI for jth value of m covers true mincost
for (i in 1:length(m_values)) {
m=m_values[i]
for (j in 1:ntrial) {
# Set seed
set.seed(j*ntrial + i + 12345)
# Make m observations
par_obs=rmnorm(m,mean=mu_par,varcov=sigma_par)
ci_a=ci_mincost(N=par_obs[,1],k1=par_obs[,2],theta=par_obs[,3:5],
alpha=alpha_trial,mode="asymptotic")
ci_e=ci_mincost(N=par_obs[,1],k1=par_obs[,2],theta=par_obs[,3:5],
alpha=alpha_trial,mode="empirical",n_boot=500)
if (mincost_true>ci_a[1] & mincost_true<ci_a[2])
ci_cover_cost_a_yn[i,j]=1 else ci_cover_cost_a_yn[i,j]=0
if (mincost_true>ci_e[1] & mincost_true<ci_e[2])
ci_cover_cost_e_yn[i,j]=1 else ci_cover_cost_e_yn[i,j]=0
}
print(paste0("Completed for m = ",m))
}
save(ci_cover_cost_a_yn,file="data/ci_cover_cost_a_yn.RData")
save(ci_cover_cost_e_yn,file="data/ci_cover_cost_e_yn.RData")
}
# Cover at each m value and standard error
cover_a=rowMeans(ci_cover_cost_a_yn)
cover_e=rowMeans(ci_cover_cost_e_yn)
zse_a=2*sqrt(cover_a*(1-cover_a)/ntrial)
zse_e=2*sqrt(cover_e*(1-cover_e)/ntrial)
# Draw plot. Convergence to 1-alpha cover is evident. Cover is not far from
# alpha even at small m.
plot(0,type="n",xlim=range(m_values),ylim=c(0.7,1),xlab=expression("m"),
ylab="Cover")
# Asymptotic cover and 2*SE pointwise envelope
polygon(c(m_values,rev(m_values)),c(cover_a+zse_a,rev(cover_a-zse_a)),
col=rgb(0,0,0,alpha=0.3),border=NA)
lines(m_values,cover_a,col="black")
# Empirical cover and 2*SE pointwiseenvelope
polygon(c(m_values,rev(m_values)),c(cover_e+zse_e,rev(cover_e-zse_e)),
col=rgb(0,0,1,alpha=0.3),border=NA)
lines(m_values,cover_e,col="blue")
abline(h=1-alpha_trial,col="red")
legend("bottomright",c("Asym.","Emp.",expression(paste("1-",alpha))),lty=1,
col=c("black","blue","red"))