tune_classification_model {DPpack} | R Documentation |
Privacy-preserving Hyperparameter Tuning for Binary Classification Models
Description
This function implements privacy-preserving hyperparameter tuning
for binary classification (Chaudhuri et al. 2011) using
the exponential mechanism. It accepts a list of models with various chosen
hyperparameters, a dataset X with corresponding labels y, upper and lower
bounds on the columns of X, and a boolean indicating whether to add bias in
the construction of each of the models. The data are split into m+1 equal
groups, where m is the number of models being compared. One group is set
aside as the validation group, and each of the other m groups is used to
train one of the given m models. The number of errors on the validation set
is counted for each model and used as the utility values in the exponential
mechanism (ExponentialMechanism) to select a tuned model in a
privacy-preserving way.
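The split-and-select procedure described above can be sketched in base R. This is a minimal illustration under stated assumptions, not DPpack's internal code: the helper name select_model_sketch is hypothetical, and the sketch assumes the utility is the negative error count (so fewer errors means a higher selection probability), a utility sensitivity of 1, and the common eps/(2*sensitivity) scaling of the exponential mechanism.

```r
# Hypothetical helper (not part of DPpack): pick one of m candidates with the
# exponential mechanism, using negative validation error counts as utility.
select_model_sketch <- function(n.errors, eps, sensitivity = 1) {
  utility <- -n.errors  # assumption: fewer errors = higher utility
  # Subtract the maximum before exponentiating for numerical stability
  scores <- eps * (utility - max(utility)) / (2 * sensitivity)
  probs <- exp(scores) / sum(exp(scores))
  idx <- sample(seq_along(n.errors), size = 1, prob = probs)
  list(index = idx, probs = probs)
}

# Assign n shuffled rows to m + 1 equal groups:
# m groups for training (one per model) and 1 group for validation
n <- 400
m <- 3
group <- rep(seq_len(m + 1), length.out = n)

# Made-up validation error counts for three candidate models
set.seed(1)
res <- select_model_sketch(n.errors = c(12, 5, 30), eps = 1)
res$probs  # selection probabilities; the model with the fewest errors is most likely
res$index  # index of the randomly selected model
```

Because the selection is randomized, the model with the fewest validation errors is the most likely choice but is not guaranteed to be returned.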
Usage
tune_classification_model(
  models,
  X,
  y,
  upper.bounds,
  lower.bounds,
  add.bias = FALSE,
  weights = NULL,
  weights.upper.bound = NULL
)
Arguments
models |
Vector of binary classification model objects, each initialized with a different combination of hyperparameter values from the search space for tuning. Each model should be initialized with the same epsilon privacy parameter value eps. The tuned model satisfies eps-level differential privacy. |
X |
Dataframe of data to be used in tuning the model. Note it is assumed the data rows and corresponding labels are randomly shuffled. |
y |
Vector or matrix of true labels for each row of X. |
upper.bounds |
Numeric vector giving upper bounds on the values in each column of X. Should be of length ncol(X). The values are assumed to be in the same order as the corresponding columns of X. Any value in the columns of X larger than the corresponding upper bound is clipped at the bound. |
lower.bounds |
Numeric vector giving lower bounds on the values in each column of X. Should be of length ncol(X). The values are assumed to be in the same order as the corresponding columns of X. Any value in the columns of X smaller than the corresponding lower bound is clipped at the bound. |
add.bias |
Boolean indicating whether to add a bias term to X. Defaults to FALSE. |
weights |
Numeric vector of observation weights, one weight per observation. Defaults to NULL. |
weights.upper.bound |
Numeric value representing the global or public upper bound on the weights. |
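The clipping behavior described for upper.bounds and lower.bounds can be illustrated with a short base R sketch. The helper clip_columns and the data X.demo are hypothetical names introduced only for this illustration; the sketch shows the idea of column-wise clipping and is not the package's own implementation.

```r
# Hypothetical helper (not part of DPpack): clip each column of a data frame
# to its corresponding lower and upper bound.
clip_columns <- function(X, lower.bounds, upper.bounds) {
  stopifnot(length(lower.bounds) == ncol(X),
            length(upper.bounds) == ncol(X))
  for (j in seq_len(ncol(X))) {
    # Values below the lower bound are raised to it, values above the
    # upper bound are lowered to it
    X[[j]] <- pmin(pmax(X[[j]], lower.bounds[j]), upper.bounds[j])
  }
  X
}

X.demo <- data.frame(x1 = c(-2, 0.5, 3), x2 = c(0, -1.5, 0.25))
clipped <- clip_columns(X.demo, lower.bounds = c(-1, -1),
                        upper.bounds = c(1, 1))
clipped
```

Here the out-of-range entries -2, 3, and -1.5 are moved to the nearest bound, while in-range values are left unchanged.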
Value
Single model object selected from the input list models with tuned parameters.
References
Chaudhuri K, Monteleoni C, Sarwate AD (2011). “Differentially Private Empirical Risk Minimization.” Journal of Machine Learning Research, 12(29), 1069-1109. https://jmlr.org/papers/v12/chaudhuri11a.html.
Examples
# Build train dataset X and y, and test dataset Xtest and ytest
N <- 200
K <- 2
X <- data.frame()
y <- data.frame()
for (j in (1:K)){
  t <- seq(-.25,.25,length.out = N)
  if (j==1) m <- stats::rnorm(N,-.2,.1)
  if (j==2) m <- stats::rnorm(N, .2,.1)
  Xtemp <- data.frame(x1 = 3*t , x2 = m - t)
  ytemp <- data.frame(matrix(j-1, N, 1))
  X <- rbind(X, Xtemp)
  y <- rbind(y, ytemp)
}
Xtest <- X[seq(1,(N*K),10),]
ytest <- y[seq(1,(N*K),10),,drop=FALSE]
X <- X[-seq(1,(N*K),10),]
y <- y[-seq(1,(N*K),10),,drop=FALSE]
y <- as.matrix(y)
weights <- rep(1, nrow(y)) # Uniform weighting
weights[nrow(y)] <- 0.5 # half weight for last observation
wub <- 1 # Public upper bound for weights
# Grid of possible gamma values for tuning SVM model
grid.search <- c(100, 1, .0001)
# Construct objects for SVM parameter tuning
eps <- 1 # Privacy budget should be the same for all models
svmdp1 <- svmDP$new("l2", eps, grid.search[1], perturbation.method='output')
svmdp2 <- svmDP$new("l2", eps, grid.search[2], perturbation.method='output')
svmdp3 <- svmDP$new("l2", eps, grid.search[3], perturbation.method='output')
models <- c(svmdp1, svmdp2, svmdp3)
# Tune using data and bounds for X based on its construction
upper.bounds <- c( 1, 1)
lower.bounds <- c(-1,-1)
tuned.model <- tune_classification_model(models, X, y, upper.bounds,
                                         lower.bounds, weights=weights,
                                         weights.upper.bound=wub)
tuned.model$gamma # Gives resulting selected hyperparameter
# tuned.model result can be used the same as a trained svmDP model
# Predict new data points
predicted.y <- tuned.model$predict(Xtest)
n.errors <- sum(predicted.y!=ytest)