tune_classification_model {DPpack}  R Documentation 
This function implements the privacy-preserving hyperparameter tuning
function for binary classification (Chaudhuri et al. 2011) using
the exponential mechanism. It accepts a list of models with various chosen
hyperparameters, a dataset X with corresponding labels y, upper and lower
bounds on the columns of X, and a boolean indicating whether to add bias in
the construction of each of the models. The data are split into m+1 equal
groups, where m is the number of models being compared. One group is set
aside as the validation group, and each of the other m groups is used to
train one of the given m models. The number of errors on the validation set
is counted for each model and used as the utility values in the exponential
mechanism (ExponentialMechanism) to select a tuned model in a
privacy-preserving way.
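The split-and-select procedure described above can be sketched in base R. This is only an illustration under stated assumptions, not DPpack's implementation: the helper name select_by_exponential_mechanism, the interleaved grouping, the example error counts, and the utility sensitivity of 1 (changing one validation record changes each error count by at most 1) are all assumptions made for this sketch.

```r
# Hypothetical helper (not part of DPpack): choose one candidate index with
# probability proportional to exp(eps * utility / (2 * sensitivity)),
# where utility is the negative validation error count.
select_by_exponential_mechanism <- function(n.errors, eps, sensitivity = 1) {
  utility <- -n.errors  # fewer errors means higher utility
  # Subtract the maximum utility before exponentiating for numerical stability;
  # this does not change the normalized probabilities.
  scores <- eps * (utility - max(utility)) / (2 * sensitivity)
  probs <- exp(scores) / sum(exp(scores))
  index <- sample.int(length(n.errors), size = 1, prob = probs)
  list(index = index, probs = probs)
}

# Split n row indices into m + 1 groups of equal size (interleaved assignment;
# the rows are assumed to be shuffled already).
n <- 360
m <- 3
groups <- split(seq_len(n), rep(seq_len(m + 1), length.out = n))

# Assumed validation error counts for m = 3 trained models
set.seed(1)
res <- select_by_exponential_mechanism(c(12, 5, 30), eps = 1)
res$probs  # selection probabilities; the model with fewest errors is most likely
res$index  # index of the selected model
```

Because the choice is randomized, the model with the fewest validation errors is the most likely selection but is not guaranteed to be chosen.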
tune_classification_model(
  models,
  X,
  y,
  upper.bounds,
  lower.bounds,
  add.bias = FALSE,
  weights = NULL,
  weights.upper.bound = NULL
)
models 
Vector of binary classification model objects, each initialized with a different combination of hyperparameter values from the search space for tuning. Each model should be initialized with the same epsilon privacy parameter value eps. The tuned model satisfies eps-level differential privacy. 
X 
Data frame of data to be used in tuning the model. The rows and their corresponding labels are assumed to be randomly shuffled. 
y 
Vector or matrix of true labels for each row of X. 
upper.bounds 
Numeric vector giving upper bounds on the values in each column of X. Should be of length ncol(X). The values are assumed to be in the same order as the corresponding columns of X. Any value in the columns of X larger than the corresponding upper bound is clipped at the bound. 
lower.bounds 
Numeric vector giving lower bounds on the values in each column of X. Should be of length ncol(X). The values are assumed to be in the same order as the corresponding columns of X. Any value in the columns of X smaller than the corresponding lower bound is clipped at the bound. 
add.bias 
Boolean indicating whether to add a bias term to X. Defaults to FALSE. 
weights 
Numeric vector of observation weights of the same length as y. 
weights.upper.bound 
Numeric value representing the global or public upper bound on the weights. 
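The clipping behavior described for upper.bounds and lower.bounds can be illustrated with a short base R sketch. The helper clip_columns and the demo data are hypothetical and are not part of DPpack; the sketch only shows what clipping each column at its bounds means.

```r
# Hypothetical helper (not part of DPpack): clip every column of X so that
# its values lie within [lower.bounds[j], upper.bounds[j]].
clip_columns <- function(X, lower.bounds, upper.bounds) {
  stopifnot(length(lower.bounds) == ncol(X),
            length(upper.bounds) == ncol(X))
  for (j in seq_len(ncol(X))) {
    # pmax raises values below the lower bound; pmin lowers values above the upper bound
    X[[j]] <- pmin(pmax(X[[j]], lower.bounds[j]), upper.bounds[j])
  }
  X
}

# Demo data with some values outside [-1, 1]
X.demo <- data.frame(x1 = c(-2, 0.3, 1.5), x2 = c(0.9, -1.2, 0))
X.clipped <- clip_columns(X.demo, lower.bounds = c(-1, -1),
                          upper.bounds = c(1, 1))
X.clipped
```

Values already inside the bounds are left unchanged, so bounds that are too loose cost nothing in accuracy here, while bounds that are too tight distort the data.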
Single model object selected from the input list models with tuned parameters.
Chaudhuri K, Monteleoni C, Sarwate AD (2011). “Differentially Private Empirical Risk Minimization.” Journal of Machine Learning Research, 12(29), 1069-1109. https://jmlr.org/papers/v12/chaudhuri11a.html.
# Build train dataset X and y, and test dataset Xtest and ytest
N <- 200
K <- 2
X <- data.frame()
y <- data.frame()
for (j in (1:K)){
  t <- seq(-.25,.25,length.out = N)
  if (j==1) m <- stats::rnorm(N,-.2,.1)
  if (j==2) m <- stats::rnorm(N, .2,.1)
  Xtemp <- data.frame(x1 = 3*t , x2 = m - t)
  ytemp <- data.frame(matrix(j-1, N, 1))
  X <- rbind(X, Xtemp)
  y <- rbind(y, ytemp)
}
# Hold out every 10th row as test data; the remaining rows are used for tuning
Xtest <- X[seq(1,(N*K),10),]
ytest <- y[seq(1,(N*K),10),,drop=FALSE]
X <- X[-seq(1,(N*K),10),]
y <- y[-seq(1,(N*K),10),,drop=FALSE]
y <- as.matrix(y)
weights <- rep(1, nrow(y)) # Uniform weighting
weights[nrow(y)] <- 0.5 # half weight for last observation
wub <- 1 # Public upper bound for weights
# Grid of possible gamma values for tuning the SVM model
grid.search <- c(100, 1, .0001)
# Construct objects for SVM parameter tuning
eps <- 1 # Privacy budget should be the same for all models
svmdp1 <- svmDP$new("l2", eps, grid.search[1], perturbation.method='output')
svmdp2 <- svmDP$new("l2", eps, grid.search[2], perturbation.method='output')
svmdp3 <- svmDP$new("l2", eps, grid.search[3], perturbation.method='output')
models <- c(svmdp1, svmdp2, svmdp3)
# Tune using data and bounds for X based on its construction
upper.bounds <- c( 1, 1)
lower.bounds <- c(-1,-1)
tuned.model <- tune_classification_model(models, X, y, upper.bounds,
                                         lower.bounds, weights=weights,
                                         weights.upper.bound=wub)
tuned.model$gamma # Gives resulting selected hyperparameter
# tuned.model result can be used the same as a trained svmDP model
# Predict new data points
predicted.y <- tuned.model$predict(Xtest)
n.errors <- sum(predicted.y!=ytest)