EmpiricalRiskMinimizationDP.KST {DPpack} R Documentation

## Privacy-preserving Empirical Risk Minimization for Regression

### Description

This class implements differentially private empirical risk minimization using the objective perturbation technique (Kifer et al. 2012). It is intended to be a framework for building more specific models via inheritance. See LinearRegressionDP for an example of this type of structure.

### Details

To use this class for empirical risk minimization, first use the new method to construct an object of this class with the desired function values and hyperparameters. After constructing the object, the fit method can be applied with a provided dataset and data bounds to fit the model. In fitting, the model stores a vector of coefficients coeff which satisfy differential privacy. These can be released directly, or used in conjunction with the predict method to privately predict the outcomes of new datapoints.

To guarantee differential privacy for the empirical risk minimization model, the values used to construct the object and the data used to fit it must satisfy the following constraints. Let l represent the loss function for an individual dataset row x and output value y, and let L represent the average loss over all rows and output values. First, L must be convex with a continuous Hessian. Second, the l2-norm of the gradient of l must be bounded above by some constant zeta for all possible input values in the domain. Third, for all possible inputs to l, the Hessian of l must be of rank at most one and its eigenvalues must be bounded above by some constant lambda. Fourth, the regularizer must be convex. Finally, the provided domain of l must be a closed convex subset of the set of all real-valued vectors of dimension p, where p is the number of columns of X. Because of this last constraint, a bias term cannot be included without appropriate scaling/preprocessing of the dataset. To ensure privacy, the add.bias argument in the fit and predict methods should only be used in subclasses within this package where appropriate preprocessing is implemented, not in this class.
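The constraints in the preceding paragraph can be written compactly as follows. This is only a restatement of that paragraph: the symbols \theta (the coefficient vector), n (the number of rows), R (the regularizer) and \mathcal{C} (the domain) are notation introduced here for illustration and are not names used by the package.

```latex
\begin{aligned}
&L(\theta) = \frac{1}{n}\sum_{i=1}^{n} \ell(\theta; x_i, y_i)
  \ \text{is convex with a continuous Hessian},\\
&\lVert \nabla_\theta \ell(\theta; x, y) \rVert_2 \le \zeta
  \quad \text{for all } \theta \in \mathcal{C} \text{ and all admissible } (x, y),\\
&\operatorname{rank}\!\left(\nabla_\theta^2 \ell(\theta; x, y)\right) \le 1,
  \qquad \lambda_{\max}\!\left(\nabla_\theta^2 \ell(\theta; x, y)\right) \le \lambda,\\
&R \ \text{is convex},
  \qquad \mathcal{C} \subseteq \mathbb{R}^p \ \text{is closed and convex}.
\end{aligned}
```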

### Public fields

mapXy

Map function of the form mapXy(X, coeff) mapping input data matrix X and coefficient vector or matrix coeff to output labels y.

mapXy.gr

Function representing the gradient of the map function with respect to the values in coeff and of the form mapXy.gr(X, coeff), where X is a matrix and coeff is a matrix or numeric vector.

loss

Loss function of the form loss(y.hat, y), where y.hat and y are matrices.

loss.gr

Function representing the gradient of the loss function with respect to y.hat and of the form loss.gr(y.hat, y), where y.hat and y are matrices.

regularizer

Regularization function of the form regularizer(coeff), where coeff is a vector or matrix.

regularizer.gr

Function representing the gradient of the regularization function with respect to coeff and of the form regularizer.gr(coeff).

gamma

Nonnegative real number representing the regularization constant.

eps

Positive real number defining the epsilon privacy budget. If set to Inf, runs algorithm without differential privacy.

delta

Nonnegative real number defining the delta privacy parameter. If 0, reduces to pure eps-DP.

domain

List of constraint and jacobian functions representing the constraints on the search space for the objective perturbation algorithm used in Kifer et al. (2012).

zeta

Positive real number denoting the upper bound on the l2-norm value of the gradient of the loss function, as required to ensure differential privacy.

lambda

Positive real number giving the upper bound on the eigenvalues of the Hessian of the loss function for all possible inputs.

coeff

Numeric vector of coefficients for the model.
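To illustrate how the function-valued fields relate to one another, the following sketch assembles a regularized objective and its gradient from mapXy, loss, regularizer and their gradient counterparts via the chain rule, then compares the result with a finite-difference approximation. The function definitions mirror the linear regression setup in the Examples section. The helpers objective, objective.gr and finite.diff.gr are hypothetical names introduced here, and the way the pieces are combined (average loss plus gamma times the regularizer) is an assumption made for illustration, not necessarily how the package combines them internally.

```r
# Field-style functions for linear regression (same forms as in the Examples)
mapXy <- function(X, coeff) X %*% coeff
mapXy.gr <- function(X, coeff) t(X)
loss <- function(y.hat, y) (y.hat - y)^2 / 2
loss.gr <- function(y.hat, y) y.hat - y
regularizer <- function(coeff) as.numeric(coeff %*% coeff) / 2
regularizer.gr <- function(coeff) coeff
gamma <- 1

# Hypothetical helper: average loss plus gamma times the regularizer
objective <- function(coeff, X, y) {
  mean(loss(mapXy(X, coeff), y)) + gamma * regularizer(coeff)
}

# Hypothetical helper: chain rule, mapXy.gr %*% loss.gr averaged over rows
objective.gr <- function(coeff, X, y) {
  n <- nrow(X)
  as.numeric(mapXy.gr(X, coeff) %*% loss.gr(mapXy(X, coeff), y)) / n +
    gamma * regularizer.gr(coeff)
}

# Hypothetical helper: central finite differences, used only as a check
finite.diff.gr <- function(f, coeff, ..., h = 1e-6) {
  vapply(seq_along(coeff), function(i) {
    step <- replace(numeric(length(coeff)), i, h)
    (f(coeff + step, ...) - f(coeff - step, ...)) / (2 * h)
  }, numeric(1))
}

set.seed(1)
X <- cbind(1, seq(-1, 1, length.out = 20))  # first column acts as a bias term
y <- matrix(-0.3 + 0.5 * X[, 2] + rnorm(20, sd = 0.1))
coeff <- c(0.2, -0.4)

analytic.gr <- objective.gr(coeff, X, y)
approx.gr <- finite.diff.gr(objective, coeff, X, y)
max.abs.diff <- max(abs(analytic.gr - approx.gr))
max.abs.diff  # should be close to zero if the gradients are consistent
```

A check of this kind can be useful before constructing an object, since user-supplied gradient functions that disagree with their parent functions would make the optimization unreliable.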

### Methods

#### Method new()

Create a new EmpiricalRiskMinimizationDP.KST object.

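#### Method fit()

Fit the model using the provided dataset and data bounds.

##### Usage
EmpiricalRiskMinimizationDP.KST$fit(
  X,
  y,
  upper.bounds,
  lower.bounds,
  add.bias = FALSE
)

##### Arguments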
X

Dataframe of data to be fit.

y

Vector or matrix of true values for each row of X.

upper.bounds

Numeric vector of length ncol(X)+1 giving upper bounds on the values in each column of X and the values of y. The last value in the vector is assumed to be the upper bound on y, while the first ncol(X) values are assumed to be in the same order as the corresponding columns of X. Any value in the columns of X and in y larger than the corresponding upper bound is clipped at the bound.

lower.bounds

Numeric vector of length ncol(X)+1 giving lower bounds on the values in each column of X and the values of y. The last value in the vector is assumed to be the lower bound on y, while the first ncol(X) values are assumed to be in the same order as the corresponding columns of X. Any value in the columns of X and in y smaller than the corresponding lower bound is clipped at the bound.

add.bias

Boolean indicating whether to add a bias term to X. Defaults to FALSE.
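The clipping behavior described for upper.bounds and lower.bounds can be pictured with a short base R sketch. The function clip_to_bounds is a hypothetical helper written for this illustration and is not part of the package; the fit method performs its own clipping, and its internal implementation may differ.

```r
# Hypothetical helper (not a DPpack function): clip each column of X and the
# values of y to the supplied bounds. The last bound in each vector applies to y.
clip_to_bounds <- function(X, y, upper.bounds, lower.bounds) {
  X <- as.matrix(X)
  p <- ncol(X)
  stopifnot(length(upper.bounds) == p + 1, length(lower.bounds) == p + 1)
  for (j in seq_len(p)) {
    X[, j] <- pmin(pmax(X[, j], lower.bounds[j]), upper.bounds[j])
  }
  y <- pmin(pmax(y, lower.bounds[p + 1]), upper.bounds[p + 1])
  list(X = X, y = y)
}

X <- data.frame(x1 = c(-2, 0.5, 3))
y <- c(-5, 0, 5)
clipped <- clip_to_bounds(X, y, upper.bounds = c(1, 2), lower.bounds = c(-1, -2))
clipped$X[, 1]  # -1.0 0.5 1.0
clipped$y       # -2 0 2
```

Values already inside the bounds are left unchanged, so bounds that are too tight distort the data while bounds that are too loose typically require more noise for the same privacy guarantee.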

#### Method predict()

Predict y values for given X using the fitted coefficients.

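##### Arguments
X

Dataframe of data for which to predict y values.

add.bias

Boolean indicating whether to add a bias term to X.

#### Method clone()

The objects of this class are cloneable with this method.

##### Arguments
deep

Whether to make a deep clone.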

### References

Kifer D, Smith A, Thakurta A (2012). “Private Convex Empirical Risk Minimization and High-dimensional Regression.” In Mannor S, Srebro N, Williamson RC (eds.), Proceedings of the 25th Annual Conference on Learning Theory, volume 23 of Proceedings of Machine Learning Research, 25.1–25.40. https://proceedings.mlr.press/v23/kifer12.html.

### Examples

# Build example dataset
n <- 500
X <- data.frame(X=seq(-1,1,length.out = n))
true.theta <- c(-.3,.5) # First element is bias term
p <- length(true.theta)
y <- true.theta[1] + as.matrix(X)%*%true.theta[2:p] + stats::rnorm(n=n,sd=.1)

# Construct object for linear regression
mapXy <- function(X, coeff) X%*%coeff
loss <- function(y.hat, y) (y.hat-y)^2/2
regularizer <- 'l2' # Alternatively, function(coeff) coeff%*%coeff/2
eps <- 1
delta <- 1
domain <- list("constraints"=function(coeff) coeff%*%coeff-length(coeff),
"jacobian"=function(coeff) 2*coeff)
# Set p to be the number of predictors desired including intercept term (length of coeff)
zeta <- 2*p^(3/2) # Proper bound for linear regression
lambda <- p # Proper bound for linear regression
gamma <- 1
mapXy.gr <- function(X, coeff) t(X)
loss.gr <- function(y.hat, y) y.hat-y
regularizer.gr <- function(coeff) coeff

ermdp <- EmpiricalRiskMinimizationDP.KST$new(mapXy, loss, 'l2', eps, delta,
                                             domain, zeta, lambda,
                                             gamma, mapXy.gr, loss.gr,
                                             regularizer.gr)

# Fit with data
# We must assume y is a matrix with values between -p and p (-2 and 2
#   for this example)
upper.bounds <- c(1, 2) # Bounds for X and y
lower.bounds <- c(-1,-2) # Bounds for X and y
ermdp$fit(X, y, upper.bounds, lower.bounds, add.bias=TRUE)
ermdp$coeff # Gets private coefficients

# Predict new data points
# Build a test dataset
Xtest <- data.frame(X=c(-.5, -.25, .1, .4))
predicted.y <- ermdp$predict(Xtest, add.bias=TRUE)
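The values zeta = 2*p^(3/2) and lambda = p used in the example can be sanity-checked numerically. For the squared loss, the gradient with respect to coeff is (sum(x * coeff) - y) * x and the Hessian is the outer product of x with itself, which has rank one and largest eigenvalue equal to the squared l2-norm of x. The sketch below assumes, in line with the example, that each entry of a row x lies in [-1, 1], that y lies in [-p, p], and that coeff is feasible when the constraint function is at most zero, that is, when coeff%*%coeff <= p. That reading of the constraint function, the sampling scheme, and all variable names are assumptions made for illustration.

```r
set.seed(42)
p <- 2
zeta <- 2 * p^(3 / 2)  # claimed bound on the gradient norm
lambda <- p            # claimed bound on the Hessian eigenvalues

n.draws <- 2000
grad.norms <- numeric(n.draws)
max.eigs <- numeric(n.draws)
for (i in seq_len(n.draws)) {
  x <- runif(p, min = -1, max = 1)   # one row of X, entries in [-1, 1]
  y <- runif(1, min = -p, max = p)   # response in [-p, p]
  coeff <- rnorm(p)
  # Rescale so that the l2-norm of coeff is at most sqrt(p)
  coeff <- coeff / sqrt(sum(coeff^2)) * sqrt(p) * runif(1)
  grad <- (sum(x * coeff) - y) * x   # gradient of (x . coeff - y)^2 / 2
  hess <- tcrossprod(x)              # Hessian: outer product of x with itself
  grad.norms[i] <- sqrt(sum(grad^2))
  max.eigs[i] <- max(eigen(hess, symmetric = TRUE, only.values = TRUE)$values)
}

# Both checks should pass under the stated assumptions
stopifnot(max(grad.norms) <= zeta, max(max.eigs) <= lambda + 1e-8)
```

Random sampling of this kind cannot prove the bounds, but a violation would indicate that zeta or lambda was chosen too small for the assumed data ranges and domain.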



[Package DPpack version 0.0.11]