LinearRegressionDP {DPpack}R Documentation

Privacy-preserving Linear Regression

Description

This class implements differentially private linear regression using the objective perturbation technique (Kifer et al. 2012).

Details

To use this class for linear regression, first use the new method to construct an object of this class with the desired function values and hyperparameters. After constructing the object, the fit method can be applied with a provided dataset and data bounds to fit the model. In fitting, the model stores a vector of coefficients coeff which satisfy differential privacy. These can be released directly, or used in conjunction with the predict method to privately predict the outcomes of new datapoints.

Note that in order to guarantee differential privacy for linear regression, certain constraints must be satisfied for the values used to construct the object, as well as for the data used to fit. The regularizer must be convex. Additionally, it is assumed that if x represents a single row of the dataset X, then the l2-norm of x is at most p for all x, where p is the number of predictors (including any possible intercept term). In order to ensure this constraint is satisfied, the dataset is preprocessed and scaled, and the resulting coefficients are postprocessed and un-scaled so that the stored coefficients correspond to the original data. Due to this constraint on x, it is best to avoid using an intercept term in the model whenever possible. If an intercept term must be used, the issue can be partially circumvented by adding a constant column to X before fitting the model, which will be scaled along with the rest of X. The fit method contains functionality to add a column of constant 1s to X before scaling, if desired.

Super class

DPpack::EmpiricalRiskMinimizationDP.KST -> LinearRegressionDP

Methods

Public methods

Inherited methods

Method new()

Create a new LinearRegressionDP object.

Usage
LinearRegressionDP$new(regularizer, eps, delta, gamma, regularizer.gr = NULL)
Arguments
regularizer

String or regularization function. If a string, must be 'l2', indicating to use l2 regularization. If a function, must have form regularizer(coeff), where coeff is a vector or matrix, and return the value of the regularizer at coeff. See regularizer.l2 for an example. Additionally, in order to ensure differential privacy, the function must be convex.

eps

Positive real number defining the epsilon privacy budget. If set to Inf, runs algorithm without differential privacy.

delta

Nonnegative real number defining the delta privacy parameter. If 0, reduces to pure eps-DP.

gamma

Nonnegative real number representing the regularization constant.

regularizer.gr

Optional function representing the gradient of the regularization function with respect to coeff and of the form regularizer.gr(coeff). Should return a vector. See regularizer.gr.l2 for an example. If regularizer is given as a string, this value is ignored. If not given and regularizer is a function, non-gradient based optimization methods are used to compute the coefficient values in fitting the model.

Returns

A new LinearRegressionDP object.


Method fit()

Fit the differentially private linear regression model. The function runs the objective perturbation algorithm (Kifer et al. 2012) to generate an objective function. A numerical optimization method is then run to find optimal coefficients for fitting the model given the training data and hyperparameters. The nloptr function is used. If regularizer is given as 'l2' or if regularizer.gr is given in the construction of the object, the gradient of the objective function and the Jacobian of the constraint function are utilized for the algorithm, and the NLOPT_LD_MMA method is used. If this is not the case, the NLOPT_LN_COBYLA method is used. The resulting privacy-preserving coefficients are stored in coeff.

Usage
LinearRegressionDP$fit(X, y, upper.bounds, lower.bounds, add.bias = FALSE)
Arguments
X

Dataframe of data to be fit.

y

Vector or matrix of true values for each row of X.

upper.bounds

Numeric vector of length ncol(X)+1 giving upper bounds on the values in each column of X and the values of y. The last value in the vector is assumed to be the upper bound on y, while the first ncol(X) values are assumed to be in the same order as the corresponding columns of X. Any value in the columns of X and in y larger than the corresponding upper bound is clipped at the bound.

lower.bounds

Numeric vector of length ncol(X)+1 giving lower bounds on the values in each column of X and the values of y. The last value in the vector is assumed to be the lower bound on y, while the first ncol(X) values are assumed to be in the same order as the corresponding columns of X. Any value in the columns of X and in y larger than the corresponding lower bound is clipped at the bound.

add.bias

Boolean indicating whether to add a bias term to X. Defaults to FALSE.


Method clone()

The objects of this class are cloneable with this method.

Usage
LinearRegressionDP$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

References

Kifer D, Smith A, Thakurta A (2012). “Private Convex Empirical Risk Minimization and High-dimensional Regression.” In Mannor S, Srebro N, Williamson RC (eds.), Proceedings of the 25th Annual Conference on Learning Theory, volume 23 of Proceedings of Machine Learning Research, 25.1–25.40. https://proceedings.mlr.press/v23/kifer12.html.

Examples

# Build example dataset
n <- 500
X <- data.frame(X=seq(-1,1,length.out = n))
true.theta <- c(-.3,.5) # First element is bias term
p <- length(true.theta)
y <- true.theta[1] + as.matrix(X)%*%true.theta[2:p] + stats::rnorm(n=n,sd=.1)

# Construct object for linear regression
regularizer <- 'l2' # Alternatively, function(coeff) coeff%*%coeff/2
eps <- 1
delta <- 0 # Indicates to use pure eps-DP
gamma <- 1
regularizer.gr <- function(coeff) coeff

lrdp <- LinearRegressionDP$new('l2', eps, delta, gamma, regularizer.gr)

# Fit with data
# We must assume y is a matrix with values between -p and p (-2 and 2
#   for this example)
upper.bounds <- c(1, 2) # Bounds for X and y
lower.bounds <- c(-1,-2) # Bounds for X and y
lrdp$fit(X, y, upper.bounds, lower.bounds, add.bias=TRUE)
lrdp$coeff # Gets private coefficients

# Predict new data points
# Build a test dataset
Xtest <- data.frame(X=c(-.5, -.25, .1, .4))
predicted.y <- lrdp$predict(Xtest, add.bias=TRUE)


[Package DPpack version 0.1.0 Index]