LinearRegressionDP {DPpack} | R Documentation |
Privacy-preserving Linear Regression
Description
This class implements differentially private linear regression using the objective perturbation technique (Kifer et al. 2012).
Details
To use this class for linear regression, first use the new
method to construct an object of this class with the desired function
values and hyperparameters. After constructing the object, the fit
method can be applied with a provided dataset and data bounds to fit the
model. In fitting, the model stores a vector of coefficients coeff
which satisfy differential privacy. These can be released directly, or used
in conjunction with the predict method to privately predict the outcomes of
new datapoints.
Note that in order to guarantee differential privacy for linear regression,
certain constraints must be satisfied for the values used to construct the
object, as well as for the data used to fit. The regularizer must be
convex. Additionally, it is assumed that if x represents a single row of
the dataset X, then the l2-norm of x is at most p for all x, where p is the
number of predictors (including any possible intercept term). In order to
ensure this constraint is satisfied, the dataset is preprocessed and
scaled, and the resulting coefficients are postprocessed and un-scaled so
that the stored coefficients correspond to the original data. Due to this
constraint on x, it is best to avoid using an intercept term in the model
whenever possible. If an intercept term must be used, the issue can be
partially circumvented by adding a constant column to X before fitting the
model, which will be scaled along with the rest of X. The fit method
contains functionality to add a column of constant 1s to X before scaling,
if desired.
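The row-norm assumption and the constant-column approach can be illustrated in base R. This is only a minimal sketch of the stated constraint, not DPpack's internal preprocessing; the helper name row.norms is hypothetical, and the predictor is assumed to already lie in [-1, 1].

```r
# Hypothetical helper: l2-norm of every row of a numeric matrix.
row.norms <- function(M) sqrt(rowSums(M^2))

# One predictor, assumed to be bounded in [-1, 1].
X <- data.frame(X = seq(-1, 1, length.out = 5))

# Add a constant column of 1s to act as the intercept term.
X.bias <- cbind(bias = 1, as.matrix(X))

p <- ncol(X.bias)  # number of predictors, including the intercept

# Every entry has absolute value at most 1, so each row norm is at most
# sqrt(p), which lies within the assumed bound p.
norms <- row.norms(X.bias)
all(norms <= p)
```

In practice the fit method performs its own scaling, so a check like this is useful mainly for understanding what the constraint means for a given dataset.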
Super class
DPpack::EmpiricalRiskMinimizationDP.KST -> LinearRegressionDP
Methods
Public methods
Method new()
Create a new LinearRegressionDP object.
Usage
LinearRegressionDP$new(regularizer, eps, delta, gamma, regularizer.gr = NULL)
Arguments
regularizer
String or regularization function. If a string, must be 'l2', indicating to use l2 regularization. If a function, must have form regularizer(coeff), where coeff is a vector or matrix, and return the value of the regularizer at coeff. See regularizer.l2 for an example. Additionally, in order to ensure differential privacy, the function must be convex.
eps
Positive real number defining the epsilon privacy budget. If set to Inf, runs algorithm without differential privacy.
delta
Nonnegative real number defining the delta privacy parameter. If 0, reduces to pure eps-DP.
gamma
Nonnegative real number representing the regularization constant.
regularizer.gr
Optional function representing the gradient of the regularization function with respect to coeff and of the form regularizer.gr(coeff). Should return a vector. See regularizer.gr.l2 for an example. If regularizer is given as a string, this value is ignored. If not given and regularizer is a function, non-gradient based optimization methods are used to compute the coefficient values in fitting the model.
Returns
A new LinearRegressionDP object.
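A custom regularizer and its gradient can be sanity-checked before they are passed to the constructor. The sketch below uses the l2 form from the package example; the names my.regularizer, my.regularizer.gr, and numeric.gr are hypothetical, and the finite-difference check is an illustration rather than anything DPpack performs.

```r
# l2 regularizer in the form regularizer(coeff); as.numeric() drops the
# 1x1 matrix returned by %*% so a plain number comes back.
my.regularizer <- function(coeff) as.numeric(coeff %*% coeff / 2)

# Its gradient with respect to coeff, in the form regularizer.gr(coeff).
my.regularizer.gr <- function(coeff) coeff

# Central finite-difference approximation of a gradient (illustrative).
numeric.gr <- function(f, coeff, h = 1e-6) {
  sapply(seq_along(coeff), function(i) {
    step <- replace(numeric(length(coeff)), i, h)
    (f(coeff + step) - f(coeff - step)) / (2 * h)
  })
}

coeff <- c(-0.3, 0.5)
max.abs.diff <- max(abs(numeric.gr(my.regularizer, coeff) -
                        my.regularizer.gr(coeff)))
max.abs.diff < 1e-6  # the analytic and numeric gradients should agree
```

Following the usage above, such a pair would be supplied as LinearRegressionDP$new(my.regularizer, eps, delta, gamma, my.regularizer.gr).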
Method fit()
Fit the differentially private linear regression model. The
function runs the objective perturbation algorithm
(Kifer et al. 2012) to generate an objective function. A
numerical optimization method is then run to find optimal coefficients
for fitting the model given the training data and hyperparameters. The
nloptr function is used. If regularizer is given as 'l2' or if
regularizer.gr is given in the construction of the object, the gradient of
the objective function and the Jacobian of the constraint function are
utilized for the algorithm, and the NLOPT_LD_MMA method is used. If this is
not the case, the NLOPT_LN_COBYLA method is used. The resulting
privacy-preserving coefficients are stored in coeff.
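The rule for choosing the optimization method can be written out as a small function. This is a sketch of the rule as described here, not code taken from DPpack; choose.algorithm is a hypothetical helper.

```r
# Hypothetical helper mirroring the documented rule: the gradient-based
# method is chosen when regularizer is 'l2' or a gradient function is
# supplied, and the derivative-free method is chosen otherwise.
choose.algorithm <- function(regularizer, regularizer.gr = NULL) {
  has.gradient <- identical(regularizer, "l2") || !is.null(regularizer.gr)
  if (has.gradient) "NLOPT_LD_MMA" else "NLOPT_LN_COBYLA"
}

custom.reg <- function(coeff) sum(coeff^2) / 2

choose.algorithm("l2")                              # gradient-based
choose.algorithm(custom.reg)                        # derivative-free
choose.algorithm(custom.reg, function(coeff) coeff) # gradient-based
```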
Usage
LinearRegressionDP$fit(X, y, upper.bounds, lower.bounds, add.bias = FALSE)
Arguments
X
Dataframe of data to be fit.
y
Vector or matrix of true values for each row of X.
upper.bounds
Numeric vector of length ncol(X)+1 giving upper bounds on the values in each column of X and the values of y. The last value in the vector is assumed to be the upper bound on y, while the first ncol(X) values are assumed to be in the same order as the corresponding columns of X. Any value in the columns of X and in y larger than the corresponding upper bound is clipped at the bound.
lower.bounds
Numeric vector of length ncol(X)+1 giving lower bounds on the values in each column of X and the values of y. The last value in the vector is assumed to be the lower bound on y, while the first ncol(X) values are assumed to be in the same order as the corresponding columns of X. Any value in the columns of X and in y smaller than the corresponding lower bound is clipped at the bound.
add.bias
Boolean indicating whether to add a bias term to X. Defaults to FALSE.
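The clipping behavior of upper.bounds and lower.bounds can be illustrated in base R. This is a minimal sketch of the documented effect, not the package's own implementation; clip.columns is a hypothetical helper, and y is treated as the last column so that the bounds vectors line up with the data.

```r
# Hypothetical helper: clip each column of a matrix to [lower, upper].
clip.columns <- function(M, lower, upper) {
  for (j in seq_len(ncol(M))) {
    M[, j] <- pmin(pmax(M[, j], lower[j]), upper[j])
  }
  M
}

X <- data.frame(X = c(-1.5, -0.2, 0.7, 1.3))
y <- c(-2.4, 0.1, 0.9, 2.2)

upper.bounds <- c(1, 2)    # bound for the column of X, then bound for y
lower.bounds <- c(-1, -2)  # same ordering as upper.bounds

# Values outside the bounds are moved to the nearest bound.
clipped <- clip.columns(cbind(as.matrix(X), y = y), lower.bounds, upper.bounds)
clipped
```

Because out-of-range values are clipped rather than rejected, bounds that are too tight will silently distort the data, so they are best chosen from prior knowledge of the variables' ranges.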
Method clone()
The objects of this class are cloneable with this method.
Usage
LinearRegressionDP$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
References
Kifer D, Smith A, Thakurta A (2012). “Private Convex Empirical Risk Minimization and High-dimensional Regression.” In Mannor S, Srebro N, Williamson RC (eds.), Proceedings of the 25th Annual Conference on Learning Theory, volume 23 of Proceedings of Machine Learning Research, 25.1–25.40. https://proceedings.mlr.press/v23/kifer12.html.
Examples
library(DPpack)
# Build example dataset
n <- 500
X <- data.frame(X=seq(-1,1,length.out = n))
true.theta <- c(-.3,.5) # First element is bias term
p <- length(true.theta)
y <- true.theta[1] + as.matrix(X)%*%true.theta[2:p] + stats::rnorm(n=n,sd=.1)
# Construct object for linear regression
regularizer <- 'l2' # Alternatively, function(coeff) coeff%*%coeff/2
eps <- 1
delta <- 0 # Indicates to use pure eps-DP
gamma <- 1
regularizer.gr <- function(coeff) coeff # Ignored here because regularizer is a string
lrdp <- LinearRegressionDP$new(regularizer, eps, delta, gamma, regularizer.gr)
# Fit with data
# We must assume y is a matrix with values between -p and p (-2 and 2
# for this example)
upper.bounds <- c(1, 2) # Bounds for X and y
lower.bounds <- c(-1,-2) # Bounds for X and y
lrdp$fit(X, y, upper.bounds, lower.bounds, add.bias=TRUE)
lrdp$coeff # Gets private coefficients
# Predict new data points
# Build a test dataset
Xtest <- data.frame(X=c(-.5, -.25, .1, .4))
predicted.y <- lrdp$predict(Xtest, add.bias=TRUE)