LinearRegressionDP {DPpack} R Documentation

This class implements differentially private linear regression using the objective perturbation technique (Kifer et al. 2012).

To use this class for linear regression, first use the `new` method to
construct an object of this class with the desired function values and
hyperparameters. After constructing the object, the `fit` method can be
applied with a provided dataset and data bounds to fit the model. In
fitting, the model stores a vector of coefficients `coeff` which satisfy
differential privacy. These can be released directly, or used in
conjunction with the `predict` method to privately predict the outcomes of
new datapoints.

Note that in order to guarantee differential privacy for linear regression,
certain constraints must be satisfied for the values used to construct the
object, as well as for the data used to fit. The regularizer must be
convex. Additionally, it is assumed that if x represents a single row of
the dataset X, then the l2-norm of x is at most p for all x, where p is the
number of predictors (including any possible intercept term). In order to
ensure this constraint is satisfied, the dataset is preprocessed and
scaled, and the resulting coefficients are postprocessed and un-scaled so
that the stored coefficients correspond to the original data. Due to this
constraint on x, it is best to avoid using an intercept term in the model
whenever possible. If an intercept term must be used, the issue can be
partially circumvented by adding a constant column to X before fitting the
model, which will be scaled along with the rest of X. The `fit` method
contains functionality to add a column of constant 1s to X before scaling,
if desired.
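The scale-then-unscale idea can be illustrated in base R. This is only a sketch under stated assumptions: the scale factors `x.scale` and `y.scale` are hypothetical choices (the largest absolute bound per column), the fit is ordinary non-private least squares, and none of this is taken from the package's internal code. It shows that coefficients fitted on scaled data can be mapped back to the original units, and that scaling every entry into [-1, 1] keeps each row's l2-norm within p.

```r
set.seed(42)
n <- 100
X <- cbind(x1 = runif(n, -5, 5), x2 = runif(n, 0, 20))
y <- as.numeric(X %*% c(2, -0.5) + rnorm(n, sd = 0.1))

# Hypothetical scale factors (assumed bounds, not DPpack's actual choice)
x.scale <- c(5, 20)
y.scale <- 25

# Preprocess: scale each column of X and y
X.s <- sweep(X, 2, x.scale, "/")
y.s <- y / y.scale

# Non-private least squares on the scaled data (stand-in for the DP fit)
theta.s <- as.numeric(solve(crossprod(X.s), crossprod(X.s, y.s)))

# Postprocess: un-scale so coefficients refer to the original data
theta <- theta.s * y.scale / x.scale

# Reference fit on the original, unscaled data
theta.direct <- as.numeric(solve(crossprod(X), crossprod(X, y)))

# Largest row l2-norm after scaling, to compare against p = ncol(X)
max.row.norm <- max(sqrt(rowSums(X.s^2)))
```

Here `theta` and `theta.direct` agree up to numerical error, which is why scaling does not change what the stored coefficients mean.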

`DPpack::EmpiricalRiskMinimizationDP.KST` -> `LinearRegressionDP`

`new()`

Create a new LinearRegressionDP object.

LinearRegressionDP$new(regularizer, eps, delta, gamma, regularizer.gr = NULL)

`regularizer`

String or regularization function. If a string, must be 'l2', indicating to use l2 regularization. If a function, must have form `regularizer(coeff)`, where `coeff` is a vector or matrix, and return the value of the regularizer at `coeff`. See `regularizer.l2` for an example. Additionally, in order to ensure differential privacy, the function must be convex.

`eps`

Positive real number defining the epsilon privacy budget. If set to Inf, runs algorithm without differential privacy.

`delta`

Nonnegative real number defining the delta privacy parameter. If 0, reduces to pure eps-DP.

`gamma`

Nonnegative real number representing the regularization constant.

`regularizer.gr`

Optional function representing the gradient of the regularization function with respect to `coeff` and of the form `regularizer.gr(coeff)`. Should return a vector. See `regularizer.gr.l2` for an example. If `regularizer` is given as a string, this value is ignored. If not given and `regularizer` is a function, non-gradient based optimization methods are used to compute the coefficient values in fitting the model.

A new LinearRegressionDP object.
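As a rough illustration of what a user-supplied `regularizer` and `regularizer.gr` pair might look like, the base R sketch below writes the l2 regularizer and its gradient by hand and checks the gradient against a finite-difference approximation. The names `my.regularizer`, `my.regularizer.gr` and `numeric.gradient` are hypothetical helpers for this example only and are not part of DPpack.

```r
# l2 regularizer and its gradient as plain R functions (hypothetical names)
my.regularizer <- function(coeff) sum(coeff^2) / 2
my.regularizer.gr <- function(coeff) as.numeric(coeff)

# Central finite-difference approximation of a gradient
numeric.gradient <- function(f, coeff, h = 1e-6) {
  sapply(seq_along(coeff), function(j) {
    step <- replace(numeric(length(coeff)), j, h)
    (f(coeff + step) - f(coeff - step)) / (2 * h)
  })
}

coeff <- c(-0.3, 0.5, 1.2)
reg.value <- my.regularizer(coeff)
max.abs.diff <- max(abs(numeric.gradient(my.regularizer, coeff) -
                        my.regularizer.gr(coeff)))
```

A pair like this would then be passed as `LinearRegressionDP$new(my.regularizer, eps, delta, gamma, my.regularizer.gr)`. A quick check of this kind is worth doing before fitting, since a mismatched gradient would mislead the gradient-based optimizer.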

`fit()`

Fit the differentially private linear regression model. The function runs
the objective perturbation algorithm (Kifer et al. 2012) to generate an
objective function. A numerical optimization method is then run to find
optimal coefficients for fitting the model given the training data and
hyperparameters. The `nloptr` function is used. If `regularizer` is given
as 'l2' or if `regularizer.gr` is given in the construction of the object,
the gradient of the objective function and the Jacobian of the constraint
function are utilized for the algorithm, and the NLOPT_LD_MMA method is
used. If this is not the case, the NLOPT_LN_COBYLA method is used. The
resulting privacy-preserving coefficients are stored in `coeff`.
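The general shape of this procedure can be sketched in base R. Several things here are assumptions made for illustration: base `optim` stands in for `nloptr` (BFGS for the gradient-based case, Nelder-Mead for the derivative-free case), the objective is a simplified form of a perturbed regularized squared loss, and the noise vector `b` is a placeholder that is not calibrated to any privacy budget. The result is therefore not differentially private.

```r
set.seed(1)
n <- 200
X <- cbind(1, seq(-1, 1, length.out = n))  # bias column plus one predictor
y <- X %*% c(-0.3, 0.5) + rnorm(n, sd = 0.1)
gamma <- 1
b <- rnorm(ncol(X), sd = 0.5)  # placeholder noise, NOT privacy-calibrated

# Perturbed objective: average squared loss + l2 penalty + linear noise term
objective <- function(theta) {
  sum((X %*% theta - y)^2) / (2 * n) +
    gamma * sum(theta^2) / (2 * n) +
    sum(b * theta) / n
}

# Gradient of the perturbed objective
objective.gr <- function(theta) {
  as.numeric(crossprod(X, X %*% theta - y)) / n + gamma * theta / n + b / n
}

theta0 <- rep(0, ncol(X))
# Gradient-based route (analogous to having regularizer.gr available)
fit.grad <- optim(theta0, objective, gr = objective.gr, method = "BFGS")
# Derivative-free route (analogous to having no gradient available)
fit.nograd <- optim(theta0, objective, method = "Nelder-Mead")

# Closed-form minimizer of this quadratic objective, for comparison
theta.closed <- as.numeric(
  solve(crossprod(X) + gamma * diag(ncol(X)), crossprod(X, y) - b)
)
```

Both routes should land near `theta.closed`. The gradient-based one typically does so with fewer function evaluations, which is one reason to supply a gradient when it is available.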

LinearRegressionDP$fit(X, y, upper.bounds, lower.bounds, add.bias = FALSE)

`X`

Dataframe of data to be fit.

`y`

Vector or matrix of true values for each row of `X`.

`upper.bounds`

Numeric vector of length `ncol(X)+1` giving upper bounds on the values in each column of `X` and the values of `y`. The last value in the vector is assumed to be the upper bound on `y`, while the first `ncol(X)` values are assumed to be in the same order as the corresponding columns of `X`. Any value in the columns of `X` and in `y` larger than the corresponding upper bound is clipped at the bound.

`lower.bounds`

Numeric vector of length `ncol(X)+1` giving lower bounds on the values in each column of `X` and the values of `y`. The last value in the vector is assumed to be the lower bound on `y`, while the first `ncol(X)` values are assumed to be in the same order as the corresponding columns of `X`. Any value in the columns of `X` and in `y` smaller than the corresponding lower bound is clipped at the bound.

`add.bias`

Boolean indicating whether to add a bias term to `X`. Defaults to FALSE.
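The clipping behavior described for `upper.bounds` and `lower.bounds` can be mimicked in base R as follows. The helper `clip_to_bounds` is a hypothetical function written for this illustration, not the package's own code. It assumes the bound vectors list the columns of `X` first and `y` last, as stated in the argument descriptions.

```r
# Hypothetical helper: clip each column of X and the response y to its bounds
clip_to_bounds <- function(X, y, upper.bounds, lower.bounds) {
  p <- ncol(X)
  stopifnot(length(upper.bounds) == p + 1, length(lower.bounds) == p + 1)
  for (j in seq_len(p)) {
    # Values below the lower bound are raised, values above the upper lowered
    X[[j]] <- pmin(pmax(X[[j]], lower.bounds[j]), upper.bounds[j])
  }
  # The last entry of each bounds vector applies to y
  y <- pmin(pmax(as.numeric(y), lower.bounds[p + 1]), upper.bounds[p + 1])
  list(X = X, y = y)
}

clipped <- clip_to_bounds(
  X = data.frame(a = c(-3, 0, 5)),
  y = c(-10, 1, 10),
  upper.bounds = c(1, 2),
  lower.bounds = c(-1, -2)
)
```

In this toy input, the out-of-range entries of `a` and `y` are moved to the nearest bound and in-range values are left unchanged.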

`clone()`

The objects of this class are cloneable with this method.

LinearRegressionDP$clone(deep = FALSE)

`deep`

Whether to make a deep clone.

Kifer D, Smith A, Thakurta A (2012).
“Private Convex Empirical Risk Minimization and High-dimensional Regression.”
In Mannor S, Srebro N, Williamson RC (eds.), *Proceedings of the 25th Annual Conference on Learning Theory*, volume 23 of *Proceedings of Machine Learning Research*, 25.1–25.40.
https://proceedings.mlr.press/v23/kifer12.html.

```
library(DPpack)
# Build example dataset
n <- 500
X <- data.frame(X = seq(-1, 1, length.out = n))
true.theta <- c(-.3, .5) # First element is bias term
p <- length(true.theta)
y <- true.theta[1] + as.matrix(X) %*% true.theta[2:p] + stats::rnorm(n = n, sd = .1)
# Construct object for linear regression
regularizer <- 'l2' # Alternatively, function(coeff) coeff%*%coeff/2
eps <- 1
delta <- 0 # Indicates to use pure eps-DP
gamma <- 1
regularizer.gr <- function(coeff) coeff # Ignored when regularizer is 'l2'
lrdp <- LinearRegressionDP$new(regularizer, eps, delta, gamma, regularizer.gr)
# Fit with data
# We must assume y is a matrix with values between -p and p (-2 and 2
# for this example)
upper.bounds <- c(1, 2) # Bounds for X and y
lower.bounds <- c(-1,-2) # Bounds for X and y
lrdp$fit(X, y, upper.bounds, lower.bounds, add.bias=TRUE)
lrdp$coeff # Gets private coefficients
# Predict new data points
# Build a test dataset
Xtest <- data.frame(X=c(-.5, -.25, .1, .4))
predicted.y <- lrdp$predict(Xtest, add.bias=TRUE)
```
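For intuition about what prediction with a bias term amounts to, the base R sketch below computes linear predictions by hand. It assumes that prediction is the product of the (bias-augmented) data with the coefficient vector and that the bias coefficient comes first, as in the `true.theta` comment of the example above. The `coeff` values here are hypothetical stand-ins, not output from a fitted model.

```r
# Hypothetical stand-in for a fitted coefficient vector (bias term first)
coeff <- c(-0.3, 0.5)
Xtest <- data.frame(X = c(-0.5, -0.25, 0.1, 0.4))

# Prepend a column of 1s so the first coefficient acts as the intercept
Xtest.bias <- cbind(bias = 1, as.matrix(Xtest))

# Linear prediction: one value per row of Xtest
manual.pred <- as.numeric(Xtest.bias %*% coeff)
```

Because the released coefficients already satisfy differential privacy, any predictions computed from them inherit that guarantee as post-processing.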

[Package *DPpack* version 0.1.0 Index]