## Privacy-preserving Linear Regression

### Description

This class implements differentially private linear regression using the objective perturbation technique (Kifer et al. 2012).
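
As a rough orientation, objective perturbation adds a random linear term to a regularized empirical risk before minimizing it. The form below is a sketch based on the general algorithm in the cited paper. The symbols ($\ell$ for the per-observation loss, $r$ for the regularizer, $\Delta$ for an additional quadratic penalty constant, $b$ for the random noise vector) are assumptions introduced here for illustration, and this page does not state how the class maps its hyperparameters onto these terms.

```latex
\hat{\theta}_{\text{priv}}
  = \arg\min_{\theta}\;
    \frac{1}{n}\sum_{i=1}^{n} \ell(\theta; x_i, y_i)
    + \frac{1}{n}\, r(\theta)
    + \frac{\Delta}{2n}\,\lVert \theta \rVert_2^2
    + \frac{b^{\top}\theta}{n}
```

The privacy guarantee comes from the distribution of $b$, which is calibrated to the privacy parameters, rather than from noise added to the fitted coefficients afterwards.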

### Details

To use this class for linear regression, first use the new method to construct an object of this class with the desired function values and hyperparameters. After constructing the object, the fit method can be applied with a provided dataset and data bounds to fit the model. In fitting, the model stores a vector of coefficients coeff which satisfy differential privacy. These can be released directly, or used in conjunction with the predict method to privately predict the outcomes of new datapoints.

Note that in order to guarantee differential privacy for linear regression, certain constraints must be satisfied for the values used to construct the object, as well as for the data used to fit. The regularizer must be convex. Additionally, it is assumed that if x represents a single row of the dataset X, then the l2-norm of x is at most p for all x, where p is the number of predictors (including any possible intercept term). In order to ensure this constraint is satisfied, the dataset is preprocessed and scaled, and the resulting coefficients are postprocessed and un-scaled so that the stored coefficients correspond to the original data. Due to this constraint on x, it is best to avoid using an intercept term in the model whenever possible. If an intercept term must be used, the issue can be partially circumvented by adding a constant column to X before fitting the model, which will be scaled along with the rest of X. The fit method contains functionality to add a column of constant 1s to X before scaling, if desired.
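
The scale-then-un-scale idea described above can be illustrated with ordinary (non-private) least squares in base R. This is a minimal sketch, not the package's internal preprocessing: the scaling rule (dividing each column by its largest absolute bound) and all variable names are assumptions chosen for illustration.

```r
# Illustration only: NON-private least squares, used to show that coefficients
# fitted on scaled columns can be mapped back to the original units.
set.seed(1)
n <- 200
X <- cbind(x1 = runif(n, -3, 3), x2 = runif(n, 0, 10))
y <- 0.5 * X[, "x1"] - 0.2 * X[, "x2"] + rnorm(n, sd = 0.1)

upper <- c(3, 10)  # assumed upper bounds for the columns of X
lower <- c(-3, 0)  # assumed lower bounds for the columns of X

# Assumed scaling rule: divide each column by its largest absolute bound,
# so every entry of the scaled matrix lies in [-1, 1].
col.scale <- pmax(abs(upper), abs(lower))
X.scaled <- sweep(X, 2, col.scale, "/")

# Each row then has l2-norm at most sqrt(p), which is at most p.
p <- ncol(X)
max.row.norm <- max(sqrt(rowSums(X.scaled^2)))
max.row.norm <= sqrt(p)

# Fit on the scaled data (no intercept), then un-scale the coefficients.
coef.scaled <- solve(crossprod(X.scaled), crossprod(X.scaled, y))
coef.original <- as.numeric(coef.scaled) / col.scale

# Fitting directly on the unscaled data gives the same coefficients,
# up to floating-point tolerance.
coef.direct <- as.numeric(solve(crossprod(X), crossprod(X, y)))
all.equal(coef.original, coef.direct)
```

Because the scaling is a per-column multiplication, the un-scaling step is a per-coefficient division. The sketch omits an intercept, which would need the constant-column treatment described above.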

### Super class

DPpack::EmpiricalRiskMinimizationDP.KST -> LinearRegressionDP

### Methods

#### Public methods

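#### Method new()

Create a new LinearRegressionDP object.

#### Method fit()

Fit the differentially private linear regression model to a dataset, given bounds on the data.

##### Arguments

X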

Dataframe of data to be fit.

y

Vector or matrix of true values for each row of X.

upper.bounds

Numeric vector of length ncol(X)+1 giving upper bounds on the values in each column of X and the values of y. The last value in the vector is assumed to be the upper bound on y, while the first ncol(X) values are assumed to be in the same order as the corresponding columns of X. Any value in the columns of X and in y larger than the corresponding upper bound is clipped at the bound.

lower.bounds

Numeric vector of length ncol(X)+1 giving lower bounds on the values in each column of X and the values of y. The last value in the vector is assumed to be the lower bound on y, while the first ncol(X) values are assumed to be in the same order as the corresponding columns of X. Any value in the columns of X and in y smaller than the corresponding lower bound is clipped at the bound.

add.bias

Boolean indicating whether to add a bias term to X. Defaults to FALSE.
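
The clipping and bias-column behavior described for these arguments can be sketched in base R. This is a hypothetical helper written for illustration, not the package's code: the function name `clip_to_bounds` is invented here, and placing the bias column first is an assumption.

```r
# Hypothetical helper (not part of DPpack): clip each column of X and the
# response y to the supplied bounds. The last element of each bounds vector
# applies to y, the first ncol(X) elements to the columns of X in order.
clip_to_bounds <- function(X, y, upper.bounds, lower.bounds) {
  X <- as.matrix(X)
  k <- ncol(X)
  stopifnot(length(upper.bounds) == k + 1, length(lower.bounds) == k + 1)
  for (j in seq_len(k)) {
    X[, j] <- pmin(pmax(X[, j], lower.bounds[j]), upper.bounds[j])
  }
  y <- pmin(pmax(as.numeric(y), lower.bounds[k + 1]), upper.bounds[k + 1])
  list(X = X, y = y)
}

X <- data.frame(a = c(-2, 0, 0.5, 3), b = c(10, 20, 30, 40))
y <- c(-5, 0.2, 1, 7)
clipped <- clip_to_bounds(X, y,
                          upper.bounds = c(1, 35, 2),
                          lower.bounds = c(-1, 15, -2))
clipped$X[, "a"]  # -1.0  0.0  0.5  1.0
clipped$X[, "b"]  # 15 20 30 35
clipped$y         # -2.0  0.2  1.0  2.0

# Adding a bias term amounts to adding a constant column of 1s
# (column position is an assumption in this sketch).
X.bias <- cbind(bias = 1, clipped$X)
```

Values already inside the bounds are left unchanged, so the bounds only affect the model when they are tighter than the observed data range.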

#### Method clone()

The objects of this class are cloneable with this method.

LinearRegressionDP$clone(deep = FALSE)

##### Arguments

deep

Whether to make a deep clone.

### References

Kifer D, Smith A, Thakurta A (2012). “Private Convex Empirical Risk Minimization and High-dimensional Regression.” In Mannor S, Srebro N, Williamson RC (eds.), Proceedings of the 25th Annual Conference on Learning Theory, volume 23 of Proceedings of Machine Learning Research, 25.1–25.40. https://proceedings.mlr.press/v23/kifer12.html.

### Examples

    # Build example dataset
    n <- 500
    X <- data.frame(X=seq(-1,1,length.out = n))
    true.theta <- c(-.3,.5) # First element is bias term
    p <- length(true.theta)
    y <- true.theta[1] + as.matrix(X)%*%true.theta[2:p] + stats::rnorm(n=n,sd=.1)

    # Construct object for linear regression
    regularizer <- 'l2' # Alternatively, function(coeff) coeff%*%coeff/2
    eps <- 1
    delta <- 0 # Indicates to use pure eps-DP
    gamma <- 1
    regularizer.gr <- function(coeff) coeff
    lrdp <- LinearRegressionDP$new('l2', eps, delta, gamma, regularizer.gr)

    # Fit with data
    # We must assume y is a matrix with values between -p and p (-2 and 2
    #   for this example)
    upper.bounds <- c(1, 2) # Bounds for X and y
    lower.bounds <- c(-1,-2) # Bounds for X and y
    lrdp$fit(X, y, upper.bounds, lower.bounds, add.bias=TRUE)
    lrdp$coeff # Gets private coefficients

    # Predict new data points
    # Build a test dataset
    Xtest <- data.frame(X=c(-.5, -.25, .1, .4))



