sketch_leverage {sketching}    R Documentation

Sketch using leverage score type sampling

Description

Draws a subsample of m rows from the data using leverage score sampling or square-root leverage score sampling.

Usage

sketch_leverage(data, m, method = "leverage")

Arguments

data

(n times d)-dimensional matrix of data. The first column must be the dependent variable (Y); the remaining columns are the covariates (X).

m

subsample size; must be less than n

method

method for sketching: "leverage" leverage score sampling using X (default); "root_leverage" square-root leverage score sampling using X.
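
For orientation, the two sampling schemes can be sketched in base R. This is a minimal illustration under the standard definition of leverage scores (the diagonal of the hat matrix of X), not the package's internal code; the helper name leverage_probs is hypothetical.

```r
# Hypothetical helper (not part of the sketching package): sampling
# probabilities derived from the leverage scores of the covariate matrix X.
leverage_probs <- function(X, method = c("leverage", "root_leverage")) {
  method <- match.arg(method)
  # Leverage scores are the squared row norms of Q in the thin QR
  # decomposition of X, i.e. the diagonal of X (X'X)^{-1} X'.
  Q <- qr.Q(qr(X))
  h <- rowSums(Q^2)
  # "root_leverage" uses the square roots of the scores instead.
  score <- if (method == "leverage") h else sqrt(h)
  score / sum(score)  # normalize so the probabilities sum to 1
}

set.seed(1)
X <- cbind(1, matrix(stats::rnorm(200 * 3), nrow = 200))
p_lev  <- leverage_probs(X, "leverage")
p_root <- leverage_probs(X, "root_leverage")
```

In both cases rows with larger leverage are drawn more often; taking square roots flattens the probabilities toward uniform sampling.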

Value

An S3 object with the following elements:

subsample

(m times d)-dimensional matrix of data

prob

m-dimensional vector of probabilities
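
The shape of the returned elements can be illustrated with a stand-in. The function toy_sketch below is hypothetical and returns a plain list; the choices it makes (sampling with replacement, reporting the probabilities of the selected rows) are assumptions rather than documented behaviour of sketch_leverage.

```r
# Hypothetical stand-in mimicking the documented return value:
# a list with an m x d `subsample` and an m-vector `prob`.
toy_sketch <- function(dat, m) {
  X <- dat[, -1, drop = FALSE]   # covariates (the first column is Y)
  h <- rowSums(qr.Q(qr(X))^2)    # leverage scores of X
  p <- h / sum(h)                # sampling probabilities
  # Assumption: rows are drawn with replacement.
  idx <- sample(nrow(dat), size = m, replace = TRUE, prob = p)
  list(subsample = dat[idx, , drop = FALSE], prob = p[idx])
}

set.seed(1)
n <- 500
d <- 4
# Columns: Y, intercept, two covariates.
dat <- cbind(stats::rnorm(n), 1, matrix(stats::rnorm(n * (d - 2)), nrow = n))
out <- toy_sketch(dat, m = 50)
dim(out$subsample)  # 50 rows and d = 4 columns
length(out$prob)    # 50
```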

References

Ma, P., Zhang, X., Xing, X., Ma, J. and Mahoney, M. (2020). Asymptotic Analysis of Sampling Estimators for Randomized Numerical Linear Algebra Algorithms. Proceedings of the Twenty-Third International Conference on Artificial Intelligence and Statistics, PMLR 108:1026-1035.

Examples

## Least squares: sketch and solve
# setup
n <- 1e+6 # full sample size
d <- 5    # dimension of covariates
m <- 1e+3 # sketch size
# generate pseudo-data
X <- matrix(stats::rnorm(n*d), nrow = n, ncol = d)
beta <- matrix(rep(1,d), nrow = d, ncol = 1)
eps <- matrix(stats::rnorm(n), nrow = n, ncol = 1)
Y <- X %*% beta + eps
intercept <- matrix(rep(1,n), nrow = n, ncol = 1)
# full sample including the intercept term
fullsample <- cbind(Y,intercept,X)
# generate a sketch using leverage score sampling
s_lev <-  sketch_leverage(fullsample, m, "leverage")
# solve with weighting; "- 1" drops lm's automatic intercept because
# the intercept column is already part of the subsample
ls_lev <- lm(s_lev$subsample[,1] ~ s_lev$subsample[,-1] - 1, weights = s_lev$prob)

[Package sketching version 0.1.2 Index]