GenSyntheticLogistic {L0Learn} | R Documentation |
Generate Logistic Synthetic Data
Description
Generates a synthetic dataset as follows:
1) Generate a data matrix, X, drawn from a multivariate Gaussian distribution with mean = 0 and sigma = Sigma.
2) Generate a vector B with k entries set to 1 and the remaining entries set to 0.
3) Sample every coordinate yi of the outcome vector y in {-1, 1}^n independently from a Bernoulli distribution with success probability P(yi = 1 | xi) = 1/(1 + exp(-s<xi, B>)).
Source: https://arxiv.org/pdf/2001.06471.pdf, Section 5.1 (Data Generation).
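The three steps above can be sketched in base R as follows. This is an illustrative sketch only, not the package's implementation: the helper name sketch_logistic_data and its Sigma argument are hypothetical, it assumes Sigma is positive definite so that chol() succeeds, and it omits the rho and shuffle_B options.

```r
# Hypothetical helper (not part of L0Learn): a base-R sketch of the three steps.
sketch_logistic_data <- function(n, p, k, seed, s = 1, Sigma = diag(p)) {
  set.seed(seed)

  # Step 1: rows of X ~ N(0, Sigma). chol() returns an upper-triangular R
  # with t(R) %*% R == Sigma, so Z %*% R has row covariance Sigma.
  Z <- matrix(rnorm(n * p), nrow = n, ncol = p)
  X <- Z %*% chol(Sigma)

  # Step 2: first k coefficients are 1, the remaining p - k are 0.
  B <- c(rep(1, k), rep(0, p - k))

  # Step 3: P(y_i = 1 | x_i) = 1 / (1 + exp(-s * <x_i, B>)),
  # then draw y_i in {-1, 1} independently with that probability.
  prob <- 1 / (1 + exp(-s * drop(X %*% B)))
  y <- ifelse(runif(n) < prob, 1, -1)

  list(X = X, y = y, B = B)
}

sim <- sketch_logistic_data(n = 200, p = 10, k = 3, seed = 1)
dim(sim$X)    # n rows, p columns
table(sim$y)  # counts of the -1 and 1 labels
```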
Usage
GenSyntheticLogistic(
n,
p,
k,
seed,
rho = 0,
s = 1,
sigma = NULL,
shuffle_B = FALSE
)
Arguments
n | Number of samples. |
p | Number of features. |
k | Number of non-zeros in the true vector of coefficients. |
seed | The seed used for randomly generating the data. |
rho | Threshold for setting values to 0: if |X(i, j)| > rho, then X(i, j) <- 0. |
s | Signal-to-noise parameter. As s -> +Inf, the generated data becomes linearly separable. |
sigma | Correlation matrix; defaults to the identity matrix I. |
shuffle_B | Boolean flag indicating whether to randomly shuffle the coefficient vector B. If FALSE, the first k entries of B are set to 1. |
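To illustrate the role of s, the sketch below evaluates the success probability 1/(1 + exp(-s * z)) for a few fixed linear scores z = <xi, B>. The score values are made up for illustration. For small s the probabilities stay close to 0.5 (noisy labels), while for large s they approach 0 or 1, so the sign of the score almost determines the label.

```r
# Hypothetical linear scores <x_i, B> for five samples (illustrative values).
scores <- c(-2, -0.5, 0.1, 0.5, 2)
s_values <- c(0.1, 1, 10, 100)

sigmoid <- function(z) 1 / (1 + exp(-z))

# One column of success probabilities per value of s.
probs <- sapply(s_values, function(s) sigmoid(s * scores))
colnames(probs) <- paste0("s=", s_values)
rownames(probs) <- paste0("score=", scores)

print(round(probs, 3))
```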
Value
A list containing: the data matrix X, the response vector y, the coefficients B.
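A minimal usage sketch, assuming the L0Learn package is installed and that the returned list uses the element names X, y, and B described above. The argument values are arbitrary, and the call is guarded so the snippet is skipped when the package is unavailable.

```r
if (requireNamespace("L0Learn", quietly = TRUE)) {
  # 500 samples, 20 features, 5 non-zero coefficients (arbitrary choices).
  sim_data <- L0Learn::GenSyntheticLogistic(n = 500, p = 20, k = 5, seed = 1)

  dim(sim_data$X)     # data matrix: n rows, p columns
  table(sim_data$y)   # distribution of the sampled labels
  which(sim_data$B != 0)  # positions of the non-zero coefficients
}
```

With shuffle_B = FALSE (the default), the non-zero coefficients are expected in the first k positions of B.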