stochastic.logistic.regression {stochQN} | R Documentation |
Stochastic Logistic Regression
Description
Stochastic Logistic Regression
Usage
stochastic.logistic.regression(formula = NULL, pos_class = NULL,
dim = NULL, intercept = TRUE, x0 = NULL, optimizer = "adaQN",
optimizer_args = list(initial_step = 0.1, verbose = FALSE),
lambda = 0.001, random_seed = 1, val_data = NULL)
Arguments
formula |
Formula for the model, if it is fit to data.frames instead of matrices/vectors. |
pos_class |
If fit to data in the form of data.frames, a string indicating which of the classes is the positive one. If fit to data in the form of matrices/vectors, pass 'NULL'. |
dim |
Dimensionality of the model (number of features). Ignored when passing 'formula' or when passing 'x0'. If the intercept is added from the option 'intercept' here, it should not be counted towards 'dim'. |
intercept |
Whether to add an intercept to the covariates. Only used when fitting to matrices/vectors. Ignored when passing 'formula' (for formulas without an intercept, put '-1' in the RHS to remove it). |
x0 |
Initial values of the variables. If passed, will ignore 'dim' and 'random_seed'. If not passed, will generate random starting values ~ Norm(0, 0.1). |
optimizer |
The optimizer to use - one of 'adaQN' (recommended), 'SQN', 'oLBFGS'. |
optimizer_args |
Arguments to pass to the optimizer (same ones as the functions of the same name). Must be a list. See the documentation of each optimizer for the parameters they take. |
lambda |
Regularization parameter. Be aware that the functions assume the loss (the negative log-likelihood) is divided by the number of observations, so this number should be small. |
random_seed |
Random seed to use for the initialization of the variables. Ignored when passing 'x0'. |
val_data |
Validation data (only used for 'adaQN'). If passed, must be a list with entries 'X', 'y' (if passing data.frames for fitting), and optionally 'w' (sample weights). |
Details
Binary logistic regression, fit in batches using this package's own optimizers.
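As a rough illustration of why 'lambda' should be small when the loss is averaged over observations, the sketch below evaluates a regularized logistic loss in base R. It is not the package's internal code: the function names 'reg_logloss' and 'reg_logloss_grad' are hypothetical, and the penalty form (lambda times the sum of squared coefficients, no intercept) is an assumption made for this example.

```r
# Numerically stable log(1 + exp(z))
log1pexp <- function(z) pmax(z, 0) + log1p(exp(-abs(z)))

# Mean log-loss plus an L2 penalty.
# ASSUMPTION: penalty is lambda * sum(coefs^2); the package may scale it differently.
reg_logloss <- function(coefs, X, y, lambda = 0.001) {
  z <- as.vector(X %*% coefs)
  mean(log1pexp(z) - y * z) + lambda * sum(coefs^2)
}

# Gradient of the objective above with respect to the coefficients
reg_logloss_grad <- function(coefs, X, y, lambda = 0.001) {
  p <- plogis(as.vector(X %*% coefs))
  as.vector(crossprod(X, p - y)) / nrow(X) + 2 * lambda * coefs
}

X <- as.matrix(iris[, 1:4])
y <- as.numeric(iris$Species == "setosa")
coefs <- rep(0, ncol(X))

# With all-zero coefficients every predicted probability is 0.5,
# so the averaged loss equals log(2) regardless of the number of rows.
reg_logloss(coefs, X, y)
reg_logloss_grad(coefs, X, y)
```

Because the data term is a mean rather than a sum, it stays on the same scale whatever the batch size, so the penalty weight has to be small to avoid dominating it.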
Value
An object of class 'stoch_logistic', which can be fit to batches of data through function 'partial_fit_logistic'.
See Also
partial_fit_logistic, coef.stoch_logistic, predict.stoch_logistic, adaQN, SQN, oLBFGS
Examples
library(stochQN)
### Load Iris dataset
data("iris")
### Example with X + y interface
X <- as.matrix(iris[, c("Sepal.Length", "Sepal.Width",
                        "Petal.Length", "Petal.Width")])
y <- as.numeric(iris$Species == "setosa")
### Initialize model with default parameters
model <- stochastic.logistic.regression(dim = 4)
### Fit to 10 randomly-subsampled batches
batch_size <- as.integer(nrow(X) / 3)
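for (i in 1:10) {
  set.seed(i)
  ### Draw row indices for this batch (sampling with replacement)
  batch <- sample(nrow(X),
                  size = batch_size, replace = TRUE)
  ### Fit only on the sampled rows
  partial_fit_logistic(model, X[batch, ], y[batch])
}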
### Check classification accuracy
cat(sprintf(
  "Accuracy after 10 iterations: %.2f%%\n",
  100 * mean(
    predict(model, X, type = "class") == y)
))
### Example with formula interface
iris_df <- iris
levels(iris_df$Species) <- c("setosa", "other", "other")
### Initialize model with default parameters
model <- stochastic.logistic.regression(Species ~ .,
                                        pos_class = "setosa")
### Fit to 10 randomly-subsampled batches
batch_size <- as.integer(nrow(iris_df) / 3)
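for (i in 1:10) {
  set.seed(i)
  ### Draw row indices for this batch (sampling with replacement)
  batch <- sample(nrow(iris_df),
                  size = batch_size, replace = TRUE)
  ### Fit only on the sampled rows
  partial_fit_logistic(model, iris_df[batch, ])
}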
cat(sprintf(
  "Accuracy after 10 iterations: %.2f%%\n",
  100 * mean(
    predict(
      model, iris_df, type = "class") == iris_df$Species
  )
))