f_control_mactivate {mactivate}    R Documentation
Set Fitting Hyperparameters
Description
Provides a single function for tuning the mactivate fitting algorithms f_fit_gradient_01, f_fit_hybrid_01, and f_fit_gradient_logistic_01.
Usage
f_control_mactivate(
    param_sensitivity = 10^9,
    bool_free_w = FALSE,
    w0_seed = 0.1,
    max_internal_iter = 500,
    w_col_search = "one",
    bool_headStart = FALSE,
    antifreeze = FALSE,
    ss_stop = 10^(-8),
    escape_rate = 1.004,
    step_size = 1/100,
    Wadj = 1/1,
    force_tries = 0,
    lambda = 0,
    tol = 10^(-8))
Arguments
param_sensitivity
    Large positive scalar numeric.

bool_free_w
    Scalar logical. Allow the values of W to be fitted freely.

w0_seed
    Scalar numeric, usually in [0, 1]. Initial value(s) for the multiplicative activation layer, W.

max_internal_iter
    Scalar non-negative integer. Hybrid only. How many activation descent passes to make before refitting the primary effects.

w_col_search
    Scalar character. When "one", the algorithm searches over one column of W at a time.

bool_headStart
    Scalar logical. Gradient only. When TRUE, gives the gradient fit a head start.

antifreeze
    Scalar logical. Hybrid only. New with v0.6.5. When TRUE, protects the hybrid fit from "freezing."

ss_stop
    Small positive scalar numeric. Convergence tolerance.

escape_rate
    Scalar numeric, no less than 1 and likely no greater than, say, 1.01. Affinity for exiting a column search over W.

step_size
    Positive scalar numeric. Initial gradient step size (in both the gradient and hybrid fitting algorithms) for all parameters.

Wadj
    Positive scalar numeric. Controls the gradient step size (in both the gradient and hybrid fitting algorithms) of W.

force_tries
    Scalar non-negative integer. Forces a minimum number of fitting recursions.

lambda
    Scalar numeric. Ridge regularizer; determines the diagonal loading imposed upon the precision matrix.

tol
    Small positive scalar numeric. Hybrid only. Similar to the ss_stop argument.
Details
Fitting a mactivate model to data can be dramatically affected by these tuning hyperparameters. At one extreme, one set of hyperparameters may cause the fitting algorithm to exit, fruitlessly, almost immediately; another set may send the algorithm running for hours. An ideal hyperparameterization will fit the data expeditiously.
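For instance, the following sketch (argument values are illustrative only, not tuned recommendations) contrasts a "patient" control list, which encourages a long descent, with a "hasty" one that tends to exit early:

ctrl_patient <- f_control_mactivate(ss_stop = 10^(-14), escape_rate = 1.001)  ### slow to stop, reluctant to escape
ctrl_hasty   <- f_control_mactivate(ss_stop = 10^(-5),  escape_rate = 1.01)   ### quick to stop, eager to escape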
Value
A named list to be passed as the mact_control argument to the fitting functions.
See Also
f_fit_gradient_01, f_fit_hybrid_01, f_fit_gradient_logistic_01.
Examples
library(mactivate)
set.seed(777)
d <- 20
N <- 50000
X <- matrix(rnorm(N * d, 0, 1), N, d)
colnames(X) <- paste0("x", 1:d)
############# primary effect slopes
b <- rep_len( c(-1, 1), d )
ystar <-
    X %*% b +
    1 * X[ , 1] * X[ , 2] * X[ , 3] -
    1 * X[ , 2] * X[ , 3] * X[ , 4] * X[ , 5]
Xall <- X
errs <- rnorm(N, 0, 1)
errs <- 3 * (errs - mean(errs)) / sd(errs) ### center and rescale errors to sd = 3
sd(errs)
y <- ystar + errs ### response
yall <- y
Nall <- N
############# hybrid example
### this control setting will exit too quickly
### compare this with example below
xcmact <-
    f_control_mactivate(
        param_sensitivity = 10^5,
        w0_seed = 0.1,
        max_internal_iter = 500,
        w_col_search = "one",
        ss_stop = 10^(-5),
        escape_rate = 1.01,
        Wadj = 1/1,
        lambda = 1/1000,
        tol = 10^(-5)
    )
m_tot <- 4
Uall <- Xall
xxnow <- Sys.time()
xxls_out <-
    f_fit_hybrid_01(
        X = Xall,
        y = yall,
        m_tot = m_tot,
        U = Uall,
        m_start = 1,
        mact_control = xcmact,
        verbosity = 1
    )
cat( difftime(Sys.time(), xxnow, units="mins"), "\n" )
yhatG <- predict(object=xxls_out, X0=Xall, U0=Uall, mcols=m_tot)
sqrt( mean( (yall - yhatG)^2 ) ) ### in-sample RMSE
####################### this control setting should fit
####################### (will take a few minutes)
xcmact <-
    f_control_mactivate(
        param_sensitivity = 10^10, ### make more sensitive
        w0_seed = 0.1,
        max_internal_iter = 500,
        w_col_search = "one",
        ss_stop = 10^(-14), ### make stopping insensitive
        escape_rate = 1.001, ### discourage quitting descent
        Wadj = 1/1,
        lambda = 1/10000,
        tol = 10^(-14) ### make tolerance very small
    )
m_tot <- 4
Uall <- Xall
xxnow <- Sys.time()
xxls_out <-
    f_fit_hybrid_01(
        X = Xall,
        y = yall,
        m_tot = m_tot,
        U = Uall,
        m_start = 1,
        mact_control = xcmact,
        verbosity = 1
    )
cat( difftime(Sys.time(), xxnow, units="mins"), "\n" )
yhatG <- predict(object=xxls_out, X0=Xall, U0=Uall, mcols=m_tot)
sqrt( mean( (yall - yhatG)^2 ) ) ### in-sample RMSE
xxls_out

### recover the fitted activation layer and append it to the design matrix
Xstar <- f_mactivate(U=Uall, W=xxls_out[[ m_tot+1 ]][[ "What" ]])
colnames(Xstar) <- paste0("xstar_", seq(1, m_tot))
Xall <- cbind(Xall, Xstar)

### OLS on the augmented design recovers primary and interaction effects
xlm <- lm(yall ~ Xall)
summary(xlm)