FWD {BGData} | R Documentation |
Performs forward regression of y on the columns of X.
Predictors are added, one at a time, each time adding the one that produces
the largest reduction in the residual sum of squares (RSS). The function
returns estimates and summaries for the entire forward search. This
function performs a search similar to that of step(, direction='forward');
however, FWD() is optimized for computational speed for linear models with
very large sample sizes. To
achieve fast computations, the software first computes the sufficient
statistics X'X and X'y. At each step, the function first finds the
predictor that produces the largest reduction in the sum of squares (this
can be derived from X'X, X'y and the current solution of effects), and then
updates the estimates of effects for the resulting model using Gauss Seidel
iterations performed on the linear system (X'X)b=X'y, iterating only over
the elements of b that are active in the model.
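The procedure described above can be sketched in base R. This is an illustration under stated assumptions, not the BGData source: forward_sketch() and all variable names are hypothetical, the data are simulated and centered so that no intercept is needed, and each candidate is scored by r_j^2 / C_jj, the RSS reduction obtained by moving only that coefficient to its optimum while the active ones are held fixed. The package may score candidates differently.

```r
# Illustrative forward search on sufficient statistics; not the BGData source.
forward_sketch <- function(C, rhs, yy, df, tol = 1e-7, maxIter = 1000) {
  p <- length(rhs)
  b <- numeric(p)               # current effects; zero means inactive
  active <- integer(0)          # predictors in the model, in entry order
  B <- matrix(0, p, df + 1)     # column 1 is the empty model
  RSS <- numeric(df + 1)
  RSS[1] <- yy
  for (step in seq_len(df)) {
    # r_j = x_j'(y - Xb), computed from X'y and X'X only
    r <- rhs - drop(C %*% b)
    candidates <- setdiff(seq_len(p), active)
    # RSS drop from moving b_j alone to its optimum, others held fixed
    gain <- r[candidates]^2 / diag(C)[candidates]
    active <- c(active, candidates[which.max(gain)])
    # Gauss-Seidel sweeps over the active coefficients of (X'X) b = X'y
    for (iter in seq_len(maxIter)) {
      b_old <- b[active]
      for (j in active) {
        b[j] <- (rhs[j] - sum(C[j, -j] * b[-j])) / C[j, j]
      }
      if (max(abs(b[active] - b_old)) < tol) break
    }
    B[, step + 1] <- b
    RSS[step + 1] <- yy - 2 * sum(b * rhs) + sum(b * drop(C %*% b))
  }
  list(B = B, order = active, RSS = RSS)
}

# Simulated example: predictors 3 and 7 carry the signal.
set.seed(1)
n <- 500; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- 2 * X[, 3] - 1.5 * X[, 7] + rnorm(n)
X <- scale(X, center = TRUE, scale = FALSE)
y <- y - mean(y)

# Sufficient statistics; the raw data are not needed by the search itself.
C   <- crossprod(X)            # X'X
rhs <- drop(crossprod(X, y))   # X'y
yy  <- sum(y^2)                # y'y, the RSS of the empty model

fit <- forward_sketch(C, rhs, yy, df = 3)
fit$order

# Cross-check the final step against an ordinary least-squares fit.
ref <- lm(y ~ X[, fit$order] - 1)
all.equal(unname(coef(ref)), fit$B[fit$order, 4], tolerance = 1e-5)
```

Once X'X and X'y are formed, the cost of each step depends on p and the number of active predictors rather than on the sample size n, which is where the speed advantage for large n comes from.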
FWD(y, X, df = 20, tol = 1e-7, maxIter = 1000, centerImpute = TRUE,
verbose = TRUE)
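A minimal usage sketch with simulated data follows. The data, the column names, and the object name fwd are illustrative assumptions; the call uses only the arguments listed in the usage line above, and it is guarded so that the code runs even when BGData is not installed.

```r
# Simulated data; names are illustrative and not part of BGData.
set.seed(42)
n <- 1000; p <- 50
X <- matrix(rnorm(n * p), nrow = n, ncol = p,
            dimnames = list(NULL, paste0("x", seq_len(p))))
y <- X[, 1] - 0.5 * X[, 2] + rnorm(n)

if (requireNamespace("BGData", quietly = TRUE)) {
  # Stop after at most 5 predictors have entered the model.
  fwd <- BGData::FWD(y = y, X = X, df = 5, verbose = FALSE)
  str(fwd$B)       # effects per predictor (rows) and step (columns)
  print(fwd$path)  # entry order and per-step summaries
}
```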
y | The response vector (numeric, n x 1). |
X | An (n x p) numeric matrix. Columns are the features (aka predictors) considered in the forward search. The rows of |
df | Defines the maximum number of predictors to be included in the model. For complete forward search, set |
tol | A tolerance parameter to control when to stop the Gauss Seidel algorithm. |
maxIter | The maximum number of iterations for the Gauss Seidel algorithm (only used when the algorithm is not stopped by the tolerance parameter). |
centerImpute | Whether to center the columns of |
verbose | Use |
A list with two entries:
B: a (p x (df+1)) matrix with the estimated effects of each predictor (rows) at each step of the forward search (columns).
path: A data frame providing the order in which variables were added to the model (variable) and statistics for each step of the forward search (RSS, LogLik, VARE (the residual variance), DF, AIC, and BIC).
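The summaries in path can all be derived from the RSS, the model degrees of freedom, and the sample size. The sketch below uses the usual Gaussian formulas and is an assumption about how such columns are typically computed: the helper path_stats(), the divisor n - DF for VARE, and the parameter count DF + 1 in AIC and BIC are not taken from the package, and the input values are made up for illustration.

```r
# Hypothetical helper: per-step summaries from RSS, DF and n.
path_stats <- function(RSS, DF, n) {
  VARE   <- RSS / (n - DF)   # assumption: residual variance with divisor n - DF
  LogLik <- -0.5 * n * log(2 * pi * VARE) - 0.5 * RSS / VARE
  k      <- DF + 1           # assumption: DF effects plus the residual variance
  data.frame(DF = DF, RSS = RSS, VARE = VARE, LogLik = LogLik,
             AIC = -2 * LogLik + 2 * k,
             BIC = -2 * LogLik + log(n) * k)
}

# Made-up RSS values for three consecutive steps.
ps <- path_stats(RSS = c(1200, 700, 450), DF = 1:3, n = 500)
ps
```

Under these definitions BIC penalizes each added predictor more heavily than AIC whenever log(n) > 2, so it tends to favor an earlier stopping point along the path.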