RAI {rai} | R Documentation |
Main function for Revisiting Alpha-Investing (RAI) regression.
Description
The function rai is a wrapper that creates and manages the inputs and
outputs of the runAuction function. Using poly=FALSE is an efficient and
statistically valid way to run and terminate stepwise regression. The
function prepareData is provided to make generating predictions on test
data easier: rai uses it to process the data before running, and it must
also be applied to test data so that column names and information match
those expected by the model object returned by rai.
Usage
prepareData(theData, poly = TRUE, startDeg = 1)
is.rai(x)
rai(theData, theResponse, alpha = 0.1, alg = "rai", r = 0.8,
poly = alg != "RH", startDeg = 1, searchType = "breadth",
m = 500, sigma = "step", rmse = NA, df = NA, omega = alpha,
reuse = (alg == "RH"), maxTest = Inf, verbose = FALSE,
save = TRUE, lmFit = .lm.fit)
Arguments
theData |
matrix of covariates. |
poly |
logical. Should the algorithm look for higher-order polynomials? |
startDeg |
The starting degree for polynomial regression. Values below 1 allow the search to start with lower-order terms such as square roots. This alleviates some problems with high-order polynomials: with startDeg=1/2, a 4th degree polynomial is only a quadratic on the original scale. |
x |
an R object. |
theResponse |
response vector or single column matrix. |
alpha |
level of procedure. |
alg |
algorithm can be one of "rai", "raiPlus", or "RH" (Revisiting Holm). |
r |
threshold parameter, with 0 < r < 1. RAI rejects tests which increase remaining R^2 by a factor r^s, where s is the epoch. Larger values of r yield a closer approximation to stepwise regression. |
searchType |
A character string specifying the prioritization of higher-order polynomials. One of "breadth" (more base features) or "depth" (higher orders). |
m |
number of observations used in subsampling for variance inflation factor estimate of r.squared. Set m=Inf to use full data. |
sigma |
type of error estimate used; one of "ind" or "step". If "ind", you must provide numeric values for both rmse and df. |
rmse |
user-provided value for rmse. Must be used with sigma="ind". |
df |
degrees of freedom for the user-specified rmse. Must be used with sigma="ind". |
omega |
return (payout) from rejecting a test in Alpha-Investing; must be <= alpha. |
reuse |
logical. Should repeated tests of the same covariate be considered a test of the same hypothesis? Reusing wealth is not implemented for alg="rai" or alg="raiPlus", as the effect is negligible. |
maxTest |
maximum number of tests. |
verbose |
logical. Should auction output be printed? |
save |
logical. Should the auction results be saved? If TRUE, returns a summary matrix. |
lmFit |
The core function that will be used to estimate linear model fits. The default is .lm.fit, but other alternatives are possible. Note that it does not use formula notation as this is costly. Another recommended option is fastLmPure from RcppEigen or related packages. |
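The arithmetic behind startDeg and r can be checked directly in base R. This is a minimal sketch, not the package's internal code: the vector x and the names degrees, powers, and thresholds are made up for illustration, and how the r^s factors enter the individual tests is left to runAuction.

```r
# Illustrative values only; none of these objects come from the rai package.
x <- c(1, 4, 9, 16)

# startDeg = 1/2: polynomial terms are built on the square-root scale.
startDeg <- 1/2
base <- x^startDeg

# Degrees 1..4 on the transformed scale correspond to these powers of x.
degrees <- 1:4
powers <- degrees * startDeg

# The 4th degree term on the square-root scale is x^2 on the original scale.
quartic_on_sqrt_scale <- base^4
stopifnot(isTRUE(all.equal(quartic_on_sqrt_scale, x^2)))

# The factor r^s for the first few epochs s, for two choices of r.
epochs <- 1:4
thresholds <- rbind("r = 0.8" = 0.8^epochs, "r = 0.5" = 0.5^epochs)

print(powers)
print(thresholds)
```

With r closer to 1, the factor r^s shrinks more slowly from one epoch to the next.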
Details
Missing values are treated as follows: all observations with missing values in theResponse are removed; numeric columns in theData have missing values imputed by the mean of the column and an indicator column is added to note missingness; missing values in factor or binary columns are given the value "NA", which creates an additional group for missing values.
Note that as rai is run using the output of model.matrix, it is not guaranteed that all categories from a factor are included in the regression. Column names may also be modified to be syntactically valid.
The model object can be used to generate predictions on test data. Note that if default conversions were used when running rai, then they must be used again with prepareData for the test data prior to producing predictions.
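The missing-value treatment described above can be imitated in base R on a toy data frame. This is only a sketch of the stated rules, not the code that prepareData or rai actually runs: the data frame dat, its columns, and the indicator name x_missing are hypothetical, and the package may name or order its columns differently.

```r
# Toy data (hypothetical): y is the response, x is numeric, g is a factor.
dat <- data.frame(
  y = c(1.2, NA, 3.1, 4.0, 2.5),
  x = c(10, 20, NA, 40, 10),
  g = factor(c("a", "b", "b", NA, "a"))
)

# 1. Remove observations whose response is missing.
dat <- dat[!is.na(dat$y), ]

# 2. Numeric column: add a missingness indicator, then impute the column mean.
x_was_missing <- is.na(dat$x)
dat$x_missing <- as.numeric(x_was_missing)
dat$x[x_was_missing] <- mean(dat$x, na.rm = TRUE)

# 3. Factor column: recode missing values as the string "NA", which
#    creates an additional group.
g_chr <- as.character(dat$g)
g_chr[is.na(g_chr)] <- "NA"
dat$g <- factor(g_chr)

# model.matrix drops one level of g as the reference category, so not every
# category appears as a column.
mm <- model.matrix(~ x + x_missing + g, data = dat)
print(colnames(mm))

# Column names can be made syntactically valid with make.names.
print(make.names(colnames(mm)))
```

The same steps would have to be applied to test data, with the same settings, before the columns line up with those of a fitted model.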
Value
A list which includes the following components:
y |
response. |
X |
model matrix from final model. |
formula |
final model formula. |
features |
list of interactions included in formula. |
summary |
if save=TRUE, contains information on each test made by the algorithm. |
time |
run time. |
options |
options given to RAI: alg, searchType, poly, r, startDeg, alpha, omega, m. |
subData |
subset of columns from theData that are used in the final model. |
model |
linear model object fit using the selected model. |
Summary and predict methods are provided in order to generate further output and graphics.
Examples
data("CO2")
theResponse = CO2$uptake  # response: the uptake column
theData = CO2[ ,-5]       # covariates: every column except uptake (column 5)
rai_out = rai(theData, theResponse)
summary(rai_out) # summary information including graphs
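A further example, using sigma="ind" with user-supplied rmse and df as described under Arguments. This is a hedged sketch: taking the error estimate from an ordinary lm fit on the same data is an assumption made purely to have numbers to pass in, and the names fit, my_rmse, my_df, and rai_ind are illustrative. The call to rai is skipped when the package is not installed.

```r
data("CO2")
theResponse <- CO2$uptake
theData <- CO2[, -5]

# Stand-in error estimate from an ordinary lm fit (an assumption for this
# example; in practice rmse and df would come from your own source).
fit <- lm(uptake ~ Type + Treatment + conc, data = CO2)
my_rmse <- summary(fit)$sigma  # residual standard error
my_df <- fit$df.residual       # residual degrees of freedom

# sigma = "ind" requires both rmse and df to be supplied.
if (requireNamespace("rai", quietly = TRUE)) {
  rai_ind <- rai::rai(theData, theResponse, sigma = "ind",
                      rmse = my_rmse, df = my_df)
  print(rai::is.rai(rai_ind))
}
```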