RAI {rai}        R Documentation

Main function for Revisiting Alpha-Investing (RAI) regression.

Description

The function rai is a wrapper that creates and manages the inputs and outputs of the runAuction function. Using poly=FALSE is an efficient and statistically valid way to run and terminate stepwise regression. The function prepareData is provided to make generating predictions on test data easier. rai uses it to process the data before running, and test data must be passed through it so that the column names and information match what the model object returned by rai expects.

Usage

prepareData(theData, poly = TRUE, startDeg = 1)

is.rai(x)

rai(theData, theResponse, alpha = 0.1, alg = "rai", r = 0.8,
  poly = alg != "RH", startDeg = 1, searchType = "breadth",
  m = 500, sigma = "step", rmse = NA, df = NA, omega = alpha,
  reuse = (alg == "RH"), maxTest = Inf, verbose = FALSE,
  save = TRUE, lmFit = .lm.fit)

Arguments

theData

matrix of covariates.

poly

logical. Should the algorithm look for higher-order polynomials?

startDeg

Starting degree for polynomial regression. It allows the search to start with lower-order terms such as square roots. This alleviates some problems with high-degree polynomials, since a 4th-degree polynomial with startDeg=1/2 is only a quadratic on the original scale.
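
As a small illustration of the degree arithmetic described for startDeg, the following base R sketch checks that raising a square-root-transformed covariate to the 4th power reproduces the square of the original covariate. The vector x is made-up data, and the sketch does not call rai or reproduce its internal search.

```r
# Hypothetical covariate values, for illustration only.
x <- c(1, 4, 9, 16)

startDeg <- 1/2

# Base term on the transformed scale: x^(1/2).
base_term <- x^startDeg

# A 4th-degree term built from the transformed covariate.
fourth_degree <- base_term^4

# Effective degree on the original scale: 4 * 1/2 = 2.
effective_degree <- 4 * startDeg

# The 4th-degree term on the square-root scale equals x^2.
stopifnot(isTRUE(all.equal(fourth_degree, x^2)))
```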

x

an R object.

theResponse

response vector or single column matrix.

alpha

level of procedure.

alg

algorithm can be one of "rai", "raiPlus", or "RH" (Revisiting Holm).

r

threshold parameter, with 0 < r < 1. RAI rejects tests which increase remaining R^2 by a factor r^s, where s is the epoch. Larger values of r yield a closer approximation to stepwise regression.
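
To make the role of r more concrete, the sketch below tabulates the factor r^s over the first few epochs for two values of r. It only evaluates the expression given in the description; how rai applies the factor internally is not shown, and the variable names are illustrative.

```r
epochs <- 1:5

# Factor r^s for the default r = 0.8 and for a smaller r = 0.5.
factor_default <- 0.8^epochs
factor_small   <- 0.5^epochs

# Side-by-side view: the factor decays more slowly for the larger r.
threshold_table <- data.frame(
  epoch = epochs,
  r_0.8 = factor_default,
  r_0.5 = factor_small
)
print(threshold_table)
```

With the larger r, the factor shrinks in smaller steps from one epoch to the next, which is consistent with the statement that larger values of r approximate stepwise regression more closely.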

searchType

A character string specifying the prioritization of higher-order polynomials. One of "breadth" (more base features) or "depth" (higher orders).

m

number of observations used in subsampling for variance inflation factor estimate of r.squared. Set m=Inf to use full data.

sigma

type of error estimate used; one of "ind" or "step". If "ind", you must provide numeric values for both rmse and df.

rmse

user provided value for rmse. Must be used with sigma="ind".

df

degrees of freedom for user specified rmse. Must be used with sigma="ind".
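
One possible way to obtain values for rmse and df is from a separate least-squares fit, as sketched below with the CO2 data used in the Examples. Whether such an estimate is appropriate depends on the application; the choice of covariates and the helper name rai_with_fixed_sigma are assumptions made for illustration. The helper is only defined here, and calling it requires the rai package to be installed.

```r
data("CO2")

# Reference fit used only to obtain an error estimate (illustrative choice).
fit <- lm(uptake ~ conc + Type + Treatment, data = CO2)

rmse_ind <- summary(fit)$sigma   # residual standard error
df_ind   <- fit$df.residual      # residual degrees of freedom

# Hypothetical helper: passes the user-supplied estimate to rai
# through the documented sigma, rmse and df arguments.
rai_with_fixed_sigma <- function(theData, theResponse, rmse, df) {
  rai::rai(theData, theResponse, sigma = "ind", rmse = rmse, df = df)
}
```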

omega

return from rejecting a test in Alpha-Investing (<= alpha).
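
The sketch below shows how alpha and omega interact under the generic alpha-investing wealth rule: a rejection earns omega, and a failed test at level a costs a / (1 - a). It is not the package's implementation. The starting wealth of alpha, the rule of bidding half the current wealth, and the function name alpha_investing_sketch are all assumptions made for illustration.

```r
alpha_investing_sketch <- function(pvals, alpha = 0.1, omega = alpha) {
  wealth <- alpha                        # assumed starting wealth
  rejected <- logical(length(pvals))
  wealth_path <- numeric(length(pvals))
  for (j in seq_along(pvals)) {
    # Assumed bidding rule: half the current wealth, capped at 0.25.
    bid <- min(wealth / 2, 0.25)
    if (pvals[j] <= bid) {
      rejected[j] <- TRUE
      wealth <- wealth + omega           # payout for a rejection
    } else {
      wealth <- wealth - bid / (1 - bid) # cost of a failed test
    }
    wealth_path[j] <- wealth
  }
  list(rejected = rejected, wealth = wealth_path)
}

# Made-up p-values, for illustration only.
out <- alpha_investing_sketch(c(0.001, 0.5, 0.2))
```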

reuse

logical. Should repeated tests of the same covariate be considered a test of the same hypothesis? Reusing wealth is not implemented for "rai" or "raiPlus", as the effect is negligible.

maxTest

maximum number of tests.

verbose

logical. Should auction output be printed?

save

logical. Should the auction results be saved? If TRUE, returns a summary matrix.

lmFit

The core function that will be used to estimate linear model fits. The default is .lm.fit, but other alternatives are possible. Note that it does not use formula notation as this is costly. Another recommended option is fastLmPure from RcppEigen or related packages.
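
To illustrate the matrix interface that the default lmFit uses, the following base R sketch fits a simple regression with .lm.fit and compares the coefficients against a formula-based lm fit. The choice of the CO2 data and of a single covariate is for illustration only; the sketch does not call rai.

```r
data("CO2")

# Design matrix built by hand: intercept column plus one covariate.
X <- cbind(Intercept = 1, conc = CO2$conc)
y <- CO2$uptake

# Matrix-based fit, no formula parsing involved.
fast_fit <- .lm.fit(X, y)

# Formula-based reference fit of the same model.
ref_fit <- lm(uptake ~ conc, data = CO2)

# Both approaches give the same coefficient estimates.
stopifnot(isTRUE(all.equal(as.numeric(fast_fit$coefficients),
                           as.numeric(coef(ref_fit)))))
```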

Details

Missing values are treated as follows: all observations with missing values in theResponse are removed; numeric columns in theData have missing values imputed by the mean of the column and an indicator column is added to note missingness; missing values in factor or binary columns are given the value "NA", which creates an additional group for missing values.

Note that as rai is run using the output of model.matrix, it is not guaranteed that all categories from a factor are included in the regression. Column names may also be modified to be syntactically valid.

The model object can be used to generate predictions on test data. Note that if default conversions were used when running rai, then they must be used again with prepareData for the test data prior to producing predictions.
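
The missing-value handling described above can be sketched in base R as follows. This is an illustration of the described behaviour on a made-up data frame, not the code used by prepareData; the column name x_missing is hypothetical.

```r
# Made-up data with one missing numeric value and one missing factor value.
toy <- data.frame(
  x = c(1, NA, 3),
  g = factor(c("a", NA, "b"))
)

# Numeric column: impute the column mean and add a missingness indicator.
miss <- is.na(toy$x)
toy$x[miss] <- mean(toy$x, na.rm = TRUE)
toy$x_missing <- as.numeric(miss)

# Factor column: turn missing values into their own level labelled "NA".
toy$g <- addNA(toy$g)
levels(toy$g)[is.na(levels(toy$g))] <- "NA"
```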

Value

A list which includes the following components:

y

response.

X

model matrix from final model.

formula

final model formula.

features

list of interactions included in formula.

summary

if save=TRUE, contains information on each test made by the algorithm.

time

run time.

options

options given to RAI: alg, searchType, poly, r, startDeg, alpha, omega, m.

subData

subset of columns from theData that are used in the final model.

model

linear model object fit using the selected model.

Summary and predict methods are provided in order to generate further output and graphics.

Examples

  data("CO2")
  theResponse <- CO2$uptake   # response: the uptake column
  theData <- CO2[, -5]        # covariates: all columns except uptake (column 5)
  rai_out <- rai(theData, theResponse)
  summary(rai_out)  # summary information including graphs

[Package rai version 1.0.0]