R: Automated Forward Stepwise GLM

fwd_stepwise_glm {AutoStepwiseGLM}

R Documentation

Automated Forward Stepwise GLM

Description

Takes a data frame and the name of the dependent variable (in quotes) as arguments, splits the data into training and test sets, and uses automated forward stepwise selection to build a series of multiple regression models on the training data. Each model is then evaluated on the test data, and evaluation metrics are computed for each model and presented as plots. The models are also ranked on each metric, and the ranks are averaged. The model with the lowest average rank across the metrics is displayed along with its formula. By default, every metric carries the same weight (1) when the average rank is calculated; to give one or more metrics more influence than the others, set the corresponding weight arguments. As of v0.2.0, the glm call supports only family = gaussian(link = 'identity').
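The forward stepwise idea described above can be sketched with base R's step() function. This is only an illustration, not the package's own code: the use of step(), the AIC-based selection it performs, and the helper names (null_fit, fwd, added, formulas) are assumptions, and for brevity the sketch fits on the full data set rather than a training split.

```r
# Illustrative sketch only; AutoStepwiseGLM may select predictors differently.
dt <- mtcars
dv <- "mpg"
predictors <- setdiff(names(dt), dv)

# Start from the intercept-only model.
null_fit <- glm(as.formula(paste(dv, "~ 1")), data = dt,
                family = gaussian(link = "identity"))

# Add one predictor at a time while AIC keeps improving.
fwd <- step(null_fit,
            scope = list(lower = ~ 1, upper = reformulate(predictors)),
            direction = "forward", trace = 0)

# Predictors in the order they entered the model.
added <- sub("^\\+ ", "", as.character(fwd$anova$Step)[-1])

# One formula per step: the "series" of nested candidate models.
formulas <- lapply(seq_along(added), function(k) {
  reformulate(added[seq_len(k)], response = dv)
})
```

Each element of formulas could then be refitted on training data and scored on test data.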

Usage

fwd_stepwise_glm(data, dv, aic_wt = 1, r_wt = 1, mae_wt = 1,
  r_squ_wt = 1, train_prop = 0.7, random_seed = 7)

Arguments

`data`	A data frame in which one column is the dependent variable and the remaining columns are independent variables
`dv`	The column name of the (continuous) dependent variable (must be in quotes, e.g., 'Dependent_Variable')
`aic_wt`	Weight given to the rank value of the AIC of the model fitted on the training data (used when calculating mean model performance, default = 1)
`r_wt`	Weight given to the rank value of the Pearson Correlation between the predicted and actual values on the test data (used when calculating mean model performance, default = 1)
`mae_wt`	Weight given to the rank value of Mean Absolute Error on the test data (used when calculating mean model performance, default = 1)
`r_squ_wt`	Weight given to the rank value of R-Squared on the test data (used when calculating mean model performance, default = 1)
`train_prop`	Proportion of the data used for the training data set; the remainder is used as the test data set (default = 0.7)
`random_seed`	Random seed to use when splitting the data into training and test sets (default = 7)
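To make the roles of train_prop, random_seed, and the four metrics concrete, here is a minimal base R sketch for a single candidate model. It is an assumption about how such a split and evaluation might look, not the package's implementation: the formula mpg ~ wt + hp is chosen only for illustration, and R-squared is computed as 1 - SSE/SST, which may differ from the definition the package uses.

```r
# Illustrative sketch only; names and formulas here are assumptions.
dt <- mtcars
train_prop <- 0.7
set.seed(7)

# Randomly assign rows to the training set; the rest form the test set.
train_idx <- sample(seq_len(nrow(dt)), size = floor(train_prop * nrow(dt)))
train <- dt[train_idx, ]
test  <- dt[-train_idx, ]

# Fit one candidate model on the training data.
fit <- glm(mpg ~ wt + hp, data = train,
           family = gaussian(link = "identity"))

pred   <- predict(fit, newdata = test)
actual <- test$mpg

model_aic <- AIC(fit)                      # computed on the training fit
r         <- cor(pred, actual)             # Pearson correlation on test data
mae       <- mean(abs(actual - pred))      # mean absolute error on test data
r_squ     <- 1 - sum((actual - pred)^2) /
                 sum((actual - mean(actual))^2)  # one possible test R-squared
```

Changing random_seed changes which rows land in each set, so the metrics and the resulting ranking can vary from seed to seed.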

Value

Returns a plot of each metric by model, together with the best overall model and the formula used to fit it.
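How the best overall model might be chosen from the metric ranks can be sketched as follows. The metric values are made up for illustration, and the weighted mean of ranks is an assumed formula; the package may combine ranks and weights differently.

```r
# Hypothetical metrics for three candidate models (illustrative values only).
metrics <- data.frame(
  model = c("m1", "m2", "m3"),
  aic   = c(154.0, 155.8, 157.1),  # lower is better
  r     = c(0.85, 0.90, 0.88),     # higher is better
  mae   = c(2.9, 2.4, 2.6),        # lower is better
  r_squ = c(0.70, 0.80, 0.78)      # higher is better
)

# Weights in the same order as the rank columns below.
wts <- c(aic = 1, r = 0.8, mae = 1, r_squ = 0.8)

# Rank each metric so that rank 1 is always the best model.
ranks <- cbind(
  aic   = rank(metrics$aic),
  r     = rank(-metrics$r),
  mae   = rank(metrics$mae),
  r_squ = rank(-metrics$r_squ)
)

# Weighted mean of ranks per model (assumed formula).
avg_rank <- as.vector(ranks %*% wts) / sum(wts)
names(avg_rank) <- metrics$model

# The model with the lowest weighted average rank is treated as best.
best <- names(avg_rank)[which.min(avg_rank)]
```

In this made-up example m1 has the best AIC but ranks last on the three test metrics, so m2 has the lowest weighted average rank.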

Examples

library(AutoStepwiseGLM)

# mtcars ships with base R; mpg is the continuous dependent variable
dt <- mtcars
stepwise_model <- fwd_stepwise_glm(data = dt,
                                   dv = 'mpg',
                                   aic_wt = 1,
                                   r_wt = 0.8,
                                   mae_wt = 1,
                                   r_squ_wt = 0.8,
                                   train_prop = 0.6,
                                   random_seed = 5)
stepwise_model

[Package AutoStepwiseGLM version 0.2.0 Index]