Logit {lessR} | R Documentation |
Logit Regression Analysis
Description
Abbreviation: lr
A wrapper for the standard R glm
function with family="binomial"
, this function automatically provides a logit regression analysis with graphics from a single, simple function call with many default settings, each of which can be re-specified. By default the data exist as a data frame with the default name of d
, such as data read by the lessR
Read
function. Specify the model in the function call according to an R formula
, that is, the response variable followed by a tilde, followed by the list of predictor variables, each pair separated by a plus sign.
The response variable for analysis has values only of 0 and 1, with 1 designating the reference group. If the response variable is a factor with two levels, the factor levels are automatically converted to a numeric variable with values of 0 and 1.
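As a rough illustration of the 0/1 coding described above, the following base R sketch converts a two-level factor to a numeric response. The data values and the explicit choice of "M" as the reference level are assumptions made for the example only; it does not reproduce the internal rule by which Logit selects a default reference group.

```r
# Hypothetical two-level factor response
gender <- factor(c("M", "F", "F", "M", "F"))

# Assumption for this sketch: the reference level is chosen explicitly
# and coded as 1, with the other level coded as 0
ref <- "M"
y <- as.integer(gender == ref)

print(data.frame(gender, y))
```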
Default output includes the inferential analysis of the estimated coefficients and model, sorted residuals and Cook's Distance, and sorted fitted values for existing data or new data. For a single predictor variable model, the scatterplot of the data with plotted logit function is provided.
Can also be called from the more general model
function.
Usage
Logit(my_formula, data=d, filter=NULL, ref_group=NULL,
digits_d=4, text_width=120,
brief=getOption("brief"),
res_rows=NULL, res_sort=c("cooks","rstudent","dffits","off"),
pred=TRUE, pred_all=FALSE, prob_cut=0.5, cooks_cut=1,
X1_new=NULL, X2_new=NULL, X3_new=NULL, X4_new=NULL,
X5_new=NULL, X6_new=NULL,
pdf_file=NULL, width=5, height=5, ...)
lr(...)
Arguments
my_formula |
Standard R |
data |
The default name of the data frame that contains the data
for analysis is |
filter |
A logical expression that specifies a subset of rows of the data frame to analyze. |
ref_group |
Value of the response variable that is the reference group,
otherwise set by default as the value that yields a |
digits_d |
For the Basic Analysis, it provides the number of decimal digits. For the rest of the output, it is a suggestion only. |
text_width |
Width of the text output at the console. |
brief |
If set to |
res_rows |
Default is 25, which lists the first 25 rows of data sorted
by the specified sort criterion. To turn this option off, specify a
value of 0. To see the output for all observations, specify a value of
|
res_sort |
Default is |
pred |
Default is |
pred_all |
Default is |
prob_cut |
Probability threshold for classifying an observation into the reference group (1) or not (0), applied to the forecasts with prediction intervals as well as to the confusion matrix. Can be a vector of thresholds, in which case, for a model with multiple predictors, the forecasts use a threshold of 0.5 and a confusion matrix is computed for each specified value. If a single value is specified, then both the forecasts and the one confusion matrix are computed with that value. |
cooks_cut |
Cutoff value of Cook's Distance at which observations with a larger value are flagged in red and labeled in the resulting scatterplot of Residuals and Fitted Values. Default value is 1.0. |
X1_new |
Values of the first listed predictor variable for which forecasted values and corresponding prediction intervals are calculated. |
X2_new |
Values of the second listed predictor variable for which forecasted values and corresponding prediction intervals are calculated. |
X3_new |
Values of the third listed predictor variable for which forecasted values and corresponding prediction intervals are calculated. |
X4_new |
Values of the fourth listed predictor variable for which forecasted values and corresponding prediction intervals are calculated. |
X5_new |
Values of the fifth listed predictor variable for which forecasted values and corresponding prediction intervals are calculated. |
X6_new |
Values of the sixth listed predictor variable for which forecasted values and corresponding prediction intervals are calculated. |
pdf_file |
Name of the pdf file to which graphics are redirected. |
width |
Width of the pdf file in inches. |
height |
Height of the pdf file in inches. |
... |
Other parameter values for R function |
Details
OVERVIEW
Logit
combines the following function calls into one, and also provides ancillary analyses such as graphics, organizing output into tables, and sorting to assist interpretation of the output. The basic analysis successively invokes several standard R functions beginning with the standard R function for estimation of the logit model, glm
with family="binomial"
. The output of the analysis is stored in the object lm.out
, available for further analysis in the R environment upon completion of the Logit
function. By default, the function automatically provides the analyses from the standard R functions, summary
, confint
and anova
, with some of the standard output modified and enhanced. The residual analysis invokes fitted
, resid
, rstudent
, and cooks.distance
functions. The option for prediction intervals calls the standard generic R function predict
.
The default analysis provides the model's parameter estimates and corresponding hypothesis tests and confidence intervals, goodness of fit indices, the ANOVA table, analysis of residuals and influence as well as the fitted value and standard error for each observation in the model.
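The sequence of standard R calls named above can be sketched directly in base R. This is only an approximation of the workflow under stated assumptions: the data are simulated, the names dat, fit, and diagnostics are hypothetical, and none of the formatting, sorting, or graphics that Logit adds is reproduced.

```r
# Simulated data for illustration only
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))
dat <- data.frame(x = x, y = y)

# Estimate the logit model
fit <- glm(y ~ x, data = dat, family = "binomial")

# Inferential analysis of the coefficients and the model
est <- summary(fit)$coefficients        # estimates, std. errors, z, p-values
ci <- suppressMessages(confint(fit))    # confidence intervals
dev_tab <- anova(fit, test = "Chisq")   # analysis of deviance table

# Residuals and influence
diagnostics <- data.frame(
  fitted   = fitted(fit),
  resid    = resid(fit, type = "response"),
  rstudent = rstudent(fit),
  cooks    = cooks.distance(fit)
)

# Predicted probabilities for new values of the predictor
new_prob <- predict(fit, newdata = data.frame(x = c(-1, 0, 1)),
                    type = "response")
```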
DATA
The name d
is by default provided by the Read
function included in this package for reading and displaying information about the data in preparation for analysis. If not all the variables in the model are in the same data frame, the analysis will not be complete. The data frame does not need to be attached, just specified by name with the data
option if the name is not the default d
.
The filter
parameter subsets rows (cases) of the input data frame according to a logical expression. Use the standard R operators for logical statements as described in Logic
such as &
for and, |
for or, and !
for not, and use the standard R relational operators as described in Comparison
such as ==
for logical equality, !=
for not equals, and >
for greater than. See the Examples.
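To show how such logical expressions select rows, here is a small base R sketch. The data frame emp and its values are made up for the example, and the subsetting is done with standard indexing and subset rather than through the filter parameter itself.

```r
# Hypothetical data frame for illustration
emp <- data.frame(
  Years  = c(2, 7, 10, 4, 12, 1),
  Salary = c(45000, 62000, 81000, 52000, 90000, 40000),
  Gender = c("F", "M", "M", "F", "M", "F")
)

# Relational operators combined with & (and)
keep <- emp$Years > 3 & emp$Gender == "M"
emp_sub <- emp[keep, ]

# The same idea with subset(), using ! (not)
emp_sub2 <- subset(emp, Years > 3 & !(Salary > 85000))

print(emp_sub)
print(emp_sub2)
```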
GRAPHICS
For models with a single predictor variable, a scatter plot of the data is produced, which also includes the fitted values. As with the density histogram plot of the residuals and the scatterplot of the fitted values and residuals, the scatterplot includes a colored background with grid lines. If the model has more than one predictor variable, then a scatter plot matrix is produced.
FORECASTS
Fitted and forecasted values are listed for all rows of data if the number of rows is less than 25 or if pred_all=TRUE
. If only some of the rows are listed, sorted by the fitted value, the first and last four rows of data are listed. Also the 4 rows immediately around the fitted value of 0.5 are listed.
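The listing of fitted values and the classification against a probability threshold can be approximated in base R as follows. The data are simulated, the names p_hat, pred_class, and listing are hypothetical, and the use of >= at the threshold is an assumption of this sketch rather than a documented detail of Logit.

```r
# Simulated data for illustration only
set.seed(2)
n <- 150
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(0.3 + 1.0 * x))
fit <- glm(y ~ x, family = "binomial")

# Fitted probabilities and classification at the threshold
p_hat <- fitted(fit)
prob_cut <- 0.5
pred_class <- as.integer(p_hat >= prob_cut)   # assumed rule: >= threshold

# Confusion matrix of actual versus predicted group
conf_mat <- table(actual    = factor(y, levels = 0:1),
                  predicted = factor(pred_class, levels = 0:1))

# Rows sorted by fitted value, showing the first and last four
listing <- data.frame(x, y, p_hat, pred_class)[order(p_hat), ]
print(head(listing, 4))
print(tail(listing, 4))
print(conf_mat)
```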
RESIDUAL ANALYSIS
By default the residual analysis lists the data and fitted value for each observation as well as the residual, Studentized residual, Cook's distance and dffits, with the first 20 observations listed and sorted by Cook's distance. The residual displayed is the actual difference between fitted and observed, that is, with the setting in the residuals
of type="response"
. The res_sort
option provides for sorting by the Studentized residuals or not sorting at all. The res_rows
option provides for listing these rows of data and computed statistics for any specified number of observations (rows). To turn off the analysis of residuals, specify res_rows=0
.
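A table of this kind can be assembled from the standard R functions along the following lines. This is a sketch on simulated data with hypothetical object names; the value of 8 rows is arbitrary, and the layout is not that of the Logit output.

```r
# Simulated data for illustration only
set.seed(3)
n <- 60
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(0.8 * x))
fit <- glm(y ~ x, family = "binomial")

# Data, fitted value, and diagnostics for each observation
res_tab <- data.frame(
  x, y,
  fitted   = fitted(fit),
  residual = residuals(fit, type = "response"),  # observed minus fitted
  rstudent = rstudent(fit),
  dffits   = dffits(fit),
  cooks    = cooks.distance(fit)
)

# Sort by Cook's distance, largest first, and list the leading rows
res_tab <- res_tab[order(res_tab$cooks, decreasing = TRUE), ]
n_rows <- 8
print(head(res_tab, n_rows))
```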
INVOKED R OPTIONS
The options
function turns off the stars for different significance levels (show.signif.stars=FALSE), turns off scientific notation for the output (scipen=30), and sets the width of the text output at the console to 120 characters. The latter option can be re-specified with the text_width
option. After Logit
is finished with a normal termination, the options are re-set to their values before the Logit
function began executing.
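The general pattern of setting options for the duration of a call and restoring them afterward can be sketched in base R as below. The function run_with_options is hypothetical and is not how Logit is necessarily implemented; in particular, on.exit restores the options even when an error occurs, which may differ from the behavior described above for normal termination.

```r
# Hypothetical helper: set options, do some work, restore on exit
run_with_options <- function(text_width = 120) {
  # options() returns the previous values of the options it changes
  old <- options(show.signif.stars = FALSE, scipen = 30, width = text_width)
  on.exit(options(old), add = TRUE)

  # Work done under the temporary settings would go here
  getOption("width")
}

before <- getOption("width")
inside <- run_with_options(text_width = 100)
after <- getOption("width")

print(c(before = before, inside = inside, after = after))
```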
COLORS
The default color theme is "colors"
, but a gray scale is available with "gray"
, and other themes are available as explained in style
, such as "red"
and "green"
. Use the option style(sub_theme="black")
for a black background and partial transparency of plotted colors.
Value
Following the standard R
function glm
, invisibly returns an object of class
inheriting from "glm" which inherits from the class
"lm". Particularly useful for comparing nested models. Assign the output of Logit
for a model to an object, and do the same for a nested model. Then use the anova
function to compare the models as shown in the examples below.
Author(s)
David W. Gerbing (Portland State University; gerbing@pdx.edu)
References
Gerbing, D. W. (2023). R Data Analysis without Programming: Explanation and Interpretation, 2nd edition, Chapter 13, NY: Routledge.
See Also
formula
, glm
, summary.glm
, anova
, confint
, fitted
, resid
, rstudent
, cooks.distance
Examples
# Gender has values of "M" and "F"
d <- Read("Employee", quiet=TRUE)
# logit regression, rely upon default parameter value: data=d
Logit(Gender ~ Years)
# short name
lr(Gender ~ Years)
# Modify the default settings as specified
Logit(Gender ~ Years, res_rows=8, res_sort="rstudent", digits_d=8, pred=FALSE)
Logit(Gender ~ Years)
# Multiple logistic regression model with specified probability thresholds
# for classification into the reference group
# just for employees who have worked more than 3 years at the firm
Logit(Gender ~ Years + Salary, prob_cut=c(.4, .7), filter=(Years > 3))
# Custom contrasts for categorical predictor
d$JobSat <- factor(d$JobSat, levels=c("low", "med", "high"))
contrasts(d$JobSat) <- contr.sum(n=3)
Logit(Gender ~ JobSat)
# Compare nested models
# easier and better treatment of missing data with lessR function: Nest
full_model <- Logit(Gender ~ Years + Salary)
reduced_model <- Logit(Gender ~ Years)
anova(reduced_model, full_model)
# Save the three plots as pdf files 4 inches square, gray scale
#Logit(Gender ~ Years, pdf_file="MyModel.pdf",
# width=4, height=4, colors="gray")
# Specify new values of the predictor variables to calculate
# forecasted values
d <- Read("Cars93")
Logit(Source ~ HP + MidPrice, X1_new=seq(100,250,50), X2_new=c(10,60,10))