Stage.1 {NormData} | R Documentation |
Stage 1 of the regression-based normative analysis
Description
The function Stage.1 fits a regression model with the specified mean and residual variance components, and conducts several model checks (homoscedasticity, normality, absence of outliers, and multicollinearity) that are useful in a setting where regression-based normative data have to be established.
Usage
Stage.1(Dataset, Model, Order.Poly.Var=3,
Alpha=0.05, Alpha.Homosc=0.05, Alpha.Norm = .05,
Assume.Homoscedasticity=NULL,
Test.Assumptions=TRUE, Outlier.Cut.Off=4,
Show.VIF=TRUE, GVIF.Threshold=10, Sandwich.Type="HC0",
Alpha.CI.Group.Spec.SD.Resid=0.01)
Arguments
Dataset
A data frame that contains the test scores and the independent variables specified in Model.
Model
The regression model to be fitted (mean structure). A formula should be provided using standard R model-formula syntax, e.g., Model=Science.Exam~Gender.
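Order.Poly.Var
If the homoscedasticity assumption is violated and the mean structure of the fitted model contains at least one quantitative variable, a polynomial variance prediction function is fitted. The argument Order.Poly.Var specifies the order of this polynomial (e.g., Order.Poly.Var=2 for a quadratic and Order.Poly.Var=1 for a linear variance prediction function). Default Order.Poly.Var=3, i.e., a cubic polynomial.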
Alpha
The significance level to be used when conducting inference for the mean structure of the model. Default Alpha=0.05.
Alpha.Homosc
The significance level to be used to evaluate the homoscedasticity assumption based on the Levene test (when all independent variables in the model are qualitative) or the Breusch-Pagan test (when at least one of the independent variables is quantitative). Default Alpha.Homosc=0.05.
Alpha.Norm
The significance level to be used to test the normality assumption for the standardized errors using the Shapiro-Wilk test. The normality assumption is evaluated based on the standardized residuals in the normative dataset, which are computed as described in Van der Elst (2023). Default Alpha.Norm=0.05.
Assume.Homoscedasticity
Logical. … By default, the standardized residuals … Default Assume.Homoscedasticity=NULL.
Test.Assumptions
Logical. Should the model assumptions be evaluated for the specified model? Default Test.Assumptions=TRUE.
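Outlier.Cut.Off
Outliers are evaluated based on the standardized residuals, which are computed as described in Van der Elst (2023). Outlier.Cut.Off is the cut-off value applied to these standardized residuals. Default Outlier.Cut.Off=4.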
Show.VIF
Logical. Should the generalized VIF (Fox and Monette, 1992) be shown in the output? Default Show.VIF=TRUE.
GVIF.Threshold
The threshold value to be used to detect multicollinearity based on the generalized VIF. Default GVIF.Threshold=10.
Sandwich.Type
When the homoscedasticity assumption is violated, so-called sandwich estimators (or heteroscedasticity-consistent estimators) are used for the standard errors of the regression parameters. For example, the sandwich estimator for the standard error of … The argument Sandwich.Type specifies which sandwich estimator is used (see Long and Ervin, 2000). Default Sandwich.Type="HC0".
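Alpha.CI.Group.Spec.SD.Resid
The significance level used to compute the confidence intervals for the group-specific standard deviations of the residuals. Default Alpha.CI.Group.Spec.SD.Resid=0.01.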
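When a polynomial variance prediction function is needed (see Order.Poly.Var above), the idea can be sketched in base R as follows. This is not NormData's internal code: it assumes that the squared residuals of the mean model are regressed on powers of the predicted scores (Pred.Y). The simulated data, the variable names, and the small floor on the predicted variance are hypothetical choices made for illustration.

```r
# Hypothetical sketch of a polynomial variance prediction function. Assumption (not
# taken from NormData's source): squared residuals of the mean model are regressed
# on powers of the predicted scores (Pred.Y).
set.seed(1)
n <- 200
age <- runif(n, 20, 80)
# Simulated scores whose residual SD increases with age (heteroscedastic)
score <- 50 - 0.3 * (age - 50) + rnorm(n, sd = 2 + 0.05 * age)
dat <- data.frame(score = score, Age.C = age - 50)

fit_mean <- lm(score ~ Age.C, data = dat)   # mean structure
dat$Pred.Y <- fitted(fit_mean)              # predicted test scores
dat$e2 <- residuals(fit_mean)^2             # squared residuals

order_poly_var <- 3                         # plays the role of Order.Poly.Var
fit_var <- lm(e2 ~ poly(Pred.Y, degree = order_poly_var, raw = TRUE), data = dat)

# Predicted residual SD per participant; negative predicted variances are floored
sd_hat <- sqrt(pmax(fitted(fit_var), 1e-8))
summary(sd_hat)
```

Lowering order_poly_var to 2 or 1 mirrors what the Examples do with Order.Poly.Var=2 and Order.Poly.Var=1.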
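Alpha.Homosc refers to the Levene test (only qualitative predictors) and the Breusch-Pagan test (at least one quantitative predictor). A minimal base-R sketch of both tests follows, using built-in datasets. It assumes the median-centred (Brown-Forsythe) variant of Levene's test and Koenker's studentized n * R^2 form of the Breusch-Pagan test; the variants used by Stage.1 may differ.

```r
# Hypothetical base-R versions of the two homoscedasticity tests. Assumptions:
# Levene's test uses absolute deviations from the group medians (Brown-Forsythe
# variant); Breusch-Pagan is Koenker's studentized n * R^2 statistic.
alpha_homosc <- 0.05                               # plays the role of Alpha.Homosc

# (1) Only qualitative predictors: Levene-type test on the residuals
fit_q <- lm(weight ~ group, data = PlantGrowth)
r <- residuals(fit_q)
abs_dev <- abs(r - ave(r, PlantGrowth$group, FUN = median))
levene_p <- anova(lm(abs_dev ~ PlantGrowth$group))[["Pr(>F)"]][1]

# (2) At least one quantitative predictor: Breusch-Pagan test
fit_c <- lm(mpg ~ wt + hp, data = mtcars)
e2 <- residuals(fit_c)^2
aux <- lm(e2 ~ wt + hp, data = mtcars)             # squared residuals on the predictors
bp_stat <- nrow(mtcars) * summary(aux)$r.squared   # LM statistic, chi-square with 2 df
bp_p <- pchisq(bp_stat, df = 2, lower.tail = FALSE)

c(levene_p = levene_p, bp_p = bp_p)
c(levene_homosc = levene_p > alpha_homosc, bp_homosc = bp_p > alpha_homosc)
```

For tested implementations, car::leveneTest() (median-centred by default) and lmtest::bptest() (studentized by default) are available in the car and lmtest packages.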
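The normality check (Alpha.Norm) and the outlier check (Outlier.Cut.Off) both operate on standardized residuals. The sketch below assumes, for illustration only, that a standardized residual is the raw residual divided by the residual standard error, which covers only the homoscedastic case, and that a score is flagged when the absolute standardized residual exceeds the cut-off. The simulated data and the planted outlier are hypothetical.

```r
# Hypothetical sketch. Assumption: standardized residuals are the raw residuals
# divided by the residual standard error (homoscedastic case only).
set.seed(2)
dat <- data.frame(x = rnorm(100))
dat$y <- 10 + 2 * dat$x + rnorm(100)
dat$y[5] <- dat$y[5] + 15                 # plant one gross outlier at row 5

fit <- lm(y ~ x, data = dat)
z <- residuals(fit) / summary(fit)$sigma  # standardized residuals (assumed definition)

# Normality: Shapiro-Wilk test (stats::shapiro.test)
alpha_norm <- 0.05                        # plays the role of Alpha.Norm
sw <- shapiro.test(z)
normality_ok <- sw$p.value > alpha_norm

# Outliers: absolute standardized residual above the cut-off
outlier_cut_off <- 4                      # plays the role of Outlier.Cut.Off
flagged <- which(abs(z) > outlier_cut_off)

sw
flagged
```

Because the Shapiro-Wilk statistic is invariant to location and scale, dividing all residuals by one constant does not change the normality test; the standardization matters for that test only when participant-specific residual SDs are used, as in the heteroscedastic case.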
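The generalized VIF used with GVIF.Threshold can be computed directly from the correlation matrix of the non-intercept model-matrix columns, following Fox and Monette (1992): GVIF_j = det(R11) det(R22) / det(R), where R11 is the block for term j and R22 the block for the remaining terms. The helper gvif() below is a hypothetical sketch, not the function NormData calls internally; for single-column terms it reduces to the classical VIF.

```r
# Hypothetical helper: generalized VIF (Fox and Monette, 1992) for each model term,
# GVIF_j = det(R11) * det(R22) / det(R), with R the correlation matrix of the
# non-intercept model-matrix columns, R11 the block of term j, R22 the rest.
gvif <- function(fit) {
  X <- model.matrix(fit)
  asgn <- attr(X, "assign")               # column -> term index (0 = intercept)
  keep <- asgn != 0
  R <- cor(X[, keep, drop = FALSE])
  asgn <- asgn[keep]
  term_labels <- attr(terms(fit), "term.labels")
  out <- vapply(seq_along(term_labels), function(j) {
    idx <- which(asgn == j)
    det(R[idx, idx, drop = FALSE]) * det(R[-idx, -idx, drop = FALSE]) / det(R)
  }, numeric(1))
  names(out) <- term_labels
  out
}

fit <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)
g <- gvif(fit)
gvif_threshold <- 10                      # plays the role of GVIF.Threshold
g
names(g)[g > gvif_threshold]              # terms flagged for multicollinearity
```

car::vif() should report the same GVIF values for lm fits and can serve as a cross-check.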
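The sandwich (heteroscedasticity-consistent) standard errors mentioned under Sandwich.Type can be written out in a few lines of base R. The sketch below implements the HC0 and HC3 forms described by Long and Ervin (2000). The helper name sandwich_se() is hypothetical, and whether Stage.1 computes the estimator in exactly this way is an assumption.

```r
# Hypothetical helper: HC0 / HC3 sandwich standard errors (Long and Ervin, 2000)
# V = (X'X)^{-1} X' diag(w) X (X'X)^{-1}
# with w_i = e_i^2 (HC0) or w_i = e_i^2 / (1 - h_ii)^2 (HC3).
sandwich_se <- function(fit, type = c("HC0", "HC3")) {
  type <- match.arg(type)
  X <- model.matrix(fit)
  e <- residuals(fit)
  XtX_inv <- solve(crossprod(X))
  h <- hatvalues(fit)                     # leverages h_ii
  w <- if (type == "HC0") e^2 else e^2 / (1 - h)^2
  meat <- crossprod(X * sqrt(w))          # X' diag(w) X (rows scaled by sqrt(w_i))
  V <- XtX_inv %*% meat %*% XtX_inv
  sqrt(diag(V))
}

fit <- lm(mpg ~ wt + hp, data = mtcars)
cbind(OLS = sqrt(diag(vcov(fit))),
      HC0 = sandwich_se(fit, "HC0"),
      HC3 = sandwich_se(fit, "HC3"))
```

The vcovHC() function of the sandwich package (e.g., vcovHC(fit, type = "HC0")) offers a tested implementation that can be used as a cross-check.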
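Alpha.CI.Group.Spec.SD.Resid concerns confidence intervals around group-specific residual SDs. One common construction, assumed here rather than taken from NormData, is the normal-theory chi-square interval for a standard deviation. The sketch computes it per group for a one-way model on the built-in PlantGrowth data, using the hypothetical helper sd_ci().

```r
# Hypothetical helper. Assumption: normal-theory chi-square interval for an SD,
# [sqrt((n-1) s^2 / q_{1-alpha/2}), sqrt((n-1) s^2 / q_{alpha/2})].
sd_ci <- function(x, alpha = 0.01) {
  n <- length(x)
  s2 <- var(x)
  c(SD    = sqrt(s2),
    lower = sqrt((n - 1) * s2 / qchisq(1 - alpha / 2, df = n - 1)),
    upper = sqrt((n - 1) * s2 / qchisq(alpha / 2, df = n - 1)))
}

fit <- lm(weight ~ group, data = PlantGrowth)  # qualitative predictor only
res <- residuals(fit)
# Group-specific residual SDs with 99% intervals (alpha = 0.01)
ci <- t(sapply(split(res, PlantGrowth$group), sd_ci, alpha = 0.01))
ci
```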
Details
For details, see Van der Elst (2023).
Value
An object of class Stage.1 with components:
HomoNorm
The fitted regression model assuming homoscedasticity and normality.
NoHomoNorm
The fitted regression model assuming heteroscedasticity and normality.
HomoNoNorm
The fitted regression model assuming homoscedasticity and non-normality.
NoHomoNoNorm
The fitted regression model assuming heteroscedasticity and non-normality.
Predicted
The predicted test scores based on the fitted model.
Sandwich.Type
The requested sandwich estimator.
Order.Poly.Var
The order of the polynomial variance prediction function.
Author(s)
Wim Van der Elst
References
Fox, J. and Monette, G. (1992). Generalized collinearity diagnostics. Journal of the American Statistical Association, 87, 178-183.
Long, J. S. and Ervin, L. H. (2000). Using Heteroscedasticity Consistent Standard Errors in the Linear Regression Model. The American Statistician, 54, 217-224.
Van der Elst, W. (2024). Regression-based normative data for psychological assessment: A hands-on approach using R. Springer Nature.
See Also
plot Stage.1, Stage.2.AutoScore, Stage.2.NormScore, Stage.2.NormTable
Examples
# Replicate the Stage 1 results that were obtained in
# Case study 1 of Chapter 4 in Van der Elst (2023)
# ---------------------------------------------------
library(NormData) # load the NormData package
data(GCSE) # load the GCSE dataset
# Conduct the Stage 1 analysis
Model.1.GCSE <- Stage.1(Dataset=GCSE,
Model=Science.Exam~Gender)
summary(Model.1.GCSE)
plot(Model.1.GCSE)
# Replicate the Stage 1 results that were obtained in
# Case study 1 of Chapter 7 in Van der Elst (2023)
# ---------------------------------------------------
library(NormData) # load the NormData package
data(Substitution) # load the Substitution dataset
# Add the variable Age.C (= Age centered) and its
# quadratic and cubic terms to the Substitution dataset
Substitution$Age.C <- Substitution$Age - 50
Substitution$Age.C2 <- Substitution$Age.C^2
Substitution$Age.C3 <- Substitution$Age.C^3
# Fit the full Stage 1 model
Substitution.Model.1 <- Stage.1(Dataset=Substitution,
Model=LDST~Age.C+Age.C2+Age.C3+Gender+LE+Age.C:LE+
Gender:LE+Age.C:Gender, Alpha=0.005)
summary(Substitution.Model.1)
# Fit the model in which the non-significant Age.C:Gender
# interaction term is removed
Substitution.Model.2 <- Stage.1(Dataset=Substitution,
Alpha=0.005,
Model=LDST~Age.C+Age.C2+Age.C3+Gender+LE+
Age.C:LE+Gender:LE)
summary(Substitution.Model.2)
# Evaluate the significance of the Gender:LE interaction term
# GLT is used because the interaction involves multiple regression
# parameters
GLT.1 <- GLT(Dataset=Substitution, Alpha=0.005,
Unrestricted.Model=LDST~Age.C+Age.C2+Age.C3+
Gender+LE+Age.C:LE+Gender:LE,
Restricted.Model=LDST~Age.C+Age.C2+Age.C3+
Gender+LE+Age.C:LE)
summary(GLT.1)
# Fit the model in which the non-significant Gender:LE
# interaction term is removed
Substitution.Model.3 <- Stage.1(Dataset=Substitution,
Alpha=0.005,
Model=LDST~Age.C+Age.C2+Age.C3+Gender+LE+Age.C:LE)
summary(Substitution.Model.3)
# Evaluate the significance of the Age.C:LE interaction
# using the General Linear Test framework
GLT.2 <- GLT(Dataset=Substitution,
Unrestricted.Model=LDST~Age.C+Age.C2+Age.C3+Gender+LE+Age.C:LE,
Restricted.Model=LDST~Age.C+Age.C2+Age.C3+Gender+LE, Alpha=0.005)
summary(GLT.2)
# Fit the model in which the non-significant Age.C:LE
# interaction term is removed
Substitution.Model.4 <- Stage.1(Dataset=Substitution,
Alpha=0.005, Model=LDST~Age.C+Age.C2+Age.C3+Gender+LE)
summary(Substitution.Model.4)
# Fit the model in which the non-significant Age.C3 term is removed
Substitution.Model.5 <- Stage.1(Dataset=Substitution,
Alpha=0.005, Model=LDST~Age.C+Age.C2+Gender+LE)
summary(Substitution.Model.5)
# Fit the model in which the non-significant Age.C2 term is removed
Substitution.Model.6 <- Stage.1(Dataset=Substitution,
Alpha=0.005, Model=LDST~Age.C+Gender+LE)
summary(Substitution.Model.6)
# Fit the model in which the non-significant main effect of Gender
# is removed
Substitution.Model.7 <- Stage.1(Dataset=Substitution,
Alpha=0.005, Model=LDST~Age.C+LE)
summary(Substitution.Model.7)
plot(Substitution.Model.7, Normality = FALSE, Outliers = FALSE)
# Check the significance of LE using the GLT framework
GLT.3 <- GLT(Dataset=Substitution, Alpha=0.005,
Unrestricted.Model=LDST~Age.C+LE,
Restricted.Model=LDST~Age.C)
summary(GLT.3)
# Residual variance function: Substitution.Model.7 uses the default
# cubic polynomial variance prediction function. Drop the cubic Pred.Y
# term from Substitution.Model.7, i.e., fit a quadratic variance
# prediction function
Substitution.Model.8 <- Stage.1(Dataset=Substitution,
Alpha=0.005, Model=LDST~Age.C+LE,
Order.Poly.Var=2) # Order.Poly.Var=2 specifies a quadratic polynomial
# for the variance prediction function
summary(Substitution.Model.8)
plot(Substitution.Model.8, Normality = FALSE, Outliers = FALSE)
# Drop the quadratic Pred.Y term, i.e., fit a linear variance
# prediction function
Substitution.Model.9 <- Stage.1(Dataset=Substitution,
Alpha=0.005, Model=LDST~Age.C+LE,
Order.Poly.Var=1) # Order.Poly.Var=1 specifies a linear polynomial
# for the variance prediction function
# Final Stage 1 model
summary(Substitution.Model.9)
plot(Substitution.Model.9)
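The GLT calls in the examples compare a restricted model with an unrestricted model in which it is nested, which is useful when an effect (such as Gender:LE) involves several regression parameters. The classical general linear (extra-sum-of-squares) F test behind this kind of comparison can be sketched in base R as below. glt_F() is a hypothetical helper, the mtcars model is only an illustration, and the sketch ignores any heteroscedasticity adjustment that GLT may apply.

```r
# Hypothetical helper: classical general linear (extra-sum-of-squares) F test
# F = [(SSE_R - SSE_U) / (df_R - df_U)] / (SSE_U / df_U)
glt_F <- function(restricted, unrestricted) {
  sse_r <- deviance(restricted)          # residual sum of squares, restricted model
  sse_u <- deviance(unrestricted)        # residual sum of squares, unrestricted model
  df_r <- df.residual(restricted)
  df_u <- df.residual(unrestricted)
  f_stat <- ((sse_r - sse_u) / (df_r - df_u)) / (sse_u / df_u)
  c(F = f_stat, df1 = df_r - df_u, df2 = df_u,
    p = pf(f_stat, df_r - df_u, df_u, lower.tail = FALSE))
}

# Illustration: does the 3-level factor cyl (2 regression parameters) add to wt?
unrestricted <- lm(mpg ~ wt + factor(cyl), data = mtcars)
restricted   <- lm(mpg ~ wt, data = mtcars)
res <- glt_F(restricted, unrestricted)
res
anova(restricted, unrestricted)
```

For nested lm fits, anova(restricted, unrestricted) returns the same F statistic and p-value, so it can be used to check the hand-computed result.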