forsearch_glm {forsearch} | R Documentation |
Create Statistics of Forward Search in a Generalized Linear Model Database
Description
Prepares summary statistics at each stage of forward search for subsequent plotting. Forward search is conducted in three steps: Step 1 to identify minimal set of observations to estimate unknown parameters, and Step 2 to add one observation at each stage such that observations in the set are best fitting at that stage. A preliminary step (Step 0) contains code for pre-processing of the data.
Usage
forsearch_glm(initial.sample=1000, response.cols, indep.cols, family,
formula=NULL, binomialrhs=NULL, formula.cont.rhs, data, n.obs.per.level = 1,
estimate.phi = TRUE, skip.step1=NULL, unblinded=TRUE, begin.diagnose=100,
verbose=TRUE)
Arguments
initial.sample |
Number of random sets of observations in Step 1 of forward search |
response.cols |
Vector of column numbers (1 or 2) of responses and nonresponses (if binomial) |
indep.cols |
Column number(s) of independent variables |
family |
Error distribution and link |
formula |
Formula relating response to independent variables. Required except for family=binomial |
binomialrhs |
Quoted character.Right-hand side of formula. Required for family=binomial |
formula.cont.rhs |
Quoted character.Right-hand side of formula, omitting factor variables. Required for all families |
data |
Name of database |
n.obs.per.level |
Number of observations per level of (possibly crossed) factor levels |
estimate.phi |
TRUE causes phi to be estimated; FALSE causes phi to be set = 1 |
skip.step1 |
NULL, or vector of observation numbers to include at end of Step 1 |
unblinded |
TRUE allows print of formula of analysis function |
begin.diagnose |
Numeric. Indicates where in code to begin printing diagnostics. 0 prints all; 100 prints none |
verbose |
TRUE causes function identifier to display before and after run |
Details
Step 2 is determined by the results of Step 1, which itself is random. So, it is possible to reproduce the entire run by using the skip.step1 argument. Inner subgroups are produced by presence of categorical variables. Current version assumes independent variables are all continuous.
Value
LIST
Rows in stage |
Observation numbers of rows included at each stage |
Family |
Family and link |
Number of model parameters |
Number of fixed effect parameters |
Fixed parameter estimates |
Matrix of parameter estimates at each stage |
Residual deviance |
Vector of deviances |
Null deviance |
Vector of null deviances |
PhiHat |
Vector of values of phi parameter |
Deviance residuals and augments |
Deviance residuals with indication of whether each is included in fit |
AIC |
Vector of AIC values |
Leverage |
Matrix of leverage of each observation at each stage |
Call |
Call to this function |
Author(s)
William R. Fairweather
References
Atkinson, A and M Riani. Robust Diagnostic Regression Analysis, Springer, New York, 2000.
Examples
# Train deaths (Atkinson and Riani, 2000) with Rolling Stock as a factor
Observation<-1:67
Month<-c(9,8,3,1,10,6,7,1,8,4,3,3,12,11,10,9,9,4,3,12,12,10,7,2,12,2,12,12,12,
11,3,10,4,2,12,12,9,11,1,10,8,6,1,10,6,12,8,4,9,6,12,10,7,2,5,12,5,5,4,3,1,
9,11,9,7,3,2)
Year<-c(97,96,96,95,94,94,91,91,90,89,89,89,88,88,87,86,86,86,86,84,84,84,84,84,
83,83,82,81,81,80,80,79,79,79,78,78,77,76,76,75,75,75,75,74,74,73,73,73,72,
72,71,71,71,71,70,69,69,69,69,69,69,68,67,67,67,67,67)
RollingStock<-c(2,2,3,2,1,1,1,1,2,3,1,1,1,2,1,2,1,3,2,2,1,2,2,3,1,2,1,1,2,3,1,
1,1,1,1,1,1,3,3,2,3,1,2,3,1,1,1,3,3,1,3,3,1,1,1,2,1,1,2,1,1,1,1,1,1,1,1)
RollingStock <- as.factor(RollingStock)
Traffic<-c(0.436,0.424,0.424,0.426,0.419,0.419,0.439,0.439,0.431,0.436,0.436,
0.436,0.443,0.443,0.397,0.414,0.414,0.414,0.414,0.389,0.389,0.389,0.389,
0.389,0.401,0.401,0.372,0.417,0.417,0.43,0.43,0.426,0.426,0.426,0.43,0.43,
0.425,0.426,0.426,0.436,0.436,0.436,0.436,0.452,0.452,0.433,0.433,0.433,
0.431,0.431,0.444,0.444,0.444,0.444,0.452,0.447,0.447,0.447,0.447,0.447,
0.447,0.449,0.459,0.459,0.459,0.459,0.459)
Deaths<-c(7,1,1,1,5,2,4,2,1,1,2,5,35,1,4,1,2,1,1,3,1,3,13,2,1,1,1,4,1,2,1,5,7,
1,1,3,2,1,2,1,2,6,1,1,1,10,5,1,1,6,3,1,2,1,2,1,1,6,2,2,4,2,49,1,7,5,9)
train2022 <- data.frame(Observation, Year, RollingStock, Traffic, Deaths)
forsearch_glm(initial.sample = 100, response.cols = 5,
indep.cols = 2:4, formula=Deaths~Year + RollingStock + Traffic,
formula.cont.rhs="Year + Traffic",
family = poisson("log"), data = train2022,
n.obs.per.level = 1, estimate.phi = TRUE, skip.step1 = NULL,
unblinded = TRUE, begin.diagnose=100)