bestModel {cNORM} | R Documentation |

The function computes a series of regressions with an increasing number of predictors and
takes the best-fitting model at each step. The aim is to find a model with as few predictors
as possible that still explains as much variance of the original data as possible. In
psychometric test construction, this approach can be used to smooth the data and eliminate
noise from norm sample stratification while preserving the overall diagnostic information.
Values around R2 = .99 usually show excellent results. The model can be selected either by
the number of terms in the regression function or by the share of explained variance (R2).
If both are specified, the method first tries to select the model by the number of terms
and, if this does not work, uses R2 instead. Pushing R2 by setting the number of terms, the
R2 cutoff and k to high values might lead to overfitting, so be careful! Suitable values
for these parameters depend on the distribution of the norm data. As a rule of thumb,
terms = 5 or R2 = .99 with k = 4 is a good starting point for the analyses.
`plotSubset(model)` can be used to weigh up R2 and information criteria (Cp, an AIC-like
measure), and fitted versus manifest scores can be plotted with `plotRaw`, `plotNorm` and
`plotPercentiles`. Use `checkConsistency(model)` to check the model for violations.
`cnorm.cv` can help in identifying the ideal number of predictors.
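
The selection rule described above (terms first, R2 as fallback) can be illustrated with a small base R sketch. It does not call cNORM: the adjusted R2 values and the helper `selectModel` are hypothetical, and the fallback to the largest model when the cutoff is never reached is an assumption made for the illustration, not documented behavior.

```r
# Hypothetical adjusted R2 of the best model with 1, 2, ... 6 terms
adjR2 <- c(0.912, 0.957, 0.981, 0.9905, 0.9921, 0.9926)

# Illustrative helper (NOT part of cNORM): choose the number of terms
selectModel <- function(adjR2, terms = 0, R2 = 0.99) {
  # 1. A usable 'terms' value takes precedence
  if (terms > 0 && terms <= length(adjR2)) {
    return(terms)
  }
  # 2. Otherwise take the smallest model that reaches the R2 cutoff
  hit <- which(adjR2 >= R2)
  if (length(hit) > 0) {
    return(min(hit))
  }
  # 3. Assumption: fall back to the largest model if the cutoff is never met
  length(adjR2)
}

byTerms    <- selectModel(adjR2, terms = 5)             # terms can be honored
byR2       <- selectModel(adjR2, terms = 0, R2 = 0.99)  # only R2 is used
byFallback <- selectModel(adjR2, terms = 10, R2 = 0.98) # terms unusable, R2 used
```

With these made-up values, a stricter R2 cutoff or a larger `terms` value always yields a model with more predictors, which is why the text warns against pushing both to high values.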

bestModel(data, raw = NULL, R2 = NULL, k = NULL, predictors = NULL, terms = 0, weights = NULL, force.in = NULL, plot = TRUE)

`data` |
The preprocessed dataset, which should include the variable 'raw' and the powers and interactions of the norm score (L = Location; usually T scores) and an explanatory variable (usually age = A) |

`raw` |
The name of the raw score variable (default raw) |

`R2` |
Adjusted R squared, used as a stopping criterion for the model building (default R2 = 0.99) |

`k` |
The power constant. Higher values result in more detailed approximations but carry the risk of overfitting (default = 4, max = 6) |

`predictors` |
List of the names of the predictors to use for the model selection. The parameter overrides the 'k' parameter, and it can be used to preselect the variables entering the regression, or even to add variables like sex that are not part of the original model building. Please note that when variables other than those based on L and A are added, the plotting, prediction and normTable functions will most likely not work, but at least the regression formula can be obtained that way. The parameter also accepts a formula object, e.g. when applying a precomputed model to a new dataset. In this case, k is overridden as well. In order to include all predictors in the regression, you might want to set the terms parameter to the number of predictors as well. |

`terms` |
Selection criterion for model building. The best fitting model with this number of terms is used |

`weights` |
Optional vector with weights for the single cases. By default, if the data were weighted during ranking, these weights are reused here as well. Please set to FALSE to deactivate this behavior. All weights have to be positive. This is currently an EXPERIMENTAL feature and will probably be deprecated in a future release. |

`force.in` |
List of variable names forced into the regression function. This option can be used to force the regression to include covariates like sex or other background variables, for example to model separate norm scales for different groups in the sample. Variables specified here that are not part of the initial regression function or the list of predictors are ignored without further notice and thus do not show up in the final result. Additionally, all other functions, such as norm table generation and plotting, are not yet prepared to handle covariates. |

`plot` |
If set to TRUE (default), the percentile plot of the model is shown |
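
As a rough illustration of how `k` and `predictors` relate, the following base R sketch builds the list of predictor names for a given power constant. It assumes the naming scheme visible in the examples of this page (`L1`, `A1`, `L1A1`, ...); the helper `predictorNames` is hypothetical and not part of cNORM.

```r
# Hypothetical helper (NOT part of cNORM): predictor names for power constant k,
# i.e. the powers L1..Lk and A1..Ak plus all interactions LiAj
predictorNames <- function(k) {
  L <- paste0("L", seq_len(k))
  A <- paste0("A", seq_len(k))
  # outer() pairs every power of L with every power of A;
  # t() orders the result as L1A1, L1A2, ..., LkAk
  LA <- as.vector(t(outer(L, A, paste0)))
  c(L, A, LA)
}

p3 <- predictorNames(3)
n3 <- length(p3)                  # k^2 + 2 * k names for k = 3
n4 <- length(predictorNames(4))   # size of the candidate set for k = 4
```

A vector like `p3` could be passed as `predictors`, optionally extended by a covariate such as "sex". Its length is the value to use for `terms` if all predictors should enter the regression.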

The model meeting the R2 criteria, with coefficients and variable selection
in model$coefficients. Use `plotSubset(model)` and `plotPercentiles(data, model)`
to inspect the model.

plotSubset, plotPercentiles, plotPercentileSeries, checkConsistency

Other model: `checkConsistency()`, `cnorm.cv()`, `derive()`, `modelSummary()`, `print.cnorm()`, `printSubset()`, `rangeCheck()`, `regressionFunction()`, `summary.cnorm()`

## Not run:
# Standard example with sample data
normData <- prepareData(elfe)
model <- bestModel(normData)
plotSubset(model)
plotPercentiles(normData, model)

# It is possible to specify the variables explicitly - useful to smuggle
# in variables like sex
preselectedModel <- bestModel(normData, predictors = c("L1", "L3", "L1A3", "A2", "A3"))
print(regressionFunction(preselectedModel))

# Example for modeling based on continuous age variable and raw variable,
# based on the CDC data. We use the default k=4 parameter; raw variable has
# to be set to "bmi".
bmi.data <- prepareData(CDC, raw = "bmi", group = "group", age = "age")
bmi.model <- bestModel(bmi.data, raw = "bmi")
printSubset(bmi.model)

# Use the formula of the precalculated bmi model to compute models for girls and
# boys separately
bmi.model.boys <- bestModel(bmi.data[bmi.data$sex == 1, ], predictors = bmi.model$terms)
bmi.model.girls <- bestModel(bmi.data[bmi.data$sex == 2, ], predictors = bmi.model$terms)

# Custom list of predictors (based on k = 3) and forcing in the sex variable.
# While calculating the regression model works well, all other functions like
# plotting and norm table generation are not yet prepared to use covariates
bmi.sex <- bestModel(bmi.data, raw = "bmi", predictors = c(
  "L1", "L2", "L3", "A1", "A2", "A3",
  "L1A1", "L1A2", "L1A3", "L2A1", "L2A2", "L2A3", "L3A1", "L3A2", "L3A3",
  "sex"
), force.in = c("sex"))
## End(Not run)

[Package *cNORM* version 2.0.3 Index]